
Apr 23, 2026 - Rewrote this article to reflect Webflow’s current native robots.txt support and clarified when custom proxy-based setups are still necessary.
Most people don't visit robots.txt out of technical curiosity. They get there because they need to restrict crawlers from accessing a folder, stop a bot from hammering their site, or figure out why Google Search Console says "blocked by robots.txt." If that's you, you're in the right place.
The good news: Webflow now lets you create and edit robots.txt rules natively from Site settings > SEO > Indexing. For many standard Webflow setups, this no longer requires Cloudflare Workers or other workarounds. The real challenge isn't whether you can edit robots.txt in Webflow — it's knowing which rules to write, how to publish them correctly, and how to verify they're actually working on the right domain.
This guide covers what robots.txt does in Webflow, how to set it up step by step, the most useful rules you can copy and adapt, how to test your setup, the common mistakes that break crawling, and when you might still need an advanced solution.

A robots.txt file tells bots — search engine crawlers, AI scrapers, monitoring tools — which pages and folders they're allowed to visit on your site. It lives at the root of your domain (yourdomain.com/robots.txt) and works as a set of instructions, not a wall. Most legitimate crawlers like Googlebot and Bingbot respect it. Others won't.
robots.txt controls crawling, not indexing. This is an important distinction: a page blocked in robots.txt can still appear in search results if Google discovered it through an external link or other signal — it just won't have a description or snippet. robots.txt tells crawlers which pages to skip, but it does not guarantee that a URL will be removed from search results. If you need to remove a page from search results entirely, use a noindex tag instead.
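For reference, a noindex directive is a meta tag in the page's head (in Webflow, typically added via the page settings' custom code area) and looks like this:
<meta name="robots" content="noindex">
One caveat worth knowing: Google has to crawl the page to see the tag, so don't block a page in robots.txt if you're relying on noindex to remove it from the index.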
robots.txt is useful when you want to keep crawlers out of specific folders or pages, stop bots from wasting crawl budget on internal or filter-based URLs, or discourage AI scrapers from using your content for training.
If your goal is to remove pages from Google's index, check out our guide on how to unindex pages in Webflow.
Before you edit anything, it helps to know that Webflow already generates a robots.txt file for your site. By default, it includes a reference to your sitemap:
Sitemap: https://yourdomain.com/sitemap.xml
This tells crawlers where to find your sitemap so they can discover all your pages efficiently. Webflow also provides a toggle to remove this line if you prefer to handle sitemap discovery separately.
Webflow's native robots.txt editor lives in your site settings. Here's how to find it and use it.
Open your Webflow project and go to Site settings > SEO > Indexing. This is where you'll find the robots.txt editor along with other indexing controls like auto-generated sitemaps and canonical URL settings.
In the robots.txt section, you'll find a text area where you can add your rules. Type or paste your rules following standard robots.txt syntax — each rule block starts with a User-agent line followed by Disallow or Allow directives.
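For example, a minimal block that asks every bot to skip a hypothetical /drafts/ folder looks like this:
User-agent: *
Disallow: /drafts/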

By default, Webflow includes a link to your sitemap in your robots.txt file. If you want to remove it, toggle Remove sitemap.xml from robots.txt to on. In many cases, keeping the sitemap reference is a reasonable default — it can help crawlers discover your pages more efficiently.
Click Save and then publish your site. Your robots.txt changes won't take effect until you publish. After publishing, open your browser and navigate to yourdomain.com/robots.txt on your primary domain. You should see your rules live.
Here are the rules you'll use most often, with ready-to-copy examples.
This is the default behavior — there's generally no need to add anything if you want all bots to crawl everything. But if you previously blocked something and want to reset:
User-agent: *
Allow: /
Use this to prevent crawlers from accessing entire sections of your site, like old CMS Collections, internal pages, or filter-based URL variations:
User-agent: *
Disallow: /old-collection/
Disallow: /internal/
The trailing slash matters. Disallow: /folder/ blocks everything inside that folder but not a page at /folder itself. Disallow: /folder (no trailing slash) is a prefix match: it blocks the /folder page, everything under /folder/, and any other path that starts with /folder.
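To illustrate the prefix matching with a hypothetical /blog path:
Disallow: /blog/ blocks /blog/post-1 but not /blog or /blog-news
Disallow: /blog blocks /blog, /blog/post-1, and /blog-news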
To block a single page without affecting others:
User-agent: *
Disallow: /thank-you
Disallow: /coming-soon
If you only want certain crawlers to access your site:
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Disallow: /
Remember: specific bot rules override general rules. If you define rules for Googlebot, it follows those and ignores the catch-all.
To discourage AI companies from scraping your content for training data, you can add specific user-agent rules for commonly known AI crawlers. The following are widely referenced examples, but keep in mind that user-agent strings can change over time and new crawlers may emerge:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Bytespider
Disallow: /
Many AI crawlers may respect these directives, but behavior varies by provider and bot. New crawlers appear regularly, and some providers change their user-agent strings over time. Also note that Google-Extended controls whether Google can use your content for AI model training — it is not the same as blocking Google Search crawling (Googlebot handles that separately). Check each provider's documentation for their current user-agent strings and what each one controls. For a complementary approach to controlling how AI models interact with your site, see our guide on adding an llms.txt file to Webflow.
If your site is under development or you need to temporarily hide it from all search engines:
User-agent: *
Disallow: /
Remove this rule and republish when you're ready for crawlers to return. Keep in mind that Google may take hours or even days to recrawl and reflect the change.
Adding rules is only half the work. The other half is confirming they're live and working on the right domain.
Open your browser and go to yourdomain.com/robots.txt. Make sure you're checking the canonical or primary domain, not your webflow.io subdomain or another non-primary hostname. If Google is crawling your staging subdomain instead of your live domain, see our guide on how to stop Google from indexing your Webflow staging subdomain. The file you see should reflect the rules you configured.
robots.txt changes only go live after you publish your site. If you test before publishing, you'll see the old version. Always publish first, then verify.
If you kept the default sitemap reference, check that the URL in your robots.txt points to your primary domain:
Sitemap: https://yourdomain.com/sitemap.xml
Open that URL in your browser to confirm the sitemap loads correctly.
Google Search Console's Page Indexing report shows pages affected by robots.txt rules. Look for the "Blocked by robots.txt" status to see which pages Google couldn't crawl. If you see pages you didn't intend to block, check your rules for overly broad patterns.
Many "robots.txt not working" problems aren't caused by bad rules. They happen because you're checking the wrong hostname. robots.txt is evaluated per host/domain, so checking a non-canonical hostname (such as your webflow.io subdomain instead of your primary domain) can lead to misleading results. Always verify on the domain Google actually crawls — your canonical domain.
This is a common issue. robots.txt is evaluated per host/domain, so if your canonical domain is www.yourdomain.com but you're checking yourdomain.com or your webflow.io subdomain, you may see different content — or no file at all. Always test on the primary domain configured in Webflow.
A single misplaced slash in a Disallow rule can block your entire site from being crawled. Disallow: / means "block everything." Disallow: (with nothing after it) means "block nothing." That one character makes the difference between a functional site and one that crawlers can no longer access.
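Side by side, using robots.txt's # comment syntax:
# Blocks the entire site:
User-agent: *
Disallow: /
# Blocks nothing:
User-agent: *
Disallow: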
Webflow already adds a sitemap line to your robots.txt by default. If you also add one manually in your rules, you'll end up with two sitemap references. This generally won't cause problems, but it's unnecessary and can confuse some crawlers. Either let Webflow handle it or toggle off the default and add your own.
robots.txt is a public file. Anyone can view it by typing yourdomain.com/robots.txt in their browser. Don't use it to "hide" sensitive pages, admin panels, or private content. Use authentication, password protection, or noindex tags instead.
Webflow's own documentation warns that not all bots follow robots.txt rules. Legitimate search engine crawlers like Googlebot and Bingbot will respect them. Malicious scrapers and poorly configured bots won't. robots.txt is a polite request, not a security mechanism.
If you're running a reverse proxy, multiple domains pointing to the same Webflow site, or a localization setup with different crawl needs per region, native robots.txt settings may not cover your requirements. These advanced scenarios need custom solutions at the proxy or CDN layer.
Most Webflow sites are well served by the native robots.txt editor. But some setups need more.
For the specific case of AI crawlers and bandwidth-heavy bot traffic, see our guide on how to block bots and crawlers from your Webflow site. It covers the practical robots.txt, meta tag, and WAF options to use when native controls are not enough.
If you have multiple domains pointing to the same Webflow project (for example, separate domains for different countries), you may need different robots.txt rules for each domain. Webflow's native editor applies the same rules to all domains, so you'll need a reverse proxy to serve domain-specific robots.txt files.
When your site runs behind a reverse proxy (NGINX, Cloudflare, etc.), the proxy server can intercept requests to /robots.txt and serve a custom file before the request reaches Webflow. Webflow's documentation notes that if your origin subdomain (like your-site.webflow.io) remains as the default domain, the auto-generated robots.txt and sitemap may point to that subdomain instead of your custom domain.
In reverse proxy setups, you can host your own robots.txt file on the proxy server. Here's a basic NGINX example that serves a custom robots.txt:
location = /robots.txt {
    # Serve a static file from the proxy; the request never reaches Webflow.
    alias /path/to/your/robots.txt;
    default_type text/plain;
}
This bypasses Webflow's native robots.txt entirely and serves your custom file from the proxy.
Cloudflare Workers are useful when you need to serve different robots.txt files based on the hostname, handle complex URL rewriting alongside robots.txt, or manage robots.txt at the edge across multiple regions. This was the primary method before Webflow added native support. Today, it's only necessary for advanced setups that go beyond what the native editor handles.
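As a rough sketch of the hostname-based case, here is a minimal Cloudflare Worker (module syntax, TypeScript) that serves a different robots.txt per domain and passes every other request through to the origin. The hostnames, rules, and fallback below are placeholder assumptions, not a drop-in configuration:
// Per-hostname rules; the domains and paths below are placeholder examples.
const robotsByHost: Record<string, string> = {
  "example.com": "User-agent: *\nDisallow: /internal/\nSitemap: https://example.com/sitemap.xml\n",
  "example.de": "User-agent: *\nDisallow: /\n",
};

// Fallback for any hostname not listed above.
const defaultRules = "User-agent: *\nAllow: /\n";

export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === "/robots.txt") {
      // Answer robots.txt at the edge, per hostname.
      return new Response(robotsByHost[url.hostname] ?? defaultRules, {
        headers: { "content-type": "text/plain; charset=utf-8" },
      });
    }
    // Everything else passes through to the origin (your Webflow site).
    return fetch(request);
  },
};
Deployed on a route covering each domain, this keeps all robots.txt logic in one place while Webflow continues to serve the rest of the site.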
If your team needs more control than Webflow’s native robots.txt settings allow, our Webflow agency can help implement a custom setup that matches your SEO, staging, and indexing requirements.
Yes. Go to Site settings > SEO > Indexing and add your rules in the robots.txt section. Click Save and publish your site for the changes to take effect. You typically don't need third-party tools or workarounds for standard use cases. Availability of custom robots.txt controls may vary by plan or current product configuration, so it's worth confirming against Webflow's latest plan documentation.
Open your Webflow project, navigate to Site settings > SEO > Indexing, and scroll to the robots.txt section. You'll find a text area where you can add, edit, or remove rules. Changes must be saved and published before they appear on your live site.
Webflow adds the sitemap reference automatically. To remove it, toggle Remove sitemap.xml from robots.txt to on in Site settings > SEO > Indexing. To add a custom sitemap entry, include it in your robots.txt rules as a Sitemap: directive with the full URL. Avoid duplicates by only using one method.
robots.txt is evaluated per host/domain, so different hostnames may serve different content. If your canonical domain is set to www.yourdomain.com but someone (or a tool) checks yourdomain.com, they may see a different file. Make sure your DNS and Webflow settings point both versions to the same site, and always verify your robots.txt on the domain Google actually crawls.
Google is telling you that a page it tried to crawl matches a Disallow rule in your robots.txt. Check your rules for overly broad patterns — a Disallow: / blocks everything, and a Disallow: /folder can accidentally match pages you didn't intend. Use Google Search Console's URL Inspection tool to test specific pages against your live robots.txt.
This usually means the tool is checking a hostname where robots.txt doesn't resolve — often the webflow.io subdomain or a non-primary domain. Verify that your primary domain is properly configured in Webflow and that the file loads when you visit yourdomain.com/robots.txt directly in a browser.
Yes. Add separate Disallow: / rules for each AI bot user-agent in your robots.txt editor. Commonly referenced AI crawlers include GPTBot, ChatGPT-User, CCBot, Google-Extended, ClaudeBot, PerplexityBot, and Bytespider, but this list is not exhaustive and user-agent strings can change. Many AI crawlers may respect these directives, but behavior varies by provider and bot — check each provider's documentation for their current user-agent strings.
Yes. Use Disallow: /your-page for a single page or Disallow: /your-folder/ for a folder. The trailing slash on folders is important: it scopes the rule to everything inside that path. Without it, the rule becomes a prefix match and can also block a root-level page with the same name, plus any other path that starts with the same characters.
Custom robots.txt controls may depend on your plan or site configuration and should be confirmed against current Webflow plan availability. Webflow does generate a basic robots.txt file automatically, but the ability to add custom rules in Site settings > SEO > Indexing has historically been a paid-plan feature. If you don't see the editor in your settings, check Webflow's current plan documentation to confirm what's available on your plan.
You may need an advanced setup when you have multiple domains serving different robots.txt files, a reverse proxy that intercepts requests before they reach Webflow, or a localization setup where each domain needs different crawl rules. For most single-domain Webflow sites, the native editor is sufficient.
Webflow's native robots.txt editor handles the vast majority of crawling control needs — blocking pages, folders, and specific bots, managing your sitemap reference, and keeping your crawl budget focused on what matters. The key is getting the rules right, publishing before testing, and verifying on your canonical domain.
If your setup involves reverse proxies, multi-domain configurations, or complex localization, you may need to serve robots.txt from the proxy layer. But for many standard single-domain Webflow sites, the native settings at Site settings > SEO > Indexing are sufficient.
Need help with advanced Webflow SEO or a complex multi-domain setup? Our Webflow agency can help you get it right.
