Google: Do Not Use Robots.txt To Block Indexing Of URLs With Parameters


Written by Barry Schwartz and first published on Search Engine Roundtable

Google’s John Mueller said you should absolutely not “use robots.txt to block indexing of URLs with parameters.” He said if you do that, Google “cannot canonicalize the URLs, and you lose all of the value from links to those pages.” Instead, use rel=canonical and link consistently throughout your site.
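To make the point concrete, here is a minimal sketch (an editor’s illustration, not from the article) of why a robots.txt block hides the rel=canonical: if a URL is disallowed, a compliant crawler never fetches the page, so any canonical hint in its HTML is never seen. The robots.txt rules and URLs below are hypothetical, and Python’s urllib.robotparser is used only to illustrate the matching.

from urllib import robotparser

rules = robotparser.RobotFileParser()
# Hypothetical robots.txt that blocks the faceted/search area of a site.
rules.parse([
    "User-agent: *",
    "Disallow: /search",
])

url = "https://example.com/search?color=red&sort=price"
if not rules.can_fetch("Googlebot", url):
    # A compliant crawler skips the fetch entirely, so a
    # <link rel="canonical" href="https://example.com/shoes"> in the
    # page's HTML is never read, and links to this URL pass no value.
    print("Blocked by robots.txt - any rel=canonical on this page goes unseen")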

John said this on Twitter; here is the exchange:

RomainP (@RomainP29619045): Hello @JohnMu, I see more and more websites having pages “indexed despite being blocked by robots.txt”. Any idea on why or how to stop that? Mainly URLs with parameters.

John Mueller (@JohnMu): Don’t use robots.txt to block indexing of URLs with parameters. If you do that, we can’t canonicalize the URLs, and you lose all of the value from links to those pages. Use rel-canonical, link cleanly internally, etc.


He then had a follow-up exchange about why it is so bad to block these URLs with robots.txt:

RomainP (@RomainP29619045): Thank you for your answer and time. The thing is, on e-commerce websites, filters mean a lot of parameters, so I use both canonical and robots.txt to try not to waste bots’ time on tons of pages. Wrong practice?

John Mueller (@JohnMu): We wouldn’t see the rel-canonical if it’s blocked by robots.txt, so I’d pick either one or the other. If you do use robots.txt, we’ll treat them like other robotted pages (and we won’t know what’s on the page, so we might index the URL without content).
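The alternative John points to can be sketched as a tiny helper (again an editor’s illustration, not code from the tweets): leave the faceted URLs crawlable and point them at the unfiltered listing with a rel=canonical link element. The parameter names and URLs are hypothetical.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical filter parameters that only narrow a listing page.
FILTER_PARAMS = {"color", "size", "sort", "page"}

def canonical_link(url: str) -> str:
    """Return a <link rel="canonical"> tag pointing at the URL with
    filter-only parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FILTER_PARAMS]
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path,
                            urlencode(kept), ""))
    return f'<link rel="canonical" href="{canonical}">'

print(canonical_link("https://example.com/shoes?color=red&sort=price"))
# -> <link rel="canonical" href="https://example.com/shoes">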


So be careful about this and double-check all of these things on your websites.
