Many sites use bot blockers like Distil Networks to stop scrapers from stealing their data. The challenge for SEOs is that these bot blockers are sometimes misconfigured and can prevent good bots like Googlebot and Bingbot from reaching the content, which can cause serious SEO issues. Distil is pretty adamant that their service is SEO-safe, but I am not so certain about others. We recently saw a case where the HTTP version of a big site’s homepage was sending Googlebot to a 404 URL while sending users to the HTTPS homepage, all because the bot blocker (not Distil) was not tuned correctly. It was kind of funny searching for their brand and sometimes seeing “Oops” as the first result. It wasn’t that funny for the client.
So when we see odd crawling behavior or other odd signals, we immediately start wondering if it could be the bot blocker.
This week we saw URLs like /distil_r_blocked.html?… start to show up in a client’s GSC crawl errors report under the Other tab. I immediately went into “holy bot blocker, Batman!” mode. You definitely don’t want Googlebot hitting bot blocker URLs.
But after doing a bit of forensic research, I realized that this was actually a case of the bot blocker working properly. The error details in GSC showed that the Distil URLs were being linked from another domain:
It looks as though these guys tried to scrape the site, hit the Distil block URLs, and then published them as links. So when Googlebot crawled those pages (why Googlebot would want to crawl these crap sites is another issue), it hit the Distil URLs.
I am still highly suspicious of bot blockers when it comes to SEO, because so much of what they do happens under the hood, but the next time you see Distil URLs show up in GSC, don’t panic. It may just be those crap links your site is so good at attracting.
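One quick sanity check when you suspect a bot blocker is misfiring is to confirm whether a visitor claiming to be Googlebot really is one. Google’s documented verification method is a reverse DNS lookup on the IP, followed by a forward lookup on the resulting hostname. Here is a minimal Python sketch of that check; the resolver functions are injectable only so the logic can be exercised without network access, and the sample IPs below are illustrative:

```python
import socket

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Check whether `ip` is a genuine Googlebot address using Google's
    documented two-step DNS verification:
      1. reverse-DNS the IP and require a googlebot.com / google.com host
      2. forward-resolve that hostname and require it to map back to `ip`
    """
    try:
        host, _aliases, _addrs = reverse(ip)
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _host, _aliases, addrs = forward(host)
    except OSError:
        return False
    return ip in addrs
```

If an IP hammering your bot-blocker pages fails this check, it is a scraper spoofing Googlebot’s user-agent, and blocking it is exactly what you want.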
3 Response Comments
I’m not sure if this is a dumb question, but is there a reason you would want to use a bot blocker like Distil instead of just blocking bots via .htaccess? I suppose what they do is more advanced than simply blocking scrapers, but I don’t really know.
I think it’s the difference between a one-time solution and a SaaS service that’s constantly updating and evolving.
I’d agree inasmuch as blocking crawlers yourself is a bad idea. It will generally cause issues with the important ones (Google, Bing, Yandex, Baidu), and the bad bots usually don’t respect directives anyway. It’s a bad idea all around.
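To illustrate the point in the replies above: .htaccess-style blocking usually amounts to matching substrings against the User-Agent header, and a scraper can send any User-Agent it likes. A minimal Python sketch of that logic (the blocklist entries are hypothetical examples, not a recommended list):

```python
# Hypothetical blocklist: substrings matched against the User-Agent
# header, roughly what a RewriteCond-based .htaccess rule does.
BLOCKED_AGENTS = ("scrapy", "python-requests", "curl")

def allow_request(user_agent: str) -> bool:
    """Return True if the request's User-Agent passes the blocklist."""
    ua = user_agent.lower()
    return not any(bad in ua for bad in BLOCKED_AGENTS)

print(allow_request("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(allow_request("python-requests/2.31"))                     # False
# A scraper that simply lies about its User-Agent sails through:
print(allow_request("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # True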