How to limit how often Google's crawlers visit and crawl a website
Thread
Rowan Drew, 2nd October 2024
A question was asked: for a website hosted on the TOTECS Ecommerce Platform, is there any way to limit how often Google's crawlers visit the website?
This may be needed if there is a large spike in traffic coming from Google, either because more web pages now exist on the website or because Google's crawlers have become more interested in it.
TOTECS Software Development Manager
Comments
Rowan Drew
TOTECS Software Development Manager
Google's crawlers periodically visit a website to find content that Google can use within its own products and services, most notably its search engine, and more recently possibly its AI tools as well.
The crawlers find the public web pages that exist on a website, copy each web page's content, and extract the important data from the page so that it can be ranked and indexed within Google's own database. This data can then be used to display links back to the website within Google's search engine, when users of Google Search (and other Google services) search on key terms that match the web pages the crawlers found.
If a website has a lot of web pages, then Google's crawlers need to make more requests to fetch each web page and follow each of the links that appear on each page, which in turn drives up the traffic the crawlers generate. Not only that, the crawlers need to periodically go back to the website to see if any content has changed on existing web pages, find any new web pages, and check if any web pages have been deleted.
Because of this, any website hosted on the TOTECS Ecommerce Platform may see increased traffic coming from Google's crawlers, especially if products, categories or news/blog articles are continually added or have their data changed (such as when product pricing or stock availability changes frequently).
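As a rough illustration of why more pages mean more crawler requests, the Python sketch below is a toy crawler, not Google's actual implementation: it fetches a page, extracts the links on it, and queues every in-site link it hasn't already fetched, so the number of requests grows with the number of pages and links on the site. The start URL and page limit are placeholder values.

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    # Collects the href value of every <a> tag found in a page.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    site = urlparse(start_url).netloc
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)  # every page fetched is one more request against the site
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same site; each newly discovered page means another request.
            if urlparse(absolute).netloc == site and absolute not in seen:
                queue.append(absolute)
    return seen

pages = crawl("https://www.example.com")  # placeholder URL
print(len(pages), "pages fetched")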
If a website doesn't want Google's crawlers to visit so often, there are a few things that can be done to reduce this:
1. Set up a robots.txt file on the website and add rules for web pages that the crawler should not visit (a sample robots.txt is shown after this list). This may reduce the number of web pages the crawler finds, but it also means that those excluded web pages won't appear in Google Search.
4. Block Google's crawler altogether (see the second example after this list). If Google's search engine is not relevant to a business's website overall, then we can put in place blocks for Google's crawlers, and for any other known search engines or crawlers that are deemed to have no value to the business. This needs to be carefully considered, since each search engine may bring in any number of new or existing visitors. At the same time, search engines may be using the website's data for their own purposes, such as using or selling that data without bringing value back to the original owner of the data.
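As a minimal sketch of option 1, a robots.txt file placed at the root of the website can tell Google's crawler to skip sections of the site. The paths below are hypothetical examples and would need to match the website's actual URL structure:

User-agent: Googlebot
Disallow: /search
Disallow: /cart

And for option 4, the same file can block Google's crawler from the entire site:

User-agent: Googlebot
Disallow: /

Keep in mind that robots.txt rules are directives that well-behaved crawlers follow voluntarily; blocking at the robots.txt level does not stop a crawler that chooses to ignore the file.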
2nd October 2024