How to limit how often Google's crawlers visit and crawl a website
Thread
Rowan Drew, 2nd October 2024
A question was asked: for a website hosted on the TOTECS Ecommerce Platform, is there any way to limit how often Google's crawlers visit the website?
This may be needed if there is a large spike in traffic coming from Google, either because more web pages now exist on the website or because Google's crawlers have become more interested in it.
TOTECS Software Development Manager
Comments
Rowan Drew
TOTECS Software Development Manager
Google's crawlers periodically visit a website to find content that Google can use within its own products and services, most notably its search engine, and more recently possibly its AI tools as well.
The crawlers find the public web pages that exist on a website, copy each web page's content, and extract the important data from the page so that it can be ranked and indexed within Google's own database. This data can then be used to display links back to the website within Google's search engine, when users of Google Search (and other Google services) search on key terms that match the web pages the crawlers found.
If a website has a lot of web pages, then Google's crawlers need to make more requests to fetch each web page and follow each of the links that appear on each page, which in turn drives up the traffic the crawlers generate. Not only that, the crawlers need to periodically go back to the website to see if any content has changed on existing web pages, find any new web pages, and check if any web pages have been deleted.
Because of this, any website hosted on the TOTECS Ecommerce Platform may see increased traffic coming from Google's crawlers, especially if products, categories or news/blog articles are continually added or have their data changed (such as when product pricing or stock availability changes frequently).
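As a rough illustration of why more pages mean more crawler requests, the Python sketch below is a toy crawler, not Google's actual implementation: it fetches a page, extracts the links on it, and queues every in-site link it hasn't already fetched, so the number of requests grows with the number of pages and links on the site. The start URL and page limit are placeholder values.

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    # Collects the href value of every <a> tag found in a page.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    site = urlparse(start_url).netloc
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)  # every page fetched is one more request against the site
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Stay on the same site; each newly discovered page means another request.
            if urlparse(absolute).netloc == site and absolute not in seen:
                queue.append(absolute)
    return seen

pages = crawl("https://www.example.com")  # placeholder URL
print(len(pages), "pages fetched")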
If a website doesn't want Google's crawlers to visit so often, there are a few things that can be done to reduce this:
1. Set up a robots.txt file on the website and add rules for web pages that the crawler should not visit (a sample robots.txt is shown after this list). This may reduce the number of web pages the crawler finds, but it also means that those excluded web pages won't appear in Google Search.
4. Block Google's crawler altogether (see the second example after this list). If Google's search engine is not relevant to a business's website overall, then we can put in place blocks for Google's crawlers, and for any other known search engines or crawlers that are deemed to have no value to the business. This needs to be carefully considered, since each search engine may bring in any number of new or existing visitors. At the same time, search engines may be using the website's data for their own purposes, such as using or selling that data without bringing value back to the original owner of the data.
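As a minimal sketch of option 1, a robots.txt file placed at the root of the website can tell Google's crawler to skip sections of the site. The paths below are hypothetical examples and would need to match the website's actual URL structure:

User-agent: Googlebot
Disallow: /search
Disallow: /cart

And for option 4, the same file can block Google's crawler from the entire site:

User-agent: Googlebot
Disallow: /

Keep in mind that robots.txt rules are directives that well-behaved crawlers follow voluntarily; blocking at the robots.txt level does not stop a crawler that chooses to ignore the file.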
2nd October 2024