Are Spiders and Robots Eating up All Your Bandwidth?
Are You Experiencing Bandwidth Problems? Search Engine Spiders May Be the Culprit!
During a client launch, we noticed the client’s hosting account showing a dramatic spike in server load (which hosting companies usually report as bandwidth or CPU seconds). This sudden load was causing site slowdowns and errors, and at one point the hosting company even suspended the website.
Now, a spike in server traffic may not be a bad thing … if it is accompanied by a spike in page views (visits to the website). That would mean that your website went viral, or you launched a new product, and lots of people were paying attention. But in this case, there was no corresponding spike in page views (other than the traffic related to the launch).
After a little detective work, we found the culprit and went about solving the problem.
We will share what we did … but here is a little backstory first.
Did you know that major search engines like Google and Bing have automated processes – often called “spiders”, “crawlers”, or “bots” – to help them take “inventory” of the internet?
These spiders “crawl” websites constantly, sometimes thousands of times a day, to ensure that the search engine always has up-to-date content to provide in its search results. That’s incredibly valuable if you’re CNN or Fox News and need to have breaking news updates appearing in search results pretty much instantly. But if you’re a typical small or medium business owner who is only updating your site once a day or so, it is overkill.
There are a couple of ways to get around this issue. Depending on your goals and the nature of your website, you may need to use one or more of these options to help solve your bandwidth problems.
Manually adjust Google’s crawl rate
Google checks all the content on your website with the adorably named “Googlebot”. Google has a lot of algorithms designed to determine the optimal rate to crawl your site without overwhelming your server, but they don’t always get it perfectly right. If you’re finding that you’re getting tons of server traffic from Google crawling your site, it’s time to set a limit. Google’s Webmaster Tools (now called Google Search Console) allow you to set a maximum crawl rate for your website. Note: these limits are only good for 90 days, so if you notice the problem cropping up again in a couple of months, you’ll have to repeat this step.
Configure your robots.txt file
A robots.txt file is an instruction file that spiders and bots can check before crawling the rest of your website. You can configure your robots.txt to disallow all bots from checking anything on your site at all; to only exclude them from parts of the server; to allow all bots complete access; and to include or disallow specific robots.
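To make that concrete, here is a small sketch of what a robots.txt might contain and how a well-behaved bot would read it. The file contents, the bot name “BadBot”, and the /private/ folder are made-up examples, not rules from our client’s site. The sketch uses Python’s built-in robots.txt parser to check which visits the rules would allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" and "/private/" are illustrative names only.
# The first group shuts one specific bot out of the whole site; the second
# group lets every other bot in, except for one folder.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let this bot fetch this path."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot, path in [
        ("BadBot", "/index.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot", "/private/report.html"),
    ]:
        print(bot, path, is_allowed(bot, path))
```

Keep in mind that robots.txt is a request, not a lock: reputable crawlers follow it, but badly behaved bots can simply ignore it.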
Check your visitor IP addresses
Finally, you may be getting lots and lots of bot visits from scammers and spammers overseas. Even if you have a well-configured plug-in suite that takes care of spam comments, the sheer number of visits can seriously impact your server load. If you see a whole lot of IP addresses from places that don’t make sense – for example, non-English speaking countries sending a ton of traffic to your English-only website – you may want to try blocking specific IPs or IP ranges.
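If you are comfortable with a little scripting, the check above can be roughed out in a few lines. This is only a sketch under some assumptions: the log lines are invented samples in a common access-log layout (IP address first), the addresses come from ranges reserved for documentation, and the blocked range is a placeholder you would replace with whatever your own logs turn up.

```python
from collections import Counter
import ipaddress

# Hypothetical access-log lines; real logs come from your host's control panel.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /blog HTTP/1.1" 200 734',
    '198.51.100.20 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2023:13:56:02 +0000] "GET /contact HTTP/1.1" 200 310',
    '192.0.2.5 - - [10/Oct/2023:13:57:10 +0000] "GET / HTTP/1.1" 200 512',
]

# Placeholder range to block; an assumption for illustration only.
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def top_visitors(lines, limit=3):
    """Count requests per IP (the first field on each line), busiest first."""
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return counts.most_common(limit)


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip, hits in top_visitors(SAMPLE_LOG):
        print(ip, hits, "blocked" if is_blocked(ip) else "allowed")
```

The script only tells you who is visiting most and whether an address falls in a range you have chosen. The actual blocking still happens in your hosting control panel, firewall, or server configuration.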
Feeling a bit overwhelmed by all the bot talk (c’mon, that was funny)? We can help – that is our super-power! We cut through all the jargon to explain the who, what, where, when, why, and how in plain, easy-to-understand language.
If your web-tech is running amok or if you want to add something new and exciting to your offerings – you can reach out to us via our contact form and we will get in touch within 24 hours to see how we can help.