in Onpage SEO - 26 Apr, 2018
by proseo
What are crawling errors and how to fix them

Over time, as new websites are launched, URLs are changed and pages are removed or unpublished, broken links begin to litter the search results and significantly impact the rankings of websites. As an SEO, one of the first things you learn is that broken links are one of the biggest no-no’s in our digital world. Often they are a first priority when beginning to optimize a website. A good place to review your broken links is Google Search Console’s (formerly Webmaster Tools) crawl errors report.

Every time I head into Google’s Search Console to review the crawl errors report, I’m hopeful that the number of errors is low. If a website is properly managed, you can be fortunate enough to have little to no 404s, 500s and other errors. When a website isn’t properly managed, there can be thousands. To start correcting the errors associated with your domain, you first have to open that report.


Typically, these kinds of issues have one or more of the following causes:

  1. Robots.txt – This text file, which sits in the root of your website’s folder, communicates a certain number of guidelines to search engine crawlers. For instance, if your robots.txt file contains the lines User-agent: * and Disallow: /, it’s basically telling every crawler on the web to take a hike and not index ANY of your site’s content (see the sketch after this list for a quick way to check this).
  2. .htaccess – This is an invisible file which also resides in your WWW or public_html folder. You can toggle its visibility in most modern text editors and FTP clients. A badly configured .htaccess file can do nasty things like create infinite redirect loops, which will never let your site load.
  3. Meta tags – Make sure that the page(s) that’s not getting indexed doesn’t have these meta tags in the source code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
  4. Sitemaps – Your sitemap isn’t updating for some reason, and you keep feeding the old/broken one to Webmaster Tools. After you have addressed the issues flagged in the Webmaster Tools dashboard, always generate a fresh sitemap and re-submit it.
  5. URL parameters – Within the Webmaster Tools there’s a section where you can set URL parameters which tells Google what dynamic links you do not want to get indexed. However, this comes with a warning from Google: “Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don’t recommend you use this tool unless necessary.”
  6. You don’t have enough PageRank – Matt Cutts revealed in an interview with Eric Enge that the number of pages Google crawls is roughly proportional to your PageRank.
  7. Connectivity or DNS issues – It might happen that for whatever reason Google’s spiders cannot reach your server when they try to crawl. Perhaps your host is doing maintenance on their network, or you’ve just moved your site to a new home, in which case DNS delegation can stuff up the crawlers’ access.
  8. Inherited issues – You might have registered a domain which had a life before you. I’ve had a client who got a new domain (or so they thought) and did everything by the book: wrote good content, nailed the on-page stuff, had a few nice incoming links, but Google refused to index them, even though it accepted their sitemap. After some investigating, it turned out that the domain had been used several years earlier as part of a big link-spam farm. We had to file a reconsideration request with Google.
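
As a quick sanity check for points 1 and 3 above, you can test locally whether a given URL is blocked by robots.txt and whether its source carries a noindex robots meta tag. Below is a minimal sketch using Python’s standard library; the example.com URLs are placeholders, so substitute your own.

    # Minimal sketch: check robots.txt rules and the robots meta tag for one URL.
    import urllib.request
    import urllib.robotparser

    url = "https://www.example.com/some-page/"  # placeholder URL

    # Is Googlebot allowed to crawl this URL according to robots.txt?
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()
    print("Googlebot allowed by robots.txt:", rp.can_fetch("Googlebot", url))

    # Does the page source contain a noindex robots meta tag? (crude string check)
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore").lower()
    if 'name="robots"' in html and "noindex" in html:
        print("Warning: the source appears to contain a robots meta tag with noindex.")
    else:
        print("No obvious noindex robots meta tag found.")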

Some other obvious reasons that your site or pages might not get indexed are that they consist of scraped content, are involved with shady link-farm tactics, or simply add zero value to the web in Google’s opinion (think thin affiliate landing pages, for example).

Server Errors

What is a Server error?

A server error occurs when your server’s response time is so long that the request times out.

Keep in mind that the Googlebot crawling your website can only wait a limited period of time for your site to load.

If your server’s response time is very long, Googlebot will simply give up.

There is a difference between DNS errors and server errors. In the first case, Googlebot has difficulty even finding your URL because of DNS issues.

In the second case, Googlebot can connect to the website, but server issues prevent it from loading the page.

What is the importance of Server errors?

You must take action immediately when you discover that your site is experiencing server errors, because they can have disastrous effects on your site.

These errors make it impossible for Googlebot to crawl your site; it consistently gives up after a certain period of time.

How to fix server errors

Google recommends using Fetch as Google to find out whether Googlebot is capable of crawling your site.

If the tool indicates no problems with your homepage, it will be safe for you to assume that the search engine is capable of accessing your website.

There are various types of server error issues. They include: timeout, no response, truncated headers, connect timeout, connection reset, connect failed, truncated response and connection refused, among other issues.

Find out the specific type of issue that affects your website before attempting to resolve it.
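
If you want a rough, local approximation of the failures listed above, a short script can surface timeouts and connection-level problems for a handful of URLs. This is only a sketch: the URL list and the 10-second timeout are placeholder assumptions, and it does not replicate Googlebot’s exact behaviour.

    # Minimal sketch: probe URLs and report timeouts or connection failures.
    import requests

    urls = [
        "https://www.example.com/",          # placeholder URLs -- use your own
        "https://www.example.com/contact/",
    ]

    for url in urls:
        try:
            response = requests.get(url, timeout=10)
            elapsed = response.elapsed.total_seconds()
            print(url, "->", response.status_code, f"in {elapsed:.1f}s")
        except requests.exceptions.Timeout:
            print(url, "-> timed out (server too slow to respond)")
        except requests.exceptions.ConnectionError as exc:
            print(url, "-> connection failed:", exc)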

Robots failure

This error occurs when Google fails to retrieve your site’s robots.txt file.

You’ll be surprised to know that the file is only required when you want the search engine to skip particular pages when crawling your site.

You only need the file if your website contains content that ought not to be indexed by Google or other search engines.

What is the importance of robots failure?

Urgent action need not be taken if your site is still small and very static. Additionally, you need not worry if you haven’t added any new pages or if no significant changes have been made in the recent past.

Conversely, if your website publishes new content nearly every day, the issue must be fixed within the shortest time possible.

This is because if Googlebot fails to load the aforementioned file, it won’t crawl your site. As a result, any changes made or pages added won’t be indexed.

How to fix robots failure

To fix the problem, you must first ascertain proper configuration of the robots.txt file.

Confirm the pages that you want Googlebot to skip while crawling, since it will crawl all the other pages by default.

Additionally, check whether a blanket Disallow: / rule exists; make sure it doesn’t.

If your file looks fine but you still experience problems, use a server-header checker to find out whether the file is returning a 200 or a 404 status code.
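
If you prefer not to rely on a third-party server-header checker, the same status-code check can be done locally. The minimal sketch below simply fetches /robots.txt and prints the HTTP status code; the domain is a placeholder.

    # Minimal sketch: print the HTTP status code returned for /robots.txt.
    import requests

    response = requests.get("https://www.example.com/robots.txt", timeout=10)
    print("Status code:", response.status_code)  # 200 = file served, 404 = no robots.txt found
    print(response.text[:500])                   # first few rules, for a quick visual check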


Site Errors

Basically, these are high-level errors that can affect your entire site, so you ought to pay attention to them.

If you visit the Crawl Errors dashboard, you’ll see the errors that have affected your website over the last three months.


Ideally, Google should give you a “Nice!” message here. It’s rare for the search engine to validate your site like this, so if you see it, just know that you’re doing a great job.


How frequently should you check?

Ideally, you should check for errors every 24 hours. Most of the time there will be no major problems, but you shouldn’t let that make you complacent. Just imagine the extent of the damage that could occur if you fail to check on a regular basis.

If it’s impossible or difficult to check for site errors daily, examine your website at least every 90 days to find out whether any problems have occurred during that period, and fix the errors accordingly.

Site errors can be broadly classified into three:

  1. DNS errors
  2. Server Errors
  3. Robots failure

URL Errors

There’s a big difference between site errors and URL errors. Whereas the former can affect the whole website, the latter can only affect certain pages.


Many website owners have experienced issues brought about by URL errors, and those issues can sometimes be worrying.

However, if you own a site or sites, you don’t need to freak out. Just remember that Google usually ranks errors depending on the damage they can cause.

Additionally, the errors may get resolved automatically.

If you have made substantial changes to your website in the recent past with the objective of fixing errors, or you strongly believe that most of the errors stopped occurring, why not mark all the resolved errors, and keep tabs on them on a regular basis?


The benefit of taking this action is that the errors will stop appearing on the dashboard, although the search engine will keep reporting them whenever Googlebot crawls your site and encounters them again.

If the URL errors were properly fixed, they will not appear again. In case they appear, you must fix them within the shortest possible time.

Soft 404

This error occurs when a page that no longer exists is reported to search engines as found (a 200 status code) instead of not found (a 404 status code).


Be aware that a page may look like a genuine “not found” page to visitors even when the server is not actually returning a 404.

The user-visible element of a typical 404 page is its content: the displayed message should inform users that the page no longer exists.


Website owners often help site visitors by providing a helpful 404 response or a list of links that they can visit instead.


What is their importance?

If the pages listed as 404 errors are not very important, there’s no need to fix them urgently.

However, if the pages listed as 404 errors are very important because they contain product categories, address of your business or information concerning payment for goods or services, then it will be necessary to fix the problem as soon as you can.

How to fix Soft 404 errors

If the pages do not exist any more:

  1. Make sure the server header response is either 410 (Gone) or 404 (Not Found). It shouldn’t be 200 (a quick check for this follows the list).
  2. Redirect all the old web pages to relevant pages within the site.
  3. Avoid redirecting large volumes of dead pages to the website’s homepage. Instead, redirect them to similar pages that you consider appropriate.
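
As a rough way to confirm point 1, you can request pages you know have been removed and compare the status code with the page content. The sketch below is only an approximation, with placeholder URLs and a simple keyword test for “not found” wording.

    # Minimal sketch: flag URLs that return 200 but look like "not found" pages (soft 404s).
    import requests

    removed_urls = [
        "https://www.example.com/old-product/",            # placeholder URLs for
        "https://www.example.com/discontinued-category/",  # pages you know are gone
    ]

    for url in removed_urls:
        response = requests.get(url, timeout=10)
        body = response.text.lower()
        looks_missing = "not found" in body or "no longer available" in body
        if response.status_code == 200 and looks_missing:
            print(url, "-> likely soft 404 (200 status but 'not found' content)")
        else:
            print(url, "->", response.status_code)
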
404 Errors

This type of error indicates that Googlebot attempted to crawl a non-existent page. Google reports the error when the non-existent page is linked to from existing web pages or sites.


Google has clearly stated that 404 errors have no direct effect on rankings, so in many cases you can ignore them.

What importance do they have?

If pages that contain very useful information display 404 errors, immediate action must be taken. However, if the web page disappeared but it contained no useful information, then you don’t need to worry.

Fixing 404 errors

The following are steps that you should take if your web pages display 404 errors:

  1. Make sure the web page is actually published from your CMS (content management system), not deleted or left in draft mode.
  2. Make sure the error URL is the correct version of the page and not a variation of it.
  3. Find out whether the error also appears on the www versus non-www and http versus https versions of your site.
  4. Ensure you 301-redirect the affected page to the related page you consider most appropriate.
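
After putting the redirect from step 4 in place, it’s worth confirming that the old URL really answers with a 301 and points where you expect. A minimal sketch, with a placeholder URL:

    # Minimal sketch: confirm that an old URL returns a 301 and see where it points.
    import requests

    old_url = "https://www.example.com/old-page/"  # placeholder URL

    response = requests.get(old_url, allow_redirects=False, timeout=10)
    print("Status code:", response.status_code)               # expect 301
    print("Redirects to:", response.headers.get("Location"))  # expect the related page you chose
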
Access denied errors

This message is an indication that Googlebot is unable to crawl the web page. Access is typically blocked in the following ways:

  1. You ask users to log in before they can see a URL,
  2. Your robots.txt file prevents Googlebot from crawling specific pages, or
  3. Your host blocks Googlebot, or your server asks users to authenticate via a proxy.

Role of Access Denied errors

If you’d like Googlebot to crawl the blocked pages, then you’ll need to fix the errors within the shortest time.

However, if you don’t want it to crawl the blocked pages, just ignore the error messages.

Fixing access denied errors

  1. Get rid of the login requirement on pages that you’d like Googlebot to crawl.
  2. Find out whether pages listed in your robots.txt file are really supposed to be blocked from both indexing and crawling.
  3. Use the robots.txt Tester to check for warnings and to test individual URLs.
  4. Use Fetch as Google to test your site’s appearance to Googlebot.
  5. Make use of Screaming Frog to scan your website.
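
Alongside those tools, a quick local check can show whether a page answers anonymous requests with a block or a login wall. This is only a sketch with a placeholder URL; a 401, a 403 or a redirect to a login page all suggest that Googlebot will be denied as well.

    # Minimal sketch: check whether a URL answers anonymous requests with a block or a login wall.
    import requests

    url = "https://www.example.com/members-area/"  # placeholder URL

    response = requests.get(url, allow_redirects=False, timeout=10)
    location = (response.headers.get("Location") or "").lower()
    if response.status_code in (401, 403):
        print(f"{url} -> access denied (status {response.status_code})")
    elif response.status_code in (301, 302) and "login" in location:
        print(f"{url} -> redirects to a login page: {response.headers['Location']}")
    else:
        print(f"{url} -> {response.status_code}")
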
Not Followed Errors

These errors occur when Google encounters problems with Flash, redirects and JavaScript, among other things.

You should be worried about these errors when they occur on high-priority URLs.

If the problems arise from URLs that have become inactive, or stem from non-indexed parameters, then there’s no reason for you to worry.

To fix the errors, use Fetch as Google or the Lynx text browser to examine the site. If important content fails to appear or the pages don’t load, you’ll have discovered where the problem lies.

For not followed problems concerning redirects, follow the steps outlined below:

  1. Find out whether there are redirect chains (see the sketch after this list).
  2. Update the website’s architecture.
  3. Get rid of redirected URLs in the sitemap.
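
Redirect chains are easy to spot from a URL’s redirect history. The sketch below follows one placeholder URL and prints every hop; more than one hop before the final response means there is a chain worth collapsing into a single redirect.

    # Minimal sketch: print every redirect hop for a URL to reveal redirect chains.
    import requests

    response = requests.get("https://www.example.com/old-url/", timeout=10)  # placeholder URL

    for hop in response.history:               # each intermediate redirect
        print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
    print(response.status_code, response.url)  # final destination

    if len(response.history) > 1:
        print("Redirect chain detected: collapse these hops into a single redirect.")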

There are other tools that you can also use to diagnose these errors, such as the Screaming Frog SEO Spider, Raven Tools Site Auditor and Moz Pro Site Crawl.

Server Errors and Domain Name System Errors

Google still classifies DNS and server errors under the URL errors umbrella. This is primarily because the giant search engine wants you to detect and fix the errors the same way you’d take care of the errors mentioned above.

DNS Errors

What are DNS errors?

DNS is an acronym for Domain Name System. DNS errors are the most common errors and are usually the first to affect websites.

If Googlebot experiences DNS issues, it will be unable to connect with your site because of a DNS lookup failure or timeout.

What is their importance?

DNS issues are very important because a DNS lookup is the first step that must be taken to access your site.

You ought to take drastic and firm action whenever you start experiencing DNS issues which prevent connection between your site and the giant search engine.

How can you fix DNS issues?

Google recommends using Fetch as Google to find out how Googlebot crawls your web pages.

If you only need the DNS connection status, you can use Fetch as Google without rendering, which gives quicker results.


Check with your Domain Name System (DNS) service provider.

If the search engine is incapable of properly fetching and rendering your page, then you’ll need to take more action. Ask your DNS service provider to help find out where the problem is.

Make sure your server returns a proper error code for missing pages: a 500 (Server Error) or a 404 (Not Found).

With regard to accuracy, both of these codes are better than a DNS error.

Other tools that you can use to check for DNS issues are Web-Sniffer.net and ISUP.me.
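
Alongside those tools, a quick lookup with Python’s socket module can confirm whether your domain resolves at all; the hostname below is a placeholder.

    # Minimal sketch: check whether a hostname resolves via DNS.
    import socket

    hostname = "www.example.com"  # placeholder hostname

    try:
        addresses = {info[4][0] for info in socket.getaddrinfo(hostname, 443)}
        print(hostname, "resolves to:", ", ".join(sorted(addresses)))
    except socket.gaierror as exc:
        print("DNS lookup failed for", hostname, "-", exc)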
