news ticker

Saturday, 1 February 2014

Why Your Website Is Not Cached in Google

For people who are asking “what is the reason not to show cache of a website”, here is a brief rundown of possible explanations.
Someone left the following comment a week ago. I have tried twice to send an email to the person but their mail service says the mailbox is unavailable. Normally I would leave matters at that but this question comes up from time to time so I thought I would write a brief article about it. To the original commenter, I hope you see this. Your site should be fine.
This is off-topic, but I was wondering if you could lend me some of your SEO expertise. My website is not cached in Google. The homepage and all inner pages are indexed, but none of them are cached. I was wondering if this affects whether my website is sent organic search engine traffic. Also, can you think of a few possible causes off the top of your head as to why my pages are not cached? My blog has some settings checked like “no archive.” I changed this and a few other settings yesterday, so I’ll have to wait and see if that’s what’s going on.
I also checked my robots.txt, and there’s nothing in it that says anything about not letting search engines cache my pages; it’s just a standard, basically blank file. If you could help, I would really appreciate it! By the way, I enjoyed reading this post.
The reason your site is not cached in Google is the “NoArchive” setting that you found. Turning that setting off removes the “noarchive” directive from your pages, and Google and other search engines will then cache them.
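To make this concrete, here is a minimal sketch (using a hard-coded sample page, not a live fetch) of how the “noarchive” directive appears in a page’s HTML source and how a crawler might detect it:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "robots":
                # Split "noindex, noarchive" into individual directives
                self.directives += [d.strip().lower()
                                    for d in a.get("content", "").split(",")]

# Sample page source -- with a blog's "NoArchive" setting enabled, the
# generated HTML would include a tag like this one in the <head>.
html = '<html><head><meta name="robots" content="noarchive"></head></html>'

parser = RobotsMetaParser()
parser.feed(html)
print("noarchive" in parser.directives)  # True -> the page will not be cached
```

You can paste any page’s source into the same parser to see which robots directives it carries.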
Does the “NoArchive” directive affect rankings or search visibility? Not that I have ever seen. People sometimes use this directive to prevent search engines from copying their content for redistribution. That is, even if you do something on your Web pages to prevent visitors from copying and pasting the content elsewhere, a search engine’s cached copy of the page can still be used to copy and paste it. Preventing the search engine from caching the page makes copying a little more difficult.
I should point out, to people who want to prevent copying of their content, that every time a visitor hits your page, their browser copies everything to their local hard drive. The visitor can ALWAYS grab your content and republish it elsewhere. Streaming content doesn’t work that way, but there are programs available on the Web that will record streaming feeds.
Remember the old adage: If you put something on the Internet, it’s out there to stay.
There is at least one other reason a search engine may show a page in its SERPs without caching it. In the comment above, the writer checked his “robots.txt” file. If you block a URL in your “robots.txt” file but a search engine finds links pointing to that URL, the search engine may still show that URL in its results. These listings are sometimes referred to as “URL-only” or “uncrawled” listings.
Google won’t show these URLs in its safe search mode, but if you loosen the search restrictions you may see them from time to time. The search engine displays the uncrawled URL to let you know the page exists but that it doesn’t know anything about the page’s content.
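As an illustration (the directory name here is hypothetical), a robots.txt rule like the following blocks crawling without preventing the URL itself from appearing in results:

```
User-agent: *
Disallow: /private/
```

With this rule, the search engine never fetches pages under /private/, so it has nothing to cache or describe; if other sites link to one of those pages, it can still surface as a bare, URL-only listing.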
For this reason, when you want to remove content from a search engine’s listings, search engines recommend that you allow them to crawl the pages and embed a “noindex” directive in the meta tags. Robots meta directives look like this:
<meta name="robots" content=""> (No restrictions)
<meta name="robots" content="noindex"> (Do not index this page)
<meta name="robots" content="noarchive"> (Do not archive/cache this page)
<meta name="robots" content="nofollow"> (Do not follow any links on this page)
<meta name="robots" content="noodp"> (Do not use the Open Directory Project description in SERPs)
<meta name="robots" content="noydir"> (Do not use the Yahoo! Directory description in SERPs)
You can combine them, separated by commas:
<meta name="robots" content="noindex,noarchive,nofollow,noodp,noydir">
You can give positive directives as well, although most SEOs feel these are redundant and unnecessary:
<meta name="robots" content="index,archive,follow">
Another reason why a search engine may not show cached data for a page is that it has only recently crawled that page and has not yet fully integrated the data into its index. This situation is temporary and, in my experience, usually lasts only a few days.
Finally, if you serve your page content through AJAX or other JavaScript code, the search engine may cache only the JavaScript code itself and not the content that the code displays.
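To see why, consider what a crawler that does not execute JavaScript actually receives. The following sketch (using an invented sample page) extracts the visible text from raw HTML the way such a crawler would:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text the way a non-JavaScript crawler would see it."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Skip script bodies and whitespace-only runs between tags
        if not self.in_script and data.strip():
            self.text.append(data.strip())

# A page whose content is written entirely by JavaScript after load.
html = """<html><body>
<div id="content"></div>
<script>
document.getElementById("content").innerHTML = "Hello from JavaScript!";
</script>
</body></html>"""

extractor = TextExtractor()
extractor.feed(html)
print(extractor.text)  # [] -- the crawler's copy contains no visible text
```

The raw HTML the crawler caches holds only an empty div and a script block, so the cached page shows none of the content your visitors see.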
