Beaglecat_keyword_cloud

SEO Crawling, Indexing & Ranking

Everybody knows how important it is to get your website indexed by Google. I was quite curious as to how much this particular search engine is actually used.

It appears that in February 2014, Google held more than 67% of the US search query volume. Neither Bing nor Yahoo came close to this figure, with shares of 18.4% and 10.3% respectively.

When it comes to regular search behavior, it seems that almost 90% of users only click on the first 3 results. This basically means that if your website doesn’t come up on the first search page, you don’t exist.

In search engine optimization, three things matter the most: crawling, indexing and ranking. While other search engines rule the market in some countries (such as Baidu in China and Yahoo in Japan), in the US and most of the rest of the world, everything revolves around Google. In a way, this turns SEO into an unfair affair: it’s not a general search engine optimization (for any engine), it’s one specifically targeted at Google. Google holds all the winning cards, in the sense that it can update its algorithm as often as it likes. Moreover, Google can choose not to disclose the rules of this algorithm (or simply keep some of them secret), and nobody can blame the company for its non-disclosure policy.

Part 1. SEO Crawling.

Imagine you end up on a site, either randomly or because you’ve somehow heard of it. The website content is all mixed up, and it has neither a solid menu that you can decipher at first glance nor any categories for published articles. How would you react? I, for one, would probably close the window and never look back.

Everything related to the basic organization of a site falls under site architecture. In other words, any regular visitor should feel that they can always move on to another page, find their way back to the homepage, and click through to any content that may be of interest to them. Human behavior sometimes has a lot in common with the behavior of a Google robot.

Beaglecat PrntScr Menu

Notice the menu in the image above? This is a classic, healthy example of what almost any respectable website homepage should look like. Say one of your site visitors is lucky enough to end up on your homepage. Where do they go from there? Can they go anywhere? Or must they wander continuously until they discover the content they’re actually looking for?

site-architecture-simple

Image credit: State of Digital

The “nofollow” tag

Once you have everything you need in proper order, you should start thinking about tags. If you own a site that publishes regular content and links to A LOT of other websites, you may have a problem. Google functions this way: pages get ranked according to their health. If a robot finds that most of your content is composed of links to other sites, it may wrongly conclude that you’re copying text from elsewhere or publishing low-value content.

In this case, you should try using the nofollow tag. We recently discovered that some of our site visitors came from Quicksprout. “What do you know?” we said to ourselves, thinking this could be an error of some kind. In fact, Neil Patel had quoted one of our previous blog posts, but in doing so he used a nofollow tag, so that his own site would not risk being de-indexed for nothing. If you’re the owner of a huge ecommerce website containing a large number of links to distributors’ websites, you should think twice before throwing those links into your product descriptions or reviews. Keep in mind that there’s a crawl budget: robots will only follow a limited number of links on each visit.

Another useful example is the widely known Wikipedia, which marks ALL of its external links as nofollow. The reason is the same: it wouldn’t do Wikipedia any good if an anonymous user changed one of the links in the resources section and recommended an untrustworthy site. The nofollow attribute affects how PageRank is passed on, so be sure to endorse (that is, link to without nofollow) only sources you regard as credible.

Why would publishers need to block such votes? Doing so can help them avoid problems with search engines that believe they are selling influence or are somehow involved in schemes deemed as unacceptable SEO practices. Source: searchengineland.com

There are two ways of applying the nofollow feature. Either you add rel="nofollow" to an individual link (affecting the way robots treat only that particular link), or you add a robots META tag, <meta name="robots" content="nofollow">, to a page (affecting all the links within that page).
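To see which of the two mechanisms applies to each link on a page, a small script can help. The following is only a minimal sketch using Python's standard library; the class name `NofollowAudit`, the helper `audit` and the sample markup are made up for illustration, and a real crawler applies many more rules than this.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect links and record whether each one is marked nofollow."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="... nofollow ...">
        self.links = []             # (href, has rel="nofollow") pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = (attrs.get("content") or "").lower().replace(" ", "").split(",")
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel_values = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_values))


def audit(html):
    """Map each href to True if it is nofollow (per link or per page)."""
    parser = NofollowAudit()
    parser.feed(html)
    # A page-level directive covers every link, whatever its own rel says.
    return {href: (link_nofollow or parser.page_nofollow)
            for href, link_nofollow in parser.links}


# Hypothetical sample page: one ordinary link, one nofollow link.
SAMPLE = """
<html><head><title>Demo</title></head><body>
<a href="https://example.com/trusted">trusted</a>
<a href="https://example.com/paid" rel="nofollow">sponsored</a>
</body></html>
"""

result = audit(SAMPLE)
for href, nofollow in result.items():
    print(href, "nofollow" if nofollow else "followed")
```

Running it on your own templates is a quick way to confirm that the links you meant to mark really carry the attribute.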

Tags in general are of utmost importance when it comes to how robots interpret the content of your site. Alt attributes (often called alt tags) are a must, particularly if you’re keen on using images or any other type of content that’s practically impossible to decipher for an automatic program rather than a human being.
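Spotting images without alt text by hand gets tedious on a large site. Here is a rough sketch, again using only Python's standard library, that lists the images on a page whose alt attribute is missing or blank. The names `MissingAltFinder` and `images_missing_alt` are hypothetical, and note that an empty alt can be perfectly legitimate for purely decorative images, so treat the output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Record the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if not (attrs.get("alt") or "").strip():
            self.missing.append(attrs.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Hypothetical markup: only the first image has usable alt text.
page = '<img src="menu.png" alt="Homepage menu"><img src="chart.png"><img src="logo.png" alt=" ">'
print(images_missing_alt(page))
```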

To wrap up the crawling portion of this article, we recommend you read some data on how often Google is actually crawling your website.

Part 2. Indexing.

How do you check if your pages are indexed as often as possible? It’s quite simple. You visit google.com and type in “cache:yoursite.com/this-page”. This way, you’ll see the latest version of that page that Google has indexed. For instance, here’s a screenshot of the cached version we found of our blog.

seo crawling

Everything would be great if the real image of the site, in the present, didn’t in fact look like this:

seo tools for content marketers

There’s a 3-day gap between the two blog posts. As I am writing these lines, it’s the 20th of August, which means that Google has failed to index the blog page in the last 4 days.

Log into Google Webmaster Tools (which is probably the best SEO software you can use for free) and check on your sitemap fragmentation. You may need to upload more than one sitemap if you have a large amount of content on your website and you want it all indexed.
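If you do need several sitemaps, the sitemaps.org protocol caps each file at 50,000 URLs and lets you list the individual files in a sitemap index. The sketch below shows one possible way to split a URL list accordingly. The function name `build_sitemaps`, the file naming scheme and the example.com URLs are assumptions made for illustration, not something Webmaster Tools prescribes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol


def build_sitemaps(urls, base_url, chunk_size=MAX_URLS_PER_SITEMAP):
    """Return (index_xml, {filename: sitemap_xml}) for a list of page URLs."""
    files = {}
    for start in range(0, len(urls), chunk_size):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for page in urls[start:start + chunk_size]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page
        name = f"sitemap-{start // chunk_size + 1}.xml"  # hypothetical naming scheme
        files[name] = ET.tostring(urlset, encoding="unicode")

    # The index file simply points at each of the sitemap files.
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{name}"
    return ET.tostring(index, encoding="unicode"), files


# Tiny demo with an artificially small chunk size so the split is visible.
demo_urls = [f"https://example.com/post-{i}" for i in range(5)]
index_xml, sitemap_files = build_sitemaps(demo_urls, "https://example.com", chunk_size=2)
print(list(sitemap_files))
```

You would then upload the generated files to your site and submit the index file, instead of submitting each sitemap one by one.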

Checking on your index status is also possible from within Google Webmaster Tools. The number of indexed pages should grow continuously if you update your site frequently. If you start noticing robots are skipping some pages, you may have a problem.

Indexing your content is effective and simple with the help of the Fetch as Google feature in Google Webmaster Tools. You can manually add any page you want, or a collection of pages. If the status of the request is successful, you can submit your page to the index.

gwt-fetch-success

Image source: searchenginewatch.com

Part 3. Ranking.

Google’s algorithms change so frequently that it’s puzzling for a website owner to understand how the search engine is ranking sites. The SEO rank of a webpage is established by taking into account many factors, many of which we’ve already mentioned in one of our previous posts. Ranking is also related to authorship, so it definitely would not hurt if you got yourself a Google Plus account and connected it to your website.

Ranking is all about keywords, the general health of your website, your sitemap, landing pages, internal and external linking and sometimes even activity on social media networks.

In other words, it all begins with a valid site structure. Once Google robots check into your site, they first read your robots.txt, and then they verify all your webpages, one by one. This is why it’s so important every page has title tags, meta descriptions and heading tags.
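Both checks from the paragraph above can be approximated offline. The sketch below first asks Python's standard `urllib.robotparser` whether a URL is allowed by a robots.txt file, then looks for a title tag, a meta description and an h1 heading in a page. The sample robots.txt, the example.com URLs and the helper names `is_crawlable`, `OnPageAudit` and `missing_elements` are all invented for illustration, and Google's own parsing is certainly more elaborate than this.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all robots out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class OnPageAudit(HTMLParser):
    """Note whether a page has a title, a meta description and an h1."""

    def __init__(self):
        super().__init__()
        self.found = {"title": False, "meta description": False, "h1": False}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self.found[tag] = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # An empty description is treated as missing.
            self.found["meta description"] = bool((attrs.get("content") or "").strip())


def missing_elements(html):
    audit = OnPageAudit()
    audit.feed(html)
    return [name for name, present in audit.found.items() if not present]


print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/admin/login"))
print(missing_elements("<html><head><title>Blog</title></head><body><h1>Blog</h1></body></html>"))
```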

Once your site architecture and content are in good shape, simple changes can be made from your Google Webmaster Tools account; keywords can be added (naturally, of course), and results can be tracked using Google Analytics.

We’ll end this SEO article with a quote that testifies to the significance of the three things we’ve chosen to present to you today.

Search engine optimization can be boiled down to three core elements, or functions, in the current era of Google: crawling time (discovery), indexation time (which also includes filtering), and ranking time (algorithmic).

*Credit for the featured image: Tagxedo

 
