Canonical URL Issues: Avoid Confusing Google

Here’s a cautionary tale that shows just how lost you can get if you start dabbling with clever SEO features and URL tweaks and take a wrong turn somewhere. The issue was originally spotted by Malcolm Coles, and we thought it interesting enough to share with you here in more detail.

In short (and from his blog):

You use the rel=canonical command to tell Google that a given URL is actually a version of another URL – and that the search engine should treat the second version as if it was that main URL.

The Express site’s CMS is creating a duplicate version of every single page via the rel=canonical tag. And then a 3rd version, and then a 4th … and it’s never stopping until it gets to infinity.

Give it a read and then let’s take a closer look at the context.

What makes it all the worse is that the canonical “infinite loop” issue is something Matt Cutts predicted would happen years ago. 301 and 302 HTTP redirects carry the same risk, but we must admit it has been a while since we saw such an extreme example of this! It just goes to show that badly developed and implemented canonical elements can cause more problems than they solve.
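To make the loop problem concrete, here is a minimal Python sketch of how you might check where a chain of canonical hints ends. It is our own illustration, not Google's logic or the Express CMS: the function name `resolve_canonical`, the `canonical_of` lookup and the `max_hops` limit are all assumptions made for the example.

```python
def resolve_canonical(url, canonical_of, max_hops=10):
    """Follow rel=canonical hints from `url` and report how the chain ends.

    `canonical_of` is a hypothetical lookup: given a URL it returns the URL
    named in that page's rel=canonical tag, or None if the page has none.
    Returns (status, chain) where status is "ok", "loop" or "too_long".
    """
    chain = [url]
    current = url
    for _ in range(max_hops):
        target = canonical_of(current)
        # No tag, or a self-referencing tag: the chain ends cleanly here.
        if target is None or target == current:
            return "ok", chain
        # The tag points back at a URL we have already visited.
        if target in chain:
            return "loop", chain + [target]
        chain.append(target)
        current = target
    # Every page keeps pointing at yet another new URL.
    return "too_long", chain


# Two pages that name each other as canonical.
print(resolve_canonical("/a", {"/a": "/b", "/b": "/a"}.get))

# A made-up CMS bug where every page points at a brand new variant.
print(resolve_canonical("/story", lambda u: u + "/1", max_hops=5))
```

The first call reports a loop; the second never revisits a URL, so only the hop limit stops it, which is the kind of never-ending chain described in the quote above.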

The key to avoiding these infinite loop problems (again, another point mentioned by Google) is to make sure that the CMS is built properly in the first place. If URL masking is set up correctly in conjunction with 301s, then the use of canonical tags is unnecessary.
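As a rough sketch of the 301 approach, the Python below decides whether a requested URL is a duplicate variant that should be permanently redirected to its preferred form. The function name `redirect_target`, the list of default document names and the http:// scheme in the usage lines are our own assumptions for illustration; a real site would apply the same idea in its web server or CMS routing.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of default document names that duplicate a folder URL.
DEFAULT_DOCUMENTS = ("index.aspx", "index.html", "default.aspx")


def redirect_target(url):
    """Return the preferred URL to 301 to, or None if `url` is already preferred."""
    parts = urlsplit(url)
    path = parts.path
    # Split the path into its folder and its final segment.
    directory, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        # /overview/index.aspx and /overview/ are the same page: prefer the folder.
        path = directory + "/"
    # Host names are case-insensitive, so settle on lower case.
    host = parts.netloc.lower()
    preferred = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if preferred == url else preferred


print(redirect_target("http://www.dotsearch.co.uk/overview/index.aspx"))
print(redirect_target("http://www.dotsearch.co.uk/overview/"))
```

When the function returns a URL, the server would answer with a 301 pointing at it; when it returns None, the page is served as normal. Because the preferred URL never redirects again, there is no chain for a spider to get lost in.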

A better option

So, to give this post a bit more of a constructive element, here are a few pointers to bear in mind for anyone toying with these functions.

Firstly, it’s worth remembering that using “noindex” and “nofollow” on duplicate pages should keep them out of the index in the first place and avoid the need to use canonicalisation altogether.

Something similar can be done globally via a robots.txt file, using simple rules to automatically block any spiders attempting to crawl pages other than the original page. Strictly speaking, robots.txt stops compliant spiders from crawling a URL rather than applying a “noindex” to it.

For example

(using www.dotsearch.co.uk/overview/ as the primary page)

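Adding a rule covering www.dotsearch.co.uk/overview/* to the robots.txt file could let you block any other variations of this page (assuming the variations occur after the last “/”). Take care that the rule does not catch the primary page too: a pattern ending in a wildcard also matches /overview/ itself, so the primary URL needs to be explicitly allowed.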

In this case, www.dotsearch.co.uk/overview/index.aspx would NOT be crawled by Google, and so you could do away with a canonical tag on this particular page relating it back to www.dotsearch.co.uk/overview/
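The Python sketch below shows how such rules would be evaluated. It is a simplified matcher of our own, based on our understanding of Google's documented behaviour: “*” matches any run of characters, a trailing “$” anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The helper names and the example rule sets are assumptions, and other spiders may interpret wildcards differently.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    "allow" or "disallow". The longest matching pattern wins; on a tie,
    "allow" wins. A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if _pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# A lone wildcard rule also blocks the primary page.
wildcard_only = [("disallow", "/overview/*")]
print(is_allowed("/overview/", wildcard_only))

# Equivalent to:  Disallow: /overview/   and   Allow: /overview/$
with_allow = [("disallow", "/overview/"), ("allow", "/overview/$")]
print(is_allowed("/overview/", with_allow))
print(is_allowed("/overview/index.aspx", with_allow))
```

With the second rule set, the primary page stays crawlable while anything after the last “/” is blocked, which is the outcome the example above is aiming for.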

Google strongly recommends that canonical tags be a last resort: only use them if you have absolutely no other option. Also, be aware that Google only takes these canonical tags as a “guide” and does not guarantee that it will honour them as you assume.

Again, this comes down to building the site properly in the first place and making sure you can always use the standard HTTP 301 redirect instead of canonical tags (which are really seen as a “repair” tag).

In short, if it ain’t broke, don’t use canonical tags.