Canonical Tag: Google Answers Your Questions

06/20/2024 11:50 by andrewsho

I have been spending a bit of time implementing the rel=canonical tag on a number of sites, and I kept returning to the Google Webmaster Central post on the subject to figure it all out. The problem is that the comments section contains so many questions that it takes forever to find the answers and clarifications from the Googlers. So I did some cutting and pasting and collected only the answers right here. This will save us all a bit of time:

Here is the Googlers' original explanation, followed by their first FAQs:
Now, you can simply add this <link> tag to specify your preferred version:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

inside the <head> section of the duplicate content URLs:

http://www.example.com/product.php?item=swedish-fish&category=gummy-candy
http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678

and Google will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well.
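To make the example concrete, here is a minimal Python sketch of how a site might generate that tag for each duplicate URL. It assumes, hypothetically, that only the item parameter identifies the product and that every other parameter (category, trackingid, sessionid) merely creates a duplicate; the function names are illustrative, not part of any Google tool.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical assumption: only "item" identifies the product. Every other
# query parameter (category, trackingid, sessionid, ...) just creates a duplicate.
SIGNIFICANT_PARAMS = {"item"}


def canonical_url(url: str) -> str:
    """Return the preferred URL by dropping non-significant query parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in SIGNIFICANT_PARAMS]
    # Rebuild the URL without the dropped parameters and without any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place inside <head> of the duplicate page."""
    # escape() turns any "&" between kept parameters into "&amp;" for valid HTML.
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'


print(canonical_link_tag(
    "http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678"
))
# prints: <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
```

Both duplicate URLs from the post map to the same tag, which is the whole point: every variant declares one preferred version.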

This standard can be adopted by any search engine when crawling and indexing your site.

Of course you may have more questions. Joachim Kupke, an engineer from our Indexing Team, is here to provide us with the answers:

Is rel="canonical" a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> element in your document, relative paths will resolve according to the base URL.
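The resolution rule can be sketched with Python's urllib.parse.urljoin, which follows the standard relative-reference rules. This is only an illustration, not Google's code: the helper name is hypothetical, and it assumes the <base> href (if any) has already been read from the page.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_canonical(page_url: str, href: str, base_href: Optional[str] = None) -> str:
    """Resolve a possibly relative canonical href.

    If the document has a <base href="...">, relative URLs resolve against it
    (the base itself is resolved against the page URL first); otherwise they
    resolve against the page URL.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)
```

For a hypothetical page at http://www.example.com/shop/product.php, the href "product.php?item=swedish-fish" resolves to http://www.example.com/shop/product.php?item=swedish-fish, but with <base href="http://www.example.com/"> it resolves to http://www.example.com/product.php?item=swedish-fish.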

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existing URLs as canonicals.

What if the rel="canonical" hasn’t yet been indexed?
As with all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel="canonical" hint.

Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
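Here is a minimal sketch of what following a canonical chain means, assuming a hypothetical dictionary canonical_of that maps each URL to the canonical it declares. The loop guard and hop limit are our own choices for illustration; this is not Google's algorithm.

```python
from typing import Dict


def follow_canonical_chain(start: str, canonical_of: Dict[str, str], max_hops: int = 10) -> str:
    """Follow rel="canonical" hints (A -> B -> C) until a page has no hint,
    points to itself, a loop is detected, or max_hops is reached."""
    seen = {start}
    current = start
    for _ in range(max_hops):
        target = canonical_of.get(current)
        if target is None or target == current:
            return current  # no hint, or a self-referential canonical
        if target in seen:
            return current  # contradictory loop: stop where we are (arbitrary choice)
        seen.add(target)
        current = target
    return current
```

With hints A -> B and B -> C (and C pointing to itself), starting from A ends at C. Any page whose direct hint differs from the end of its chain is a candidate for updating so it points straight to the final canonical, as recommended above.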

Can this link tag be used to suggest a canonical URL on a completely different domain?
No. To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can suggest www.example.com vs. example.com vs. help.example.com, but not example.com vs. example-widgets.com.
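As a rough illustration of that rule, the sketch below flags hints that leave the domain. It is deliberately naive and the names are hypothetical: it treats the last two host labels as "the domain", which is wrong for suffixes such as .co.uk (a real check would need the Public Suffix List).

```python
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Naive assumption: the site is the last two DNS labels
    (www.example.com -> example.com). Wrong for suffixes like .co.uk."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def canonical_stays_on_site(page_url: str, canonical_url: str) -> bool:
    """True if the hint only moves across subdomains of the same domain."""
    page_host = urlsplit(page_url).hostname or ""
    canon_host = urlsplit(canonical_url).hostname or ""
    return naive_site(page_host) == naive_site(canon_host)
```

Under this check, help.example.com pointing to www.example.com passes, while example.com pointing to example-widgets.com is flagged, matching the examples in the answer.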

Sounds great—can I see a live example?
Yes, wikia.com helped us as a trusted tester. For example, you’ll notice that the source code on the URL http://starwars.wikia.com/wiki/Nelvana_Limited specifies its rel="canonical" as: http://starwars.wikia.com/wiki/Nelvana.

The two URLs are nearly identical to each other, except that Nelvana_Limited, the first URL, contains a brief message near its heading. It’s a good example of using this feature. With rel="canonical", properties of the two URLs are consolidated in our index and search results display wikia.com’s intended version.

And here are the questions answered in the comments section:

Maile Ohye said…

Hi guys, thanks for reading our post and making time to ask for clarification. 🙂 We’ve tackled some of your questions below…

@Everyone: Just to clarify, rel="canonical" helps Google select one URL and its contents from duplicates; it doesn’t accumulate the content from duplicates into one URL. If you set the rel="canonical" in URL A and URL B to point to URL C, the contents of URL C won’t become “content from A + the content from B + the content from C.”

With rel="canonical", we’ll likely index the content of C by itself, and then transfer to it the quality signals and linking properties from URL A and URL B.

Hoosier said…
We are making a change on our site that will combine content from several different pages onto one single page. We are planning on 301’ing the old pages to the new, combined page.
I’m wondering if we can just use the canonical tag on the pages that will be combined; the combined content will be reproduced exactly as it was on the prior pages.

@Hoosier: 301s from your old URLs to your new combined content page sound like the preferred method for your situation. 301 redirects are still of primary importance. rel="canonical" should only be used where the content is identical (or very similar) but it’s not possible to stop the content from being served on multiple URLs.
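For a merge like this, the server-side 301s can be as simple as a lookup table. The sketch below is framework-neutral and hypothetical: the old paths, the combined page URL, and the function name are made up for illustration, and your webserver or framework would turn the returned pair into an actual HTTP response.

```python
from typing import Optional, Tuple

# Hypothetical old URLs whose content now lives on one combined page.
MERGED_INTO = {
    "/guide/part-1": "http://www.example.com/guide",
    "/guide/part-2": "http://www.example.com/guide",
    "/guide/part-3": "http://www.example.com/guide",
}


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a retired URL, or None to serve the page normally."""
    target = MERGED_INTO.get(path)
    return (301, target) if target else None
```

Because the old URLs stop serving content entirely, there is no duplicate left to annotate, which is why the answer prefers 301s over rel="canonical" here.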

Yannick said…
What if there is no link to the canonical page in the site itself? In your example, say the swedish-fish is always presented in a category context, i.e. there is no link in the site directly to http://www.example.com/product.php?item=swedish-fish, but this URL does render properly. Is it still OK to use it as a canonical, even if no links point to it?

@Yannick: Yes, rel="canonical" can still take effect, even if there are no other links to the preferred version of the URL.

Cahit said…
The same real content but a slightly different design, like “results in LIST type”, “results in CATALOG type”;
“view large icons”
“view list”
The content is in fact the same, but we show it differently. Should “canonical” be used in this case?

@Cahit: Yes, if the items in your content page are the same but with different views, such as sort order or listing type, then rel="canonical" can be used.

Olagato said…
What about multilingual sites…
http://mydomain.com/en/
http://mydomain.com/es/
http://mydomain.com/fr/
…the same structure with different language content. This is done with an automatic redirection from the “root domain” http://mydomain.com/ to a “language domain”, for example http://mydomain.com/en/ (based on the browser language).
…Is a canonical tag pointing to http://mydomain.com/ needed?

@Olagato: Each language should have a separate URL because the content is unique. We’d advise against equating different languages using either 301s or link rel="canonical".

AjiNIMC said…
Will it take care of https issues as well? I hope it will; I just wanted to confirm. I’d also like to know when we can expect it to work.

@AjiNIMC: Yes, you can use rel="canonical" for https to http or vice versa. rel="canonical" is already live in Google’s indexing process and has shown results for our trusted testers. After your content containing rel="canonical" information is crawled, the process can take effect for your site.

vizualbod.com said…
What about paging? E.g., paged listings of domain objects (products, job posts, links, search results), categorized objects, or tagged objects.
Can I use this to specify the canonical URL as the first page of paged listings?
Will Google assign linking-properties to objects on subsequent pages as if the domain object listing was on the first page of listings?

@vizualbod.com, George: I would not specify the canonical URL as the first page of listings for paginated content. Why not? As mentioned earlier, rel="canonical" does not accumulate text contents from various pages, so it should only be used in situations where the content is identical or nearly identical. In a paginated series, each page contains entirely different content/items, so they shouldn’t be grouped as one URL. Thanks for asking, though!
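In other words, a paginated listing can still use rel="canonical" to strip parameters that don’t change the items shown, as long as each page keeps its own page number instead of pointing to page 1. A minimal sketch, assuming hypothetical parameter names category and page for the parameters that select content:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: "category" and "page" select which items appear, so they stay;
# anything else (session ids, tracking codes) is dropped.
CONTENT_PARAMS = ("category", "page")


def paginated_canonical(url: str) -> str:
    """Canonical for one page of a listing: same page, minus noise parameters."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    kept = [(k, params[k]) for k in CONTENT_PARAMS if k in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Page 3 of a listing therefore stays page 3 in its canonical; only the session and tracking noise is removed.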

Shaper said…
I’m puzzled as to why cross-domain canonicalisation suggestions aren’t taken into account.
If a third party copies my content, it would be bad for them to be able to claim theirs was the original, but since canonicalisation hints point from copy->original, this wouldn’t be possible.
However, if a third party decides to mirror, cache or otherwise copy my content, it would be nice if they could semantically indicate the origin of that content on my site.
Allowing cross-domain canonicalisation hints would seem to be a big win for the semantic web (and hence a big win for Google) without any drawbacks.
Can anyone explain why it’s not allowed?

@Shaper: In this first announcement of rel="canonical", we wanted to keep things simple and help webmasters with duplicate content URLs. The feature you’re describing deals more with copyright or authorship, which wasn’t our primary intention with this release. Given feedback over time, we may reconsider this decision, but I wouldn’t count on it anytime in the near future.

Silverstall said…
The example given is for dynamic XHTML. Presumably the same tag will work for ordinary HTML, so that it can be written without the trailing slash.

@Silverstall: Properly resolving the trailing slash in your URLs is often done more scalably through your webserver, not at the page level with rel="canonical". Otherwise, if “without the trailing slash” means writing the tag in ordinary HTML without the closing slash, as in <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish">, we can still parse that just fine.
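To illustrate why the closing slash doesn’t matter to a parser, here is a minimal sketch using Python’s standard html.parser module. This is our choice for illustration, not how Google parses pages, and the class and function names are hypothetical. It finds the canonical href whether the tag is written XHTML-style with " />" or plain-HTML-style with ">".

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag() calls this method too, so
        # "<link ... />" (XHTML style) and "<link ...>" (HTML style) both land here.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()  # rel may hold several tokens
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

Both spellings of the tag from the answer above yield the same href.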

Rob said…
Would it be appropriate to use this tag to set a canonical version where the page text on one version is generated with Javascript, and I’d like to point to the pure HTML canonical version?
From a user’s point of view, the pages are identical duplicates, but not from a search engine spider’s (non-JS) position.
Does this satisfy the “We allow slight differences” part?

@Rob: This can be a bit of a slippery slope, but yes, if you’re pointing to the HTML version of your site versus, say, an AJAX version, that’s okay.

WhatsTheBigIdea said…
Thanks for the info! Is there an easy way to see which duplicates Google sees on my websites?

@WhatsTheBigIdea: There are ways to understand your potential duplicate issues more easily — not sure if I’d say there was an “easy way” 🙂

Background on duplicate content: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66359

Feature in Webmaster Tools that informs you of URLs in your site with duplicate titles or meta descriptions:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80407

MickeyC said…
You should have used the Content-Location header instead, as per:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
“14.14 Content-Location”

@MickeyC: Yes, from a theoretical standpoint that makes sense, and we certainly considered it. A few points, however, led us to choose <link rel="canonical" ... />:

1. Our data showed that the “Content-Location” header is configured improperly on many web sites. Sometimes webmasters provide long, ugly URLs that aren’t even duplicates — it’s probably unintentional. They’re likely unaware that their webserver is even sending the Content-Location header.

It would’ve been extremely time-consuming to contact site owners to clean up the Content-Location issues throughout the web. We realized that if we started with a clean slate, we could provide the functionality more quickly. With Microsoft and Yahoo! on board to support this format, webmasters only need to learn one syntax.

2. Often webmasters have difficulty configuring their web server headers, but can more easily change their HTML. rel="canonical" seemed like a friendly attribute.

Damon said…
What about the case where I have explicit geographic mirrors (closer to the users) with their own subdomains such as mirror-CC.main.dom.ain? Should they point to the ‘master’ copy as canonical, since I’d like you to point them to the closest copy ideally…

@Damon:
If you set the rel="canonical" for your mirrors to be your master URL, then only the master URL is likely to be returned in search results, regardless of the user’s location. Redirecting the user to their closest mirror site would need to occur on your webserver after the clickthrough.

Wade Leftwich said…
And I assume it’s OK for the canonical page to have a <link rel="canonical"> pointing to itself?

@Wade: Yes, it’s absolutely okay to have a self-referential rel="canonical". It won’t harm the system, and additionally, by including a self-reference you better ensure that your mirrors have a rel="canonical" pointing to you.
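Here is a hypothetical check (the names are our own) that a page’s canonical hint points back to the page itself. It resolves relative hrefs, treats scheme and host case-insensitively, and ignores fragments; these normalization rules are assumptions for illustration, not a statement of how Google compares URLs.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def _normalize(url: str) -> str:
    """Lowercase scheme and host, default an empty path to "/", drop the fragment."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))


def is_self_referential(page_url: str, canonical_href: str) -> bool:
    """True if the (possibly relative) canonical href resolves to the page itself."""
    return _normalize(urljoin(page_url, canonical_href)) == _normalize(page_url)
```

On the canonical page http://www.example.com/product.php?item=swedish-fish, both the relative href "product.php?item=swedish-fish" and the absolute URL count as self-references, while the same tag on the sessionid variant does not, so it is correctly a hint toward another URL.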
