Canonical tags guide: The complete one

One of the most complete online guides you can find about URL canonicalization and duplicate content management in SEO. What it is, what it's for, best practices, common mistakes and frequently asked questions. Add it to your favorites as a reference guide.


Antonio Salinero

Fri October 14, 2022

Table of contents
  1. What is the canonical tag and what is it for?
  2. How does Google handle duplicate content?
  3. What is a canonical URL?
  4. How does Google choose the canonical URL?
  5. How to specify that a URL is canonical?
  6. Best practices
    1. The canonical URL must be crawlable and indexable
    2. Add the canonical tag to all duplicate versions (URLs) of the page
    3. Use absolute rather than relative paths to indicate the canonical URL
    4. Use lowercase URLs (or uppercase, but be consistent)
    5. Use self-referencing canonical tags
    6. Check dynamic canonical tags
    7. Avoid inconsistent canonicalization signals
  7. Common mistakes
    1. Inserting more than one canonical tag
    2. Inserting the canonical tag in the <body>
    3. Pointing canonical to the first page in paginated series
    4. Adding canonical tag to a page blocked by robots.txt
    5. Pointing the canonical tag to a non-crawlable or non-indexable URL
    6. Including non-canonical URLs in the sitemap
  8. Frequently asked questions
    1. Do canonical tags pass PageRank?
    2. Why does Google ignore the canonical tag?
    3. How can I check which URL Google has selected as canonical?
    4. What is a self-referencing canonical tag?
    5. Is a 301 redirect or a canonical tag better for handling duplicate content?
    6. Can I combine a canonical tag and a noindex directive?
    7. Are versions of the same page in different languages considered duplicate content?
    8. How to set the canonical URL in files that are not HTML pages (such as PDFs)?
    9. When should I canonicalize URLs?
    10. What types of pages should I canonicalize to a different URL?
  9. Sources

What is the canonical tag and what is it for?

The canonical tag is an HTML tag that tells search engines which URL is the most representative of a set of duplicate (or very similar) pages. That URL is the one that should appear in search results, and the authority of the other duplicate URLs is consolidated into it.

How does Google handle duplicate content?

Googlebot, Google's web crawler, periodically crawls websites looking for URLs to index and include in its search results. If a site has duplicate pages, Google cannot index all of them, as that would fill its index with duplicate results, negatively affecting the user's search experience and wasting resources unnecessarily. In fact, duplicate content also hurts the website's crawl budget, because Googlebot may waste time crawling duplicate versions of the same page, instead of discovering other important content.

To avoid this, Google detects duplicate content quite effectively and, if it finds duplicate versions of a page, it will choose one of them as canonical, usually the most complete one. However, Google may sometimes consider the wrong URL as canonical, in which case an unwanted URL may be indexed. To help Google select the right URL, it is possible to explicitly indicate the canonical URL of a page.

What is a canonical URL?

If a page is available through different URLs (duplicate content), the canonical URL is the most representative, or main, URL. It is the one that holds the authority, is crawled most frequently, is included in the search results and is taken as the reference for evaluating content and quality. The remaining URLs are considered duplicates and are therefore crawled less frequently.

In any case, this does not mean that URLs considered as duplicates cannot appear in search results, because, although Google usually includes canonical URLs in its results, if a duplicate is better suited to the user's search query, it could appear in the SERP instead.

How does Google choose the canonical URL?

Google uses a number of signals or factors to select the canonical URL from among the different duplicate versions of a page:

  • If the URL includes the rel="canonical" tag.
  • If the canonical URL is very similar or the same as the canonicalized page.
  • If the URL is served via HTTPS versus HTTP.
  • If the URL is included in the sitemap.
  • If the internal linking points to that URL.
  • If duplicate URLs redirect to that URL.
  • If the URL is more complete or of better quality than the others.
  • If the URL is "nicer-looking" (SEO-friendly, without parameters, short, ...).

Google will select as canonical the URL on which the most signals agree, so it is essential to ensure that all these preferences or signals are used consistently throughout the website.

That said, keep in mind that, although it is possible to explicitly indicate which URL you want to be canonical through the rel="canonical" tag, this tag is not a directive, but just one of the many signals that Google uses and, therefore, if the other signals indicate something else, Google could select a different URL.

How to specify that a URL is canonical?

To set the canonical URL of a page, just add the following tag inside the <head> of the page. The value of the href attribute points to the URL that we want to be the canonical URL.

Syntax:

<head>
    […]
    <link rel="canonical" href="https://perseo.pro/" />
    […]
</head>
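
To verify that a page actually declares the tag shown above, the HTML can be parsed and the href read back. The following is a minimal sketch using Python's standard html.parser module; the class and function names (CanonicalFinder, find_canonical) are hypothetical, and the sample page is only an illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html):
    """Return the first canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


page = """
<html>
<head>
    <title>Home</title>
    <link rel="canonical" href="https://perseo.pro/" />
</head>
<body></body>
</html>
"""
print(find_canonical(page))  # https://perseo.pro/
```

This only reads the tag from the raw HTML; it says nothing about which URL a search engine will finally select.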

Best practices

The canonical URL must be crawlable and indexable

The canonical URL must be:

  • Crawlable, i.e. not blocked via robots.txt.
  • Indexable, i.e. return a 200 OK HTTP response code and not contain a robots "noindex" or "none" directive.
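
The two conditions above can be checked roughly with Python's standard library. This is a sketch under assumptions: the function names are hypothetical, the robots.txt content and URLs are made up, and urllib.robotparser applies its own matching rules, which may differ in edge cases from those of a given search engine.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_indexable(status_code, robots_meta=""):
    """True for a 200 response without a "noindex" or "none" directive."""
    directives = {d.strip().lower() for d in robots_meta.split(",")}
    return status_code == 200 and not directives & {"noindex", "none"}


# Hypothetical robots.txt for example.com.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

print(is_crawlable(robots_txt, "https://example.com/page"))           # True
print(is_crawlable(robots_txt, "https://example.com/private/page"))   # False
print(is_indexable(200, "index, follow"))                             # True
print(is_indexable(200, "noindex, follow"))                           # False
print(is_indexable(404))                                              # False
```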

Add the canonical tag to all duplicate versions (URLs) of the page

They should all point to the same preferred URL.

Use absolute rather than relative paths to indicate the canonical URL

Canonical tags based on relative URLs may mistakenly refer to another domain that does not correspond to the canonical version.

Recommended      https://perseo.pro/features
Not recommended  /features
Wrong            perseo.pro/features
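
To see why the last two forms are risky, it helps to resolve them the way a URL parser would. The sketch below uses urljoin from Python's standard library; the staging.perseo.pro host and the /blog/post path are hypothetical and only serve to show the effect.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url, href):
    """Resolve a canonical href against the URL of the page that declares it."""
    return urljoin(page_url, href)


# Absolute URL: always resolves to the intended address.
print(resolve_canonical("https://staging.perseo.pro/features",
                        "https://perseo.pro/features"))
# https://perseo.pro/features

# Relative path: inherits whatever host served the page (here, a staging host).
print(resolve_canonical("https://staging.perseo.pro/features", "/features"))
# https://staging.perseo.pro/features

# Missing scheme: treated as a relative path, not as a domain.
print(resolve_canonical("https://perseo.pro/blog/post", "perseo.pro/features"))
# https://perseo.pro/blog/perseo.pro/features
```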


Avoid inconsistent signals. In this way, you will prevent search engines from choosing a different canonical than the desired one.

Recommended      Page A → Page A
Recommended      Page A → Page B
Not recommended  Page A → Page B → Page A
Not recommended  Page A → Page B → Page C
Wrong            Page A → Page B →301→ Page A
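
Canonical chains and loops like the ones listed above can be detected by following the declarations page by page. This is a minimal sketch, assuming the canonical of each page has already been collected into a dictionary; the function name trace_canonicals and the page labels are hypothetical, and redirects are not modeled.

```python
def trace_canonicals(canonical_of, start):
    """Follow canonical declarations from start.

    Returns (path, status), where status is "ok" (self-referencing or a
    single hop), "chain" (two or more hops) or "loop".
    """
    path = [start]
    while True:
        current = path[-1]
        # A page without a declaration is treated as pointing to itself.
        target = canonical_of.get(current, current)
        if target == current:
            break
        if target in path:
            return path + [target], "loop"
        path.append(target)
    return path, "ok" if len(path) <= 2 else "chain"


print(trace_canonicals({"A": "A"}, "A"))            # (['A'], 'ok')
print(trace_canonicals({"A": "B", "B": "B"}, "A"))  # (['A', 'B'], 'ok')
print(trace_canonicals({"A": "B", "B": "A"}, "A"))  # (['A', 'B', 'A'], 'loop')
print(trace_canonicals({"A": "B", "B": "C"}, "A"))  # (['A', 'B', 'C'], 'chain')
```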

Use lowercase URLs (or uppercase, but be consistent)

URLs are case-sensitive, so variations in case can make one URL different from another. If you configure the server to use lowercase URLs and set only lowercase canonical URLs, you will avoid potential duplicate content problems due to capitalization.
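
As an illustration, lowercasing can be applied when generating canonical URLs. The sketch below is one possible approach, not a universal rule: the function name is hypothetical, and it deliberately leaves the query string untouched on the assumption that parameter values may be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url):
    """Lowercase the scheme, host and path; keep the query string as-is."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,     # assumption: values here may be case-sensitive
        parts.fragment,
    ))


print(lowercase_url("https://Example.com/Page"))            # https://example.com/page
print(lowercase_url("https://example.com/PAGE?Order=ASC"))  # https://example.com/page?Order=ASC
```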

Use self-referencing canonical tags

This basically consists of pointing the canonical tag to the URL itself. This is generally a good practice. Since you don't know how people will link to your pages, it can help prevent unforeseen SEO errors.

For example, suppose you had a page "page.html" and a third party linked to it as "page.html?order=asc". If that page had no canonical tag, search engines might select the URL with the parameter as canonical. However, if the page had the canonical tag "page.html" (self-referencing), it would encourage search engines to select the original URL without the parameter.

Check dynamic canonical tags

Make sure that your CMS does not add a different canonical tag to each of the duplicate versions of the same page.

URL               Canonical (not recommended)          Canonical (recommended)
/page             https://example.com/page             https://example.com/page
/page?order=asc   https://example.com/page?order=asc   https://example.com/page
/page?order=desc  https://example.com/page?order=desc  https://example.com/page
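
One way a template could produce the recommended column is to build the canonical from the requested URL while dropping parameters that do not change the content. This is a sketch, assuming that "order" is such a parameter on this site; the IGNORED_PARAMS set and the function name are hypothetical and would need to be adapted to each website.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that only reorder or track, and never change content.
IGNORED_PARAMS = {"order"}


def canonical_for(requested_url, ignored=IGNORED_PARAMS):
    """Return the requested URL without ignored parameters or fragment."""
    parts = urlsplit(requested_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_for("https://example.com/page"))             # https://example.com/page
print(canonical_for("https://example.com/page?order=asc"))   # https://example.com/page
print(canonical_for("https://example.com/page?order=desc"))  # https://example.com/page
```

Parameters that do change the content, such as a page number, are kept because they are not in the ignored set.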

Avoid inconsistent canonicalization signals

Make sure that all the signals that Google uses to select the canonical URL of a page (HTTPS, canonical, internal linking, sitemap, redirects, etc.) are consistent and do not contradict each other. This will make it more likely that Google will select the expected URL as the canonical URL.

Common mistakes

Inserting more than one canonical tag

If a page has multiple canonical tags, Google will ignore all of them, so they will have no effect.

Inserting the canonical tag in the <body>

The canonical tag should always be added within the <head> of the page. In addition, to avoid possible parsing errors, it is recommended to place it at the beginning of the <head>, as early as possible. Any canonical placed within the <body> will be ignored.

Sometimes, improperly closed tags within the <head>, especially when scripts are included, can cause the search engine to not be able to detect subsequent tags, such as canonical, hreflang, etc.
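
Both of the mistakes above (several canonical tags, or a canonical tag outside the <head>) can be flagged with a simple audit of the raw HTML. The following sketch uses Python's html.parser; the names CanonicalAudit and audit_canonicals are hypothetical. Note that this parser tracks only explicit <head> and </head> tags and does not reproduce how a browser or crawler repairs broken markup.

```python
from html.parser import HTMLParser


class CanonicalAudit(HTMLParser):
    """Records canonical tags found inside and outside the <head>."""

    def __init__(self):
        super().__init__()
        self.inside_head = False
        self.head_canonicals = []
        self.other_canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.inside_head = True
        elif tag == "link":
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                bucket = self.head_canonicals if self.inside_head else self.other_canonicals
                bucket.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.inside_head = False


def audit_canonicals(html):
    """Return a list of problems found with the page's canonical tags."""
    audit = CanonicalAudit()
    audit.feed(html)
    total = len(audit.head_canonicals) + len(audit.other_canonicals)
    problems = []
    if total == 0:
        problems.append("no canonical tag")
    if total > 1:
        problems.append("multiple canonical tags")
    if audit.other_canonicals:
        problems.append("canonical tag outside <head>")
    return problems


good = '<html><head><link rel="canonical" href="https://example.com/a"></head><body></body></html>'
in_body = '<html><head></head><body><link rel="canonical" href="https://example.com/a"></body></html>'
print(audit_canonicals(good))     # []
print(audit_canonicals(in_body))  # ['canonical tag outside <head>']
```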

Pointing canonical to the first page in paginated series

Pointing the canonical tag of pages 2 and later to the first page of a paginated series is incorrect and not recommended. Doing so would mean that the content of those pages would not be indexed, unless the other signals that Google takes into account cause it to ignore the canonical tag.

In addition, we must keep in mind that the content of each individual page is different from the other pages in the series, so it is not duplicate content and the use of the canonical tag to consolidate authority in a single URL would not make sense.

URL                          /list.html  /list.html?p=2  /list.html?p=3
Canonical (recommended)      /list.html  /list.html?p=2  /list.html?p=3
Canonical (not recommended)  /list.html  /list.html      /list.html

Adding canonical tag to a page blocked by robots.txt

If a page is blocked by robots.txt, search engines won't crawl it, so they won't see its canonical tag and the tag will have no effect.

Pointing the canonical tag to a non-crawlable or non-indexable URL

Canonical URLs should always be crawlable and indexable. If you point the canonical tag to a page that is not crawlable and indexable, you would be sending contradictory signals to the search engines, which could result in neither the page with the canonical tag nor the canonical URL being indexed.

If that happens, it would be necessary to replace the URL of the tag or change the crawlability/indexability of the canonical URL.

Non-crawlable URL
A URL is not crawlable if it is blocked by robots.txt. However, keep in mind that, if the URL is indexable, it could be indexed if it is linked. In any case, a canonical URL should be crawlable to serve its purpose.

Non-indexable URL
A URL is not indexable if:

  • It is marked as noindex.
  • It is canonicalized to another URL.
  • It doesn't return a 200 OK status code (3XX, 4XX, 5XX, ...).

Including non-canonical URLs in the sitemap

Only canonical URLs should be included in the sitemap.

The purpose of a sitemap is to help search engines crawl the website by providing a list of URLs that you want to be indexed. Thus, a non-canonical URL should not be included in the sitemap because, theoretically, it is a duplicate page that has another more representative URL (its canonical URL) that will be the one that should be indexed instead and, therefore, be included in the sitemap.
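
A sitemap can be checked against the canonical declarations of the site to find entries that should not be there. This is a minimal sketch, assuming the canonical of each URL has already been collected into a dictionary; the function name, the sample sitemap and the mapping are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_sitemap_urls(sitemap_xml, canonical_of):
    """Return sitemap URLs whose declared canonical is a different URL."""
    root = ET.fromstring(sitemap_xml.strip())
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    # URLs missing from the mapping are treated as self-canonical.
    return [url for url in urls if canonical_of.get(url, url) != url]


sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page</loc></url>
  <url><loc>https://example.com/page?order=asc</loc></url>
</urlset>
"""
canonical_of = {
    "https://example.com/page": "https://example.com/page",
    "https://example.com/page?order=asc": "https://example.com/page",
}
print(non_canonical_sitemap_urls(sitemap, canonical_of))
# ['https://example.com/page?order=asc']
```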

Frequently asked questions

Do canonical tags pass PageRank?

Canonical tags help consolidate the link equity (PageRank) of all duplicate pages on the main and canonical page.

Duplicate pages may receive backlinks from various external sources, partially taking over the authority of the canonical version, which is the one that should rank in search results.

Implementing canonical tags on duplicate pages allows their PageRank to be transferred to the canonical URL, concentrating all the authority on it.

Practically speaking: If page A is canonicalized to page B, the links pointing to page A will count as links to page B.

Why does Google ignore the canonical tag?

Google checks a number of signals to select the canonical URL. The canonical tag is just one of those signals, and it is only a suggestion. Google will choose the URL that the signals most consistently point to. Therefore, you should check whether all the factors that Google considers are consistent on your site and match the URL indicated in the canonical tag. That is:

  • If the URL includes the rel="canonical" tag.
  • If the canonical URL is very similar or the same as the canonicalized page.
  • If the URL is served via HTTPS versus HTTP.
  • If the URL is included in the sitemap.
  • If the internal linking points to that URL.
  • If duplicate URLs redirect to that URL.
  • If the URL is more complete or of better quality than the others.
  • If the URL is "nicer-looking" (SEO-friendly, without parameters, short, ...).

If not, you should consider making changes to your website to ensure that the other signals that Google takes into account also point to the same URL as the canonical tag.

How can I check which URL Google has selected as canonical?

You can use the URL inspection tool in Google Search Console to find out which page Google considers canonical.

What is a self-referencing canonical tag?

A self-referencing (aka self-referential) canonical tag is a canonical tag that points to the URL of the page it is placed on. This helps ensure that multiple versions of the page (duplicates) are not indexed separately. Its implementation is not critical, but it is a good practice.

https://www.example.com/page.html
<link rel="canonical" href="https://www.example.com/page.html" />

Is a 301 redirect or a canonical tag better for handling duplicate content?

Although both strategies allow us to make search engines understand which is the canonical URL of a page, the use of canonical tags sends a weaker signal than a 301 redirect: A canonical is a suggestion and a 301 redirect is a directive.

However, the canonical tag is often the preferred method for handling duplicate content. This is because of the fundamental difference between the two:

If A redirects to B, search engines will know that B is the canonical URL, but visits to A will be automatically redirected to B and it will no longer be possible to access A. However, if you point the canonical tag from A to B, search engines will still know that B is the canonical URL, but, in this case, it will be possible to visit both URLs: A and B. Thus, the use of one method or the other will depend on the desired result.

Can I combine a canonical tag and a noindex directive?

In general, this is usually not a good idea. The canonical tag is used to consolidate the authority of duplicate URLs into one main URL to be displayed in search results. If you add a noindex directive, you are telling search engines that you don't want the URL to be indexed. In this way, you would be sending mixed signals.

In the words of John Mueller: "You shouldn’t mix noindex and rel=canonical as they’re very contradictory pieces of information for us. We’ll generally pick the rel=canonical and use that over the noindex."

However, we must distinguish between two cases:

  • Noindex + canonical to another page. In this case, there is a problem because conflicting signals are sent. The noindex could be carried over to the canonical URL, which would then not be indexed.

  • Noindex + self-referencing canonical. In this case, there is no contradiction. The canonical self-reference tells search engines that the current URL is the main URL and the noindex tells search engines not to index it. So, the result would be the same as having the noindex without canonical.
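
The two cases above can be told apart with a simple check. This is only a sketch of the distinction made in this section; the function name and the returned labels are hypothetical.

```python
def check_noindex_canonical(page_url, canonical_url, robots_meta):
    """Classify how a page combines a noindex directive and a canonical tag.

    canonical_url is None when the page declares no canonical tag.
    """
    directives = {d.strip().lower() for d in robots_meta.split(",")}
    has_noindex = bool(directives & {"noindex", "none"})
    if not has_noindex:
        return "indexable"
    if canonical_url is None or canonical_url == page_url:
        # Self-referencing (or absent) canonical: same result as noindex alone.
        return "noindex only"
    # noindex plus a canonical to another page: mixed signals.
    return "conflict"


page = "https://example.com/page?order=asc"
print(check_noindex_canonical(page, "https://example.com/page", "noindex"))  # conflict
print(check_noindex_canonical(page, page, "noindex"))                        # noindex only
print(check_noindex_canonical(page, "https://example.com/page", "index"))    # indexable
```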

Are versions of the same page in different languages considered duplicate content?

No. If their main content is not in the same language, it will not be considered duplicate content.

How to set the canonical URL in files that are not HTML pages (such as PDFs)?

The canonical tag can only be implemented within HTML pages. However, it may sometimes be necessary to set a canonical URL for non-HTML files, such as PDFs. In these cases, the HTTP rel="canonical" header can be used instead.

Syntax:

Link: <http://www.example.com/downloads/file.pdf>; rel="canonical"

The applicable recommendations are the same as for the canonical tag. To comply with the RFC 2616 specification, use only double quotes in the header.
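
As an illustration of the syntax above, the header value can be built and read back with a few lines of Python. This is a sketch with hypothetical function names; the regular expression only handles the simple form shown in this section (a single target followed directly by rel="canonical" in double quotes), not every valid Link header.

```python
import re


def canonical_link_header(url):
    """Build the value of a Link header that declares url as canonical."""
    return f'<{url}>; rel="canonical"'


# Matches only the simple form: <URL>; rel="canonical"
LINK_CANONICAL = re.compile(r'<([^>]+)>\s*;\s*rel="canonical"', re.IGNORECASE)


def canonical_from_link_header(value):
    """Return the canonical URL found in a Link header value, or None."""
    match = LINK_CANONICAL.search(value)
    return match.group(1) if match else None


value = canonical_link_header("http://www.example.com/downloads/file.pdf")
print("Link: " + value)
# Link: <http://www.example.com/downloads/file.pdf>; rel="canonical"
print(canonical_from_link_header(value))
# http://www.example.com/downloads/file.pdf
```

How the header is actually attached to the response depends on the web server or framework in use.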

When should I canonicalize URLs?

  • Duplicate or very similar content. When the content of several URLs is totally duplicated or very similar, you should canonicalize them to a single URL.
  • Same search intent. You might want to do this if the content serves the same search intent. For example, you publish an article that updates another from 5 years ago, and you want both to be available, but only the new version to have the authority.
  • Unavailable content with a very similar alternative. When a piece of content (product, service, article, etc.) is no longer available and there is an alternative URL with very similar content that fits the same search intent, you can canonicalize the old URL to the alternative.

What types of pages should I canonicalize to a different URL?

There are legitimate uses for having duplicate, or very similar, pages on different URLs that could be canonicalized. Some of them are:

  • Versions for different types of devices.
    https://example.com/page.html
    https://m.example.com/page.html
    https://amp.example.com/page.html

  • Dynamic URLs with parameters: passive, search, faceting, session, tracking, etc.
    https://example.com/search?q=sneakers
    https://example.com/products?size=12&color=red
    https://example.com/sneakers/nike?gclid=ABCD
    https://example.com/page.php?PHPSESSID=b12312231231231231
    https://example.com/sneakers/red/nike-red-sneakers.html

  • Products or articles associated with several categories.
    https://example.com/blog/news/spring-summer-fashion-trends
    https://example.com/blog/lifestyle/spring-summer-fashion-trends

  • URLs that respond with the same content for different protocols, ports or domains.
    http://example.com/page.html
    https://example.com/page.html
    http://www.example.com/page.html
    http://example.com:80/page.html
    https://example.com:443/page.html

  • Index pages.
    http://example.com/
    http://example.com/index.htm
    http://example.com/index.html
    http://example.com/index.php
    http://example.com/default.html

  • URLs with and without trailing slash.
    https://example.com/page
    https://example.com/page/

  • Case-insensitive URLs.
    https://example.com/page
    https://example.com/Page
    https://example.com/PAGE

  • Syndicated content. Total or partial replication of articles on other news websites.
    https://example.com/blog/news/spring-summer-fashion-trends (original post)
    https://news.example.com/spring-summer-fashion-trends-123456.html (syndicated post)
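
Several of the variants listed above (protocol, www, default ports, index pages and trailing slashes) can be mapped to one preferred URL with a small normalization function. This is a sketch under stated assumptions: the site prefers HTTPS, no "www", no trailing slash, and serves its home page from the index files named in INDEX_FILES. All of these choices and names are hypothetical and depend on each website.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names the server treats as the directory index.
INDEX_FILES = {"index.htm", "index.html", "index.php", "default.html"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def preferred_url(url):
    """Map protocol, www, port, index-file and slash variants to one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the port only when it is the default for the original scheme.
    if parts.port not in (None, DEFAULT_PORTS.get(parts.scheme)):
        host = f"{host}:{parts.port}"
    path = parts.path
    directory, _, last_segment = path.rpartition("/")
    if last_segment.lower() in INDEX_FILES:
        path = directory + "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit(("https", host, path or "/", parts.query, ""))


for variant in (
    "http://www.example.com/page.html",
    "http://example.com:80/page.html",
    "https://example.com:443/page.html",
):
    print(preferred_url(variant))  # https://example.com/page.html (each time)

print(preferred_url("http://example.com/index.php"))  # https://example.com/
print(preferred_url("https://example.com/page/"))     # https://example.com/page
```

The output of such a function is a candidate for the href of the canonical tag, and the same preferred form should be used in internal links and in the sitemap so that all signals agree.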

Sources