It’s a colossal question, right? Will duplicate content hurt SEO rankings or not? And if yes, how?
Look at this keyword ranking report, and you’ll notice Google is telling the webmaster: “I’m terribly confused … help me understand what’s going on here.” Meaning, not knowing which URL to pick between HTTP and HTTPS, Google has started to demote the ranking of the keyword in question (blurred on purpose below).
- Feb 8: For a keyword (blurred below), the website is ranking at position 17 on Google.
- Mar 22: After 30+ days of an SEO campaign, there is some upward keyword movement. The rank has improved to 10. First page, yay!
- Apr 05: All of a sudden, two URLs are ranking in Google. The HTTP one has dropped below position 50, whereas the HTTPS one sits at position 10.
- Apr 26: After a lot of confusion, with the algorithm trying to determine which URL to rank, Google demotes the HTTPS URL to 23. The HTTP URL is practically dead.
In this case, the webmaster implemented HTTPS without regard to SEO, inadvertently creating two copies of the entire website: one served over HTTP and another over HTTPS. This created duplicate content for search engines. Since the exact same content is now available at two different URLs, Google does not “know” which URL it should serve to its searchers. Any link juice is also split across the two URLs, making both pages “weaker.”
Clearly, from a timeline perspective, Google did not punish the site right away. In the above case, it took about three weeks for the keyword to drop from rank 10 to 23.
What’s the solution?
For background: though HTTPS is “preferred,” see my previous blogs on the SEO risks of HTTPS implementation. Consider HTTPS a tie-breaking ranking factor between two equally qualified webpages.
The solution in this case is to tell Google that the HTTP and HTTPS pages are duplicates and should be treated as a single page. There are multiple ways to do this:
- With robots – noindex, nofollow
Let’s start with Grainger.com homepage and review its source code.
- Visit https://www.grainger.com and view the page source. You will find the following tag, which keeps the HTTPS page out of the index:
<meta name="robots" content="noindex, nofollow">
- However, go to http://www.grainger.com and check the source again. This page is indexed and followed. Thus, by having the HTTPS version neither indexed nor followed, Grainger has removed the possibility of duplicate content. Nicely done.
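An equivalent way to get this effect without editing every page template is the X-Robots-Tag response header, which search engines treat the same as the robots meta tag. Below is a minimal sketch for Apache 2.4 (with mod_headers enabled) that sends the header only on HTTPS requests. This is purely illustrative; it is not necessarily how Grainger implements it.

```apache
# Hypothetical Apache 2.4 sketch (requires mod_headers).
# Send noindex/nofollow only when the request arrived over HTTPS,
# so the HTTP version remains indexable.
<If "%{HTTPS} == 'on'">
    Header set X-Robots-Tag "noindex, nofollow"
</If>
```

The header approach is handy when pages are generated by a system you can’t easily modify, since the rule lives in server config rather than in each page’s `<head>`.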
- With 301 Redirect on the server side
Next, let’s look at Amazon.com. With a server-side response, Amazon forces a single form of each page.
e.g., visit http://www.amazon.com, where the server responds with “HTTP 200 OK”. Next, visit https://www.amazon.com, and you will notice the response changes to a “301 Moved Permanently” whose Location header points to the HTTP version of the homepage. Thus, by redirecting on the server side and forcing a specific version, Amazon has removed the possibility of duplicate content.
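On Apache, a server-side redirect like this can be sketched with mod_rewrite. Note the direction below (HTTPS to HTTP) matches this article’s Amazon example; most sites today would redirect the opposite way, and the rule is not taken from Amazon’s actual configuration.

```apache
# Hypothetical .htaccess sketch (Apache with mod_rewrite enabled).
# Permanently (301) redirect any HTTPS request to its HTTP equivalent,
# so only one version of each URL can be indexed.
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
```

A 301 also passes most of the link equity to the target URL, which is why it is generally preferred over a 302 for consolidating duplicates.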
- With rel=canonical
The third option is to tell Google which URL it should consider for ranking. Note that rel=canonical is only a hint; Google does not have to obey it the way it does the two options above. Still, in this case, the following solution would prevent duplicate content.
The first step here is to separate out which pages need to be secure (HTTPS) and which can be non-secure (HTTP). For example, login, password, and any credit-card-related pages must be served over HTTPS, while the rest of the product pages could stay on HTTP. Once that is established, for all non-secure pages, make sure that even when the URL is viewed over HTTPS, a rel=canonical points to the HTTP URL.
On the HTTP version, http://www.examples.com/url-is-here/, the following line of HTML needs to be added within the <head> … </head> section:
<link rel="canonical" href="http://www.examples.com/url-is-here/" />
Next, on the HTTPS version, https://www.examples.com/url-is-here/, add the exact same tag:
<link rel="canonical" href="http://www.examples.com/url-is-here/" />
That’s it, folks. With these steps, webmasters can avoid having duplicate content create ranking and traffic mishaps on their websites. Which option is best depends on your website structure and the knowledge/comfort level of your web developers.
The bottom line: avoid making Google ask, “How many John Smiths are there, really?” There should be just one John Smith per John Smith. Period.