What are the most common mistakes causing duplicate content?
Duplicate content is an issue that affects most websites (about 60% of them). Content is considered duplicated when the same content can be indexed under several different URLs.
For example, the homepage of a website is often accessible in the following 2 ways:
domain.com
domain.com/index.php (or /homepage…)
Every page of a website is also frequently duplicated because it can be reached both with and without the www subdomain.
Securing websites with HTTPS, which Google has been demanding for a long time, can also unintentionally increase duplicate content.
Ex: My website is declared as https://www. If I have not set up redirects to this main URL, it will also be accessible over https without the "www" and/or over http (with or without the "www"). So instead of offering the search engines 1 clean and unique website, you could unintentionally offer them up to 4 copies of the same one. You shot yourself in the foot but don't worry, you're not the only one.
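To make the 4 variants concrete, here is a minimal Python sketch of the redirect rule described above. It assumes that https://www.domain.com is the declared main version; the constants and the function name redirect_target are hypothetical and only serve the illustration. In practice this rule usually lives in the web server or CMS configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the declared main version is https + "www".
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.domain.com"
BARE_HOST = "domain.com"


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in (CANONICAL_HOST, BARE_HOST):
        return None  # not one of the hosts covered by this sketch
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already the main version
    # Keep path, query string and fragment; only scheme and host change.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    for variant in (
        "http://domain.com/page",
        "http://www.domain.com/page",
        "https://domain.com/page",
        "https://www.domain.com/page",
    ):
        print(variant, "->", redirect_target(variant))
```

Three of the four variants are sent to the main version, and the fourth is left alone because it already is the main version.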
The third most common issue is the indexing of the website under every domain extension (TLD) you bought.
Once again, you must redirect all the associated domain names to your main domain with a 301 redirect (also called a "permanent redirect") in order to avoid duplicates.
Ex: I chose domainname.fr to be visible to French users and, to protect my brand, I also bought the domain name in .com and .net (and/or, for example, my brand-city or country.fr).
It is essential to redirect ALL associated domain names to your main domain name using a 301 redirect. So, if you're still with us, we're talking about the canonical domain name, which generally means choosing the version with or without the "www".
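One way to verify this rule is to audit what each associated domain actually answers. The sketch below assumes you have already collected, by whatever means, the HTTP status and Location header returned by each host; the function name audit_redirects and the sample data are hypothetical, not real measurements.

```python
from urllib.parse import urlsplit


def audit_redirects(responses, main_host):
    """List the hosts that do not 301-redirect to main_host.

    responses maps each requested host to a (status, location) pair,
    where location is the Location header value or None.
    """
    problems = []
    for host, (status, location) in responses.items():
        if host == main_host:
            continue  # the main domain is expected to serve the content
        if status != 301:
            problems.append(f"{host}: expected 301, got {status}")
        elif urlsplit(location or "").hostname != main_host:
            problems.append(f"{host}: redirects to {location!r}, not {main_host}")
    return problems


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "www.domainname.fr": (200, None),
        "domainname.com": (301, "https://www.domainname.fr/"),
        "domainname.net": (302, "https://www.domainname.fr/"),
        "domainname.fr": (200, None),
    }
    for problem in audit_redirects(sample, "www.domainname.fr"):
        print(problem)
```

In this sample, the .net domain is flagged because it uses a temporary redirect (302) instead of a permanent one, and the bare domainname.fr is flagged because it serves the content itself.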
Other common issues are related to Google indexing the pre-production/acceptance version of the website (used for testing and acceptance before going online) or URLs containing tracking parameters. Beware: you should prefer a pre-production version that is only accessible with a login and password and is therefore invisible to Google. A competent provider will think about it; a bad one won't.
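For the tracking-parameter case, the idea is that a URL with and without its tracking parameters should resolve to one clean address. Here is a minimal Python sketch, assuming the tracking parameters are the usual utm_* family plus gclid and fbclid; this list and the function name strip_tracking are assumptions to adapt to your own campaigns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters; adjust it to the tools you use.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}


def strip_tracking(url):
    """Return url without its tracking parameters, keeping all the others."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_tracking(
        "https://www.domain.com/product?id=42&utm_source=newsletter&gclid=abc"
    ))
```

The cleaned URL is the one to use as the redirect target or as the canonical address of the page.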
Finally, internal links that are not consistent across the whole website are another serious and rather pernicious mistake. So don't link for the sake of linking, and don't start with Thai food to end up with bowling balls. You have to segment your content carefully. Moreover, you should avoid broken links (404) and other issues that search engines do not take well.
In that respect, several different URLs may lead to the same e-commerce product description, for example (pay attention to this with some CMSs).
Web agencies are rarely aware of, or even sensitive to, these subtle and time-consuming issues. SEO is rarely a fascinating topic for developers ("Oh actually, performance is not my concern..." "I've got better things to do than worry about Google...").
As a consequence, it is important to be careful even if, according to Google, duplicate content is not a major issue... or at least, contrary to legend, not something that triggers a penalty.