A guide to fixing duplicate content & URL issues on Apache
Published January 13th, 2007 in Online Marketing, Search Engines, Technology. Tags: apache, duplicate content, url rewriting.
Recently, I came across a fantastic article on fixing duplicate content & URL issues on Apache, and I could not resist blogging about it.
Recently, we’ve had a lot of discussion about domain and URL canonicalization, mainly centered around avoiding duplicate-content problems in Google. There has also been some discussion of fixing type-in URLs, typos in inbound links, and badly-coded inbound links.
To be clear, a “canonical” domain is the single domain you want your site to be known by, and a canonical URL is the single URL you want your page to be known by. Any others are non-canonical.
The word canonical is a religion-related term, and means “according to canon law, scripture or doctrine.” But in general use, it just means “usual, standard, conventional, customary, or according to the rules.” So as a Webmaster, you choose what single domain you want to use for your site, and what single URL should be used to request each of your pages.
Member g1smd has posted in several of these threads the very good advice that it’s best to avoid “stacked redirects” (multiple redirects invoked by a single client request) while doing things like index page and domain canonicalization. This was reiterated by WebmasterWorld admin tedster in a recent thread.
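To make the stacked-redirect problem concrete: with separate fix-ups, a request for http://example.com/index.html might first be redirected to http://www.example.com/index.html and then redirected again to http://www.example.com/. The following is a minimal sketch of how the two fix-ups can be ordered so that each request gets at most one redirect. It is an illustration only, not the routine discussed in the article. It assumes the rules live in an .htaccess file in the document root, that the canonical host is www.example.com, that the index file is named index.html, and that the site is served over plain http.

```apache
RewriteEngine On

# Assumed for illustration: canonical host www.example.com, index file index.html.

# 1) Index-page requests on ANY host go straight to the canonical host and
#    the bare directory URL. THE_REQUEST is tested so that only the client's
#    original request matches, not the internal mapping of "/" to index.html
#    done by DirectoryIndex (which would otherwise cause a redirect loop).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html[?\ ]
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]

# 2) Any other request that arrives with a non-blank, non-canonical Host
#    header gets exactly one redirect to the same path on the canonical host.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The order is the point of the sketch: because the index-page rule comes first and already targets the canonical host, a request for http://example.com/index.html should reach http://www.example.com/ in one 301 instead of two. If the host rule came first, the same request would be redirected twice.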
I have coded various routines to do these kinds of fix-ups on an ad-hoc basis, but have never actually written a single-redirect-does-it-all solution. Actually, that’s not quite true — I had *tried* before, but a nasty mod_rewrite bug in Apache 1.3.x had repeatedly stymied my efforts.
However, on returning to the subject after almost a year spent experimenting and dashing off code in the WebmasterWorld Apache forum, I found that one trick I had figured out along the way works around the bug.
So I set out anew to create a domain/URL canonicalization and type-in fixup routine that would do the following:
