Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Duplicate Content for index.html
-
In the Crawl Diagnostics Summary, it says that I have two pages with duplicate content which are:
I read in a Dreamweaver tutorial that you should name your home page "index.html" so that www.mywebsite.com automatically serves index.html to the user. Is this a bug in SEOmoz's crawler, or is it a real problem with my site?
Thank you,
Dan
-
The code should definitely go into the .htaccess file in your website's root directory. However, .htaccess can be finicky: a few days ago I ran into a similar issue on a client's website and was able to fix it with a variation of the code:
# index Redirect: send /dir/index.(php|html|htm|asp) to /dir/
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(php|html|htm|asp)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php|html|htm|asp)$ http://yoursite.com/$1 [R=301,L]
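To see what that condition and rule match, here is a rough Python sketch that applies the same regular expressions (with the `*` repetition on the directory group and the dot escaped) to a raw request line. It is only an illustration under assumptions: `redirect_target` is a hypothetical helper, `http://yoursite.com/` is the placeholder domain from the rule, and Apache's real matching covers more cases (query strings, internal subrequests) than this does.

```python
import re
from typing import Optional

# Equivalent of the RewriteCond. THE_REQUEST is the raw request line, e.g.
# "GET /blog/index.html HTTP/1.1"; Apache's "\ " is just a literal space here.
REQUEST_COND = re.compile(r"^[A-Z]{3,9} /([^/]+/)*index\.(php|html|htm|asp) HTTP/")

# Equivalent of the RewriteRule pattern. In .htaccess context the path is
# matched without its leading slash, so "/blog/index.html" -> "blog/index.html".
RULE = re.compile(r"^(([^/]+/)*)index\.(php|html|htm|asp)$")


def redirect_target(request_line: str, base: str = "http://yoursite.com/") -> Optional[str]:
    """Return the 301 Location for a request line, or None if no redirect applies."""
    if not REQUEST_COND.match(request_line):
        return None
    # Request line format: "METHOD /path HTTP/x.y"; take the path, drop the leading slash.
    path = request_line.split(" ")[1].lstrip("/")
    match = RULE.match(path)
    if match is None:
        return None
    # $1 in the rule is the directory prefix, e.g. "blog/" (or "" for the root).
    return base + match.group(1)


for line in ("GET /index.html HTTP/1.1", "GET /blog/index.php HTTP/1.1", "GET / HTTP/1.1"):
    print(line, "->", redirect_target(line))
```

Because the condition checks what the visitor actually requested, a plain request for `/` is left alone even if the server serves index.html behind the scenes.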
If you give me the URL for the site, I'll take a look at it and let you know what would be feasible.
-
Hi Daniel, can you share with us the URL of your site? We can take a look at it and give you a more precise answer that way. Thanks!
-
I eventually figured out that your method was a 301 redirect, and I definitely broke my site trying to use the code you posted... haha. It's OK, though; I just removed the code and it went back to normal. At first I was editing the .htaccess file in the public_html folder, which wasn't working. Then I tried the root folder for the site (I created the .htaccess file, since it did not exist). Neither of those worked. (I am using Bluehost, so I don't think I have root access, and I'm not sure whether it is a Linux server.)
If there is an easy way to explain what I am doing wrong, please do so. Otherwise, I will use the canonical tag.
Thanks for everything!
-
@Dan
Sorry about the delay in responding; I didn't realize right away that you were asking me a question. The code I provided in my previous answer causes a 301 (permanent) redirect from the index file to its directory URL. That's what the [R=301,L] portion of the code specifies: R means redirect, 301 is the status code it sends, and L marks it as the last rule to apply. After reviewing the Matt Cutts video, I realize I should have asked whether you were on a Linux server you had root access to. We actually use both redirects and canonical tags, since the on-page optimization reports recommended it. Heck, Google uses them too, I assume because it's easier for users to be sent to a single URL. Obviously, though, if you don't have access to server headers and aren't familiar with .htaccess (you can accidentally break your site), then the canonical solution is appropriate.
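For anyone wondering what `[R=301,L]` produces on the wire: the server answers the old URL with status 301 and a `Location` header, and the browser or crawler then requests that location instead. The sketch below is a hypothetical Python WSGI app (not Apache, and not a Bluehost setup) that does the same for index files, so the response can be inspected without touching .htaccess; `route`, `app`, and `INDEX_NAMES` are names made up for this example.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical index file names that should redirect to their directory URL.
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def route(path: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the status line and headers for a request path."""
    directory, _, filename = path.rpartition("/")
    if filename in INDEX_NAMES:
        # Permanent redirect to the directory URL, the equivalent of [R=301].
        return "301 Moved Permanently", [("Location", directory + "/")]
    return "200 OK", [("Content-Type", "text/html; charset=utf-8")]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application that wraps route()."""
    status, headers = route(environ.get("PATH_INFO") or "/")
    start_response(status, headers)
    # A redirect needs no body; other paths get a placeholder page.
    return [b""] if status.startswith("301") else [b"<h1>Home</h1>"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000; requesting `/index.html` should land the browser on `/`. A rel=canonical tag, by contrast, sends no redirect at all: both URLs keep returning 200 and only the tag tells search engines which one to prefer.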
-
Josh,
Thanks for your reply. It seems like there are lots of different ways to solve this problem. I just watched this video on Matt Cutts's blog, where he discusses his preference for 301 redirects over the rel=canonical tag.
Where would you say your solution fits in?
Thanks,
Dan
-
I use the link rel="canonical" tag on all my homepages, pointing to http://www.yoursite.com.
-
Oddly enough, I just answered this question recently. The SEOmoz crawler is correct: without a redirect, both versions of the page are accessible in your browser.
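One way to picture why a crawler reports two pages: it fetches each URL and compares the bodies, and if `/` and `/index.html` return identical HTML, they look like duplicates. This is a hypothetical illustration, not how the SEOmoz crawler actually works; `find_duplicates` and the sample pages are made up, and it only catches byte-identical bodies, whereas real crawlers may use fuzzier comparisons.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(pages: Dict[str, bytes]) -> List[List[str]]:
    """Group URLs whose response bodies are byte-identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, body in pages.items():
        # Hash the body so identical pages land in the same bucket.
        groups[hashlib.sha256(body).hexdigest()].append(url)
    # Only buckets holding more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


home = b"<html><body>Welcome</body></html>"
pages = {
    "http://www.mywebsite.com/": home,
    "http://www.mywebsite.com/index.html": home,
    "http://www.mywebsite.com/about.html": b"<html><body>About</body></html>",
}
print(find_duplicates(pages))
```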
To resolve this, redirect index.html to the root URL by placing the following code in the .htaccess file in your root directory:
Options +FollowSymLinks
RewriteEngine on
# Index Rewrite
RewriteRule ^index\.(htm|html|php)$ http://www.yoursite.com/ [R=301,L]
RewriteRule ^(.*)/index\.(htm|html|php)$ http://www.yoursite.com/$1/ [R=301,L]
You can also do the same with the index file in any subdirectories you create, by placing a .htaccess file in those subdirectories and using variations of the above code. This is how you get clean, short URLs without the duplicate content issue, such as http://www.semclix.com/design/business/
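Here is a small Python sketch of how those two RewriteRules behave together, using their patterns with the dot escaped and the end anchored. Rules are tried in order and the first match wins, as with the [L] flag in a single pass; `apply_rules` is a hypothetical helper and `www.yoursite.com` is the placeholder from the answer. Judging by the second rule, index files in subdirectories are already covered from the root .htaccess. One hedged caveat: unlike the THE_REQUEST-based version in the earlier answer, these rules match the path after any internal mapping, so on some servers a request for `/` that Apache serves internally as index.html can trigger the first rule and loop; adding a `RewriteCond %{THE_REQUEST}` guard is the usual fix.

```python
import re
from typing import List, Optional, Tuple

# The answer's two RewriteRules, in order, with the dot escaped and the end anchored.
# Each entry: (pattern matched against the .htaccess-relative path, target template).
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"^index\.(htm|html|php)$"), "http://www.yoursite.com/"),
    (re.compile(r"^(.*)/index\.(htm|html|php)$"), r"http://www.yoursite.com/\1/"),
]


def apply_rules(path: str) -> Optional[str]:
    """Return the 301 target for a path like "design/business/index.html", or None.

    Rules are tried top to bottom and the first match wins, mirroring how
    the [L] flag stops processing after a rule fires.
    """
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            # expand() substitutes \1 the way $1 works in the RewriteRule target.
            return match.expand(target)
    return None


for p in ("index.html", "design/business/index.html", "about.html"):
    print(p, "->", apply_rules(p))
```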
-
It is a problem which you need to fix. You need to canonicalize your pages.
Those are different URLs that most likely lead to the same web page. I say "most likely" because such URLs can, in fact, serve different pages.
You need to tell crawlers and search engines how your site is organized. There are several ways to achieve canonicalization. The method I prefer is to add the following line of code to the <head> of each page:
<link rel="canonical" href="http://www.mywebsite.com/" />
The URL provided should be the preferred URL for your page.
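If your pages come from templates, one way to keep the rel="canonical" link consistent is to compute the preferred URL instead of typing it by hand. This is a minimal Python sketch under stated assumptions: the preferred form uses `http://` and the `www.` host, drops `index.*` file names and any query string, and `preferred_url`, `canonical_tag`, and the `www.mywebsite.com` default are hypothetical, not part of Dreamweaver or any CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of index file names that should collapse onto the directory URL.
INDEX_FILES = {"index.html", "index.htm", "index.php", "index.asp"}


def preferred_url(url: str, host: str = "www.mywebsite.com") -> str:
    """Normalize a URL to the assumed preferred form: http, www host, no index file, no query."""
    path = urlsplit(url).path or "/"
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        # "/blog/index.html" -> "/blog/", "/index.html" -> "/"
        path = head + "/"
    return urlunsplit(("http", host, path, "", ""))


def canonical_tag(url: str, host: str = "www.mywebsite.com") -> str:
    """Build the <link rel="canonical"> element for the <head> of a page."""
    return f'<link rel="canonical" href="{escape(preferred_url(url, host))}" />'


print(canonical_tag("http://mywebsite.com/index.html"))
```

Every variant of the home page then emits the same tag, which is the point of canonicalization.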