Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Whats the best way to remove search indexed pages on magento?
-
A new client ( aqmp.com.br/ )call me yestarday and she told me since they moved on magento they droped down more than US$ 20.000 in sales revenue ( monthly)...
I´ve just checked the webmaster tool and I´ve just discovered the number of crawled pages went from 3.260 to 75.000 since magento started... magento is creating lots of pages with queries like search and filters. Example:
- http://aqmp.com.br/acessorios/lencos.html
- http://aqmp.com.br/acessorios/lencos.html?mode=grid
- http://aqmp.com.br/acessorios/lencos.html?dir=desc&order=name
Add a instruction on robots.txt is the best way to remove unnecessary pages of the search engine?
-
I have tried using them and didn´t do anything - furthermore, if you check this video out by Google themselves, you will find that using these parameters is a "hint/suggestion" as opposed to a solid directive.
http://www.youtube.com/watch?v=DiEYcBZ36poRel Canonical is also a hint.
But Meta Noindex,follow is a solid directive which they have to pay attention to.
Hope that helps - been there, done it got the t shirt through a lot of pain and frustration!
-
What do you think about Google URL parameters? http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
-
Hi Ian,
You are right in that Yoast Meta Robots can be cranky - I installed it and had to play around with it to get it working.
However, it does offer a very nice feature that I think is worth it - you can apply various combinations of Meta Robots directives to product pages individually - so this adds more value than just being able to do NOINDEX on reviews, wishlists, etc... pages. But install it on your dev site before trying it live.
So my solution uses both Yoast and my custom code - you check the URL for any querystrings, such as ?manufacturer etc... and apply different logic according to what you wish to be indexed or not.
Feel free to PM me.
-
Hi,
can you expand on this and point me in the right direction if possible please BJS1976? I have the same problems too as originally asked by 'SEO Martin'.
I have obviously seen that the Yoast_MetaRobots plugin is recommended by others when searching for a solution to noindexing the non-content pages (search results, filters etc). However I am very reluctant to install this as many people who have tried said it has broken their sites.
If there is another way of implementing the noindex, follow meta tag, I would be very greatful to know how as like you I am really struggling to with this one.
Many Thanks
-
Hi,
I am quite familiar with Magento and struggling with the SEO of this ecommerce mammoth!
As far as I am aware, you should implement the meta tag "NOINDEX, FOLLOW" on those pages that you do not want indexed - as your pages are already in the index, this is the way to go - blocking them on robots.txt does not get pages out from the index if they are already in there.
I suggest you apply some "querystring" logic to your template - you will find the page here:
app/design/frontend/default/YOURTEMPLATE/template/page/html/head.phtmlThat way, you can apply the
depending on the page content.
Hope this helps you and let's stay in touch about Magento! (PM me)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How do internal search results get indexed by Google?
Hi all, Most of the URLs that are created by using the internal search function of a website/web shop shouldn't be indexed since they create duplicate content or waste crawl budget. The standard way to go is to 'noindex, follow' these pages or sometimes to use robots.txt to disallow crawling of these pages. The first question I have is how these pages actually would get indexed in the first place if you wouldn't use one of the options above. Crawlers follow links to index a website's pages. If a random visitor comes to your site and uses the search function, this creates a URL. There are no links leading to this URL, it is not in a sitemap, it can't be found through navigating on the website,... so how can search engines index these URLs that were generated by using an internal search function? Second question: let's say somebody embeds a link on his website pointing to a URL from your website that was created by an internal search. Now let's assume you used robots.txt to make sure these URLs weren't indexed. This means Google won't even crawl those pages. Is it possible then that the link that was used on another website will show an empty page after a while, since Google doesn't even crawl this page? Thanks for your thoughts guys.
Intermediate & Advanced SEO | | Mat_C0 -
Substantial difference between Number of Indexed Pages and Sitemap Pages
Hey there, I am doing a website audit at the moment. I've notices substantial differences in the number of pages indexed (search console), the number of pages in the sitemap and the number I am getting when I crawl the page with screamingfrog (see below). Would those discrepancies concern you? The website and its rankings seems fine otherwise. Total indexed: 2,360 (Search Consule)
Intermediate & Advanced SEO | | Online-Marketing-Guy
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screemingfrog Spider: 1,352 URLs Cheers,
Jochen0 -
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
How long takes to a page show up in Google results after removing noindex from a page?
Hi folks, A client of mine created a new page and used meta robots noindex to not show the page while they are not ready to launch it. The problem is that somehow Google "crawled" the page and now, after removing the meta robots noindex, the page does not show up in the results. We've tried to crawl it using Fetch as Googlebot, and then submit it using the button that appears. We've included the page in sitemap.xml and also used the old Google submit new page URL https://www.google.com/webmasters/tools/submit-url Does anyone know how long will it take for Google to show the page AFTER removing meta robots noindex from the page? Any reliable references of the statement? I did not find any Google video/post about this. I know that in some days it will appear but I'd like to have a good reference for the future. Thanks.
Intermediate & Advanced SEO | | fabioricotta-840380 -
Can too many "noindex" pages compared to "index" pages be a problem?
Hello, I have a question for you: our website virtualsheetmusic.com includes thousands of product pages, and due to Panda penalties in the past, we have no-indexed most of the product pages hoping in a sort of recovery (not yet seen though!). So, currently we have about 4,000 "index" page compared to about 80,000 "noindex" pages. Now, we plan to add additional 100,000 new product pages from a new publisher to offer our customers more music choice, and these new pages will still be marked as "noindex, follow". At the end of the integration process, we will end up having something like 180,000 "noindex, follow" pages compared to about 4,000 "index, follow" pages. Here is my question: can this huge discrepancy between 180,000 "noindex" pages and 4,000 "index" pages be a problem? Can this kind of scenario have or cause any negative effect on our current natural SEs profile? or is this something that doesn't actually matter? Any thoughts on this issue are very welcome. Thank you! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Best way to permanently remove URLs from the Google index?
We have several subdomains we use for testing applications. Even if we block with robots.txt, these subdomains still appear to get indexed (though they show as blocked by robots.txt. I've claimed these subdomains and requested permanent removal, but it appears that after a certain time period (6 months)? Google will re-index (and mark them as blocked by robots.txt). What is the best way to permanently remove these from the index? We can't use login to block because our clients want to be able to view these applications without needing to login. What is the next best solution?
Intermediate & Advanced SEO | | nicole.healthline0 -
Best possible linking on site with 100K indexed pages
Hello All, First of all I would like to thank everybody here for sharing such great knowledge with such amazing and heartfelt passion.It really is good to see. Thank you. My story / question: I recently sold a site with more than 100k pages indexed in Google. I was allowed to keep links on the site.These links being actual anchor text links on both the home page as well on the 100k news articles. On top of that, my site syndicates its rss feed (Just links and titles, no content) to this page. However, the new owner made a mess, and now the site could possibly be seen as bad linking to my site. Google tells me within webmasters that this particular site gives me more than 400K backlinks. I have NEVER received one single notice from Google that I have bad links. That first. But, I was worried that this page could have been the reason why MY site tanked as bad as it did. It's the only source linking so massive to me. Just a few days ago, I got in contact with the new site owner. And he has taken my offer to help him 'better' his site. Although getting the site up to date for him is my main purpose, since I am there, I will also put effort in to optimizing the links back to my site. My question: What would be the best to do for my 'most SEO gain' out of this? The site is a news paper type of site, catering for news within the exact niche my site is trying to rank. Difference being, his is a news site, mine is not. It is commercial. Once I fix his site, there will be regular news updates all within the niche we both are in. Regularly as in several times per day. It's news. In the niche. Should I leave my rss feed in the side bars of all the content? Should I leave an achor text link on the sidebar (on all news etc.) If so: there can be just one keyword... 407K pages linking with just 1 kw?? Should I keep it to just one link on the home page? I would love to hear what you guys think. (My domain is from 2001. Like a quality wine. However, still tanked like a submarine.) ALL SEO reports I got here are now Grade A. The site is finally fully optimized. Truly nice to have that confirmation. Now I hope someone will be able to tell me what is best to do, in order to get the most SEO gain out of this for my site. Thank you.
Intermediate & Advanced SEO | | richardo24hr0 -
Should pages of old news articles be indexed?
My website published about 3 news articles a day and is set up so that old news articles can be accessed through a "back" button with articles going to page 2 then page 3 then page 4, etc... as new articles push them down. The pages include a link to the article and a short snippet. I was thinking I would want Google to index the first 3 pages of articles, but after that the pages are not worthwhile. Could these pages harm me and should they be noindexed and/or added as a canonical URL to the main news page - or is leaving them as is fine because they are so deep into the site that Google won't see them, but I also won't be penalized for having week content? Thanks for the help!
Intermediate & Advanced SEO | | theLotter0