Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks
-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No index detected in robots meta tag GSC issue_Help Please
Hi Everyone, We just did a site migration ( URL structure change, site redesign, CMS change). During migration, dev team messed up badly on a few things including SEO. The old site had pages canonicalized and self canonicalized <> New site doesn't have anything (CMS dev error) so we are working retroactively to add canonicalization mechanism The legacy site had URL’s ending with a trailing slash “/” <> new site got redirected to Set of url’s without “/” New site action : All robots are allowed: A new sitemap is submitted to google search console So here is my problem (it been a long 24hr night for me 🙂 ) 1. Now when I look at GSC homepage URL it says that old page is self canonicalized and currently in index (old page with a trailing slash at the end of URL). 2. When I try to perform a live URL test, I get the message "No: 'noindex' detected in 'robots' meta tag" , so indexation cant be done. I have no idea where noindex is coming from. 3. Robots.txt in search console still showing old file ( no noindex there ) I tried to submit new file but old one still coming up. When I click on "See live robots.txt" I get current robots. 4. I see that old page is still canonicalized and attempting to index redirected old page might be confusing google Hope someone can help to get the new page indexed! I really need it 🙂 Please ping me if you need more clarification. Thank you ! Thank you
Intermediate & Advanced SEO | | bgvsiteadmin1 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Using disavow tool for 404s
Hey Community, Got a question about the disavow tool for you. My site is getting thousands of 404 errors from old blog/coupon/you name it sites linking to our old URL structure (which used underscores and ended in .jsp). It seems like the webmasters of these sites aren't answering back or haven't updated their sites in ages so it's returning 404 errors. If I disavow these domains and/or links will it clear out these 404 errors in Google? I read the GWT help page on it, but it didn't seem to answer this question. Feel free to ask any questions that may help you understand the issue more. Thanks for your help,
Intermediate & Advanced SEO | | IceIcebaby
-Reed0 -
Meta Keywords Good or Bad
Hi All, I've been reading more about the meta keyword tag and why it may not be a good idea to include them on pages and am looking for thoughts/feedback on this idea. If you have employed this tactic, can you give me some insight into any results you saw. If you decided to not employ this tactic, why did you choose not to? I wan to understand all sides of this before employing any changes to my company's websites. Thank you for your help!
Intermediate & Advanced SEO | | airnwater0 -
Best way to noindex an image?
Hi all, A client wanted a few pages noindexed, which was no problem using the meta robots noindex tag. However they now want associated images removed, some of which still appear on pages that they still want indexed. I added the images to their robots.txt file a few weeks ago (probably over a month ago actually) but they're all still showing when you do an image search. What's the best way to noindex them for good, and how do I go about implementing it? Many thanks, Steve
Intermediate & Advanced SEO | | steviephil0 -
All In One SEO PACK Configuration - Index or Noindex?
I'm finding conflicting information about the right way to configure the All in One SEO Pack wordpress plugin. Do I index or noindex for the items below? Use noindex for Categories - yes or no? Use noindex for Archives - yes or no? Use noindex for Tag Archives - yes or no?
Intermediate & Advanced SEO | | webestate0 -
How do you implement dynamic SEO-friendly URLs using Ajax without using hashbangs?
We're building a new website platform and are using Ajax as the method for allowing users to select from filters. We want to dynamically insert elements into the URL as the filters are selected so that search engines will index multiple combinations of filters. We're struggling to see how this is possible using symfony framework. We've used www.gizmodo.com as an example of how to achieve SEO and user-friendly URLs but this is only an example of achieving this for static content. We would prefer to go down a route that didn't involve hashbangs if possible. Does anyone have any experience using hashbangs and how it affected their site? Any advice on the above would be gratefully received.
Intermediate & Advanced SEO | | Sayers1 -
Canonical & noindex? Use together
For duplicate pages created by the "print" function, seomoz says its better to use noindex (http://www.seomoz.org/blog/complete-guide-to-rel-canonical-how-to-and-why-not) and JohnMu says its better to use canonical http://www.google.com/support/forum/p/Webmasters/thread?tid=6c18b666a552585d&hl=en What do you think?
Intermediate & Advanced SEO | | nicole.healthline1