Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
OK to block /js/ folder using robots.txt?
-
I know Matt Cutts suggests we allow bots to crawl CSS and JavaScript folders (http://www.youtube.com/watch?v=PNEipHjsEPU).
But what if you have lots and lots of JS and you don't want to waste precious crawl resources?
Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc.
And the legacy versions show up in Google Webmaster Tools as 404s. For example:
http://www.discoverafrica.com/js/global_functions.js?v=1.1
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1
Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether?
Isn't that what robots.txt was made for?
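To make the idea concrete, here is a minimal sketch of what blocking the folder would do, using Python's standard-library `urllib.robotparser`. The two-line robots.txt is an assumption for illustration, not the site's actual file, and the `is_allowed` helper is a hypothetical name. Python's parser does simple prefix matching, so it only approximates how Googlebot interprets the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption, not the site's real file):
# it blocks the whole /js/ folder for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Two of the versioned script URLs from the question, plus the home page.
urls = [
    "http://www.discoverafrica.com/js/global.js?v=1.2",
    "http://www.discoverafrica.com/js/json2.js?v=1.1",
    "http://www.discoverafrica.com/",
]

for url in urls:
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Under this rule every URL beneath /js/ is blocked regardless of its ?v= parameter, while the rest of the site stays crawlable.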
Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks.
We're just trying to power our content and UX elegantly with javascript.
What do you guys say:
Obey Matt? Or run the javascript gauntlet?
-
Hey!
So, I listened to Matt's video. I see his point about wanting to crawl the JS files just in case something tricky is going on. Do understand that this is a risk you take. I don't see an issue with blocking crawling of those files from a logical perspective, but if you, or someone who takes over for you in the future, does do something sneaky with JS and you are caught ... plus you have blocked access to the offending files ... it is going to take a lot more work to get back in their good graces.
It's like a cop searching your car. You have every right to ban them from doing so, but if you have nothing to hide, why make trouble? Matt is right: banning crawling of these files is not going to save you much, but if you think it's an issue, feel free. Just know that they might take it as a possible flag in the future.
Kate
-
Harald, it looks like the response you've quoted is from http://groups.google.com/a/googleproductforums.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/9MGYEoROdkg, which is a question about a menu that has javascript. I think this poster has a slightly different question. I'll ask another associate to come on in and take a look.
-
Hi Discover,
Whenever we browse web pages, we often see a run-time error that prompts for debugging. This error message is helpful only for developers, not for users.
I suggest you refer to the following link:
The truth about non javascript
I hope the above content helps to answer your query.