Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt blocked internal resources Wordpress
-
Hi all,
We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpHowever, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no html at all when I check the source code, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpThis should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling content the no-index tag would be better suited for preventing content from being indexed.
However, previous best practice would dictate blocking access to /wp-includes/ and /wp-content/ directories, etc but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah this might have some impact on your SEO.
Also, if you're using a plugin to cache content you should want Google to crawl your cache content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So, for your example page, https://example.com/wp-content/cache/minify/df983.js it shouldn't end up in their index.
Hope this helps some.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Wordpress Comments Pagination
Hi Mozzers What is your view on the following. Should you Paginate comments to increase page speed? If yes, at what # of comments would you begin pagination? (with the objective being decreasing page load times) Apply rel="canonical" back to the main article URL? eg: url/comment-page-1 => url noindex the comment pages? create a "View all" comments page? Thanks in advance for your help! 🙂
Intermediate & Advanced SEO | | jeremycabral
J0 -
Dilemma about "images" folder in robots.txt
Hi, Hope you're doing well. I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
Intermediate & Advanced SEO | | Modbargains1 -
Robots Disallow Backslash - Is it right command
Bit skeptical, as due to dynamic url and some other linkage issue, google has crawled url with backslash and asterisk character ex - www.xyz.com/\/index.php?option=com_product www.xyz.com/\"/index.php?option=com_product Now %5c is the encoded version of \ - backslash & %22 is encoded version of asterisk Need to know for command :- User-agent: * Disallow: \As am disallowing all backslash url through this - will it only remove the backslash url which are duplicates or the entire site,
Intermediate & Advanced SEO | | Modi0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
Should I remove the ?replytocom variables in wordpress?
I'm using Yoast's wordpress plugin and there is an option to remove the replytocom variables. I'm curious what everyone's thoughts were on that, and if I should do it. Here's the site if you need to see it. Thanks!
Intermediate & Advanced SEO | | NoahsDad0 -
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
Intermediate & Advanced SEO | | gregelwell0 -
Finding broken links / resources by topic
Hi fellow mozzers! In an effort to ensure we're exploring every avenue when launching our new website, I was hoping to find some useful broken links / resources that we could incorporate into our link building. We have used the standard tools for this (W3C, Xenu etc), but they all seem to have the same issue in that they reveal all the missing links on a site (although some don't actually tell you the page they are on), but you still have to sort them to see if the links/ resource is related to your theme. When you're on a niche site, this obviously isn't an issue, but on a site like Mashable (to use the example given in a recent SEOmoz blog) it could result in wading through hundreds of links to find one relevant one right at the end. Is there a tool that allows you to specify what theme links you are looking for from a site, or better yet one that allows you to check multiple sites for multiple missing themed links in one go? Or is the best way to export the list and just search the document for certain keywords?
Intermediate & Advanced SEO | | themegroup0 -
Using 2 wildcards in the robots.txt file
I have a URL string which I don't want to be indexed. it includes the characters _Q1 ni the middle of the string. So in the robots.txt can I use 2 wildcards in the string to take out all of the URLs with that in it? So something like /_Q1. Will that pickup and block every URL with those characters in the string? Also, this is not directly of the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as //_Q1* as it will be in the second folder or just using /_Q1 will pickup everything no matter what folder it is on? Thanks.
Intermediate & Advanced SEO | | seo1234560