Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Blocking Pages Via Robots, Can Images On Those Pages Be Included In Image Search
-
Hi!
I have pages within my forum where visitors can upload photos. When they upload photos they provide a simple statement about the photo but no real information about the image, definitely not enough for the page to be deemed worthy of being indexed. The industry, however, is one that really leans on images, and having the images in Google Image search is important to us.
The URL structure looks like this: domain.com/community/photos/~username~/picture111111.aspx
I wish to block the whole folder from Googlebot to prevent these low quality pages from being added to Google's main SERP results. This would be something like this:
User-agent: googlebot
Disallow: /community/photos/
Can I disallow Googlebot specifically rather than just using User-agent: * which would then allow googlebot-image to pick up the photos? I plan on configuring a way to add meaningful alt attributes and image names to assist in visibility, but the actual act of blocking the pages and getting the images picked up... Is this possible?
Thanks!
Leona
-
Are you seeing the images getting indexed, though? Even if GWT recognizes the robots.txt directives, blocking the pages may essentially keep the images from having any ranking value. Like Matt, I'm not sure this will work in practice.
Another option would be to create an alternate path to just the images, like an HTML sitemap with just links to those images and decent anchor text. The ranking power still wouldn't be great (you'd have a lot of links on this page, most likely), but it would at least kick the crawlers a bit.
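For anyone wanting to try the image-only HTML sitemap idea, here is a minimal sketch of generating such a page. The function name, the image paths, and the anchor text below are all hypothetical placeholders, not anything taken from Leona's site; it only shows the general shape of a page of plain links with descriptive anchor text.

```python
from html import escape


def build_image_sitemap(entries, title="Community photos"):
    """Build a simple HTML page linking to each image.

    entries is a list of (url, anchor_text) pairs. Both values are
    HTML-escaped so user-supplied descriptions cannot break the markup.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in entries
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical example data: real URLs and descriptions would come from the forum.
example_entries = [
    ("/images/example-photo-1.jpg", "Example photo one"),
    ("/images/example-photo-2.jpg", "Example photo two & more"),
]

print(build_image_sitemap(example_entries))
```

With a large number of photos, it would probably make sense to split the output across several pages rather than put every link on one.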
-
Thanks Matt for your time and assistance! Leona
-
Hi Leona - what you have done is along the lines of what I thought would work for you. Sorry if I wasn't clear in my original response. I thought you were asking whether, if you created a robots.txt that disallowed Googlebot, Googlebot-Image would still pick up the photos. As I said, that wouldn't be the case, as Googlebot-Image will follow what is set out for Googlebot unless you specify otherwise using the Allow directive as I mentioned. Glad it has worked for you - keep us posted on your results.
-
Hi Matt,
Thanks for your feedback!
It is not my belief that Googlebot overrides googlebot-images; otherwise specifying something for a specific Google bot wouldn't work, correct?
I set up the following:
User-agent: googlebot
Disallow: /community/photos/
User-agent: googlebot-Image
Allow: /community/photos/
I tested the results in Google Webmaster Tools which indicated:
Googlebot: Blocked by line 26: Disallow: /community/photos/ (Detected as a directory; specific files may have different restrictions)
Googlebot-Image: Allowed by line 29: Allow: /community/photos/ (Detected as a directory; specific files may have different restrictions)
Thanks for your help!
Leona
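The test result above is consistent with a crawler picking the single group whose user-agent line matches it most specifically and ignoring the other groups. Below is a rough sketch of that selection logic. It is a simplified model, not Google's actual implementation: it assumes a token matches when the crawler name starts with it (case-insensitive), and the `select_group` helper is a made-up name for illustration.

```python
def select_group(groups, crawler):
    """Return the rule list of the group that best matches the crawler.

    groups maps a user-agent token, as written in robots.txt, to its rules.
    Simplified assumption: a token matches if the crawler name starts with
    it (case-insensitive); the longest matching token wins; "*" is used
    only when no named token matches.
    """
    name = crawler.lower()
    best_token = None
    for token in groups:
        candidate = token.lower()
        if candidate == "*" or not name.startswith(candidate):
            continue
        if best_token is None or len(candidate) > len(best_token):
            best_token = token
    if best_token is None:
        return groups.get("*", [])
    return groups[best_token]


# The two groups from the post above.
groups = {
    "googlebot": [("disallow", "/community/photos/")],
    "googlebot-Image": [("allow", "/community/photos/")],
}

print(select_group(groups, "Googlebot"))        # [('disallow', '/community/photos/')]
print(select_group(groups, "Googlebot-Image"))  # [('allow', '/community/photos/')]

# Without a dedicated image group, the image crawler falls back to the
# Googlebot group in this model, so the photos folder would be blocked.
only_googlebot = {"googlebot": [("disallow", "/community/photos/")]}
print(select_group(only_googlebot, "Googlebot-Image"))  # [('disallow', '/community/photos/')]
```

Under this model both points raised in the thread hold: a Googlebot-only block also applies to Googlebot-Image, while adding a separate Googlebot-Image group gives it its own rules.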
-
Hi Leona
Googlebot-Image and the other bots that Google uses follow the rules set out for Googlebot, so blocking Googlebot would block your images too, as its rules also apply to Googlebot-Image. I don't think there is a way around this using the Disallow directive, as you are blocking the directory that contains your images, so the images won't be indexed.
Something you may want to consider is the Allow directive -
Disallow: /community/photos/
Allow: /community/photos/~username~/
That is, if Google is already indexing images under the username path.
The Allow directive will only win if its path contains at least as many characters as the Disallow path, so bear in mind that if you had the following:
Disallow: /community/photos/
Allow: /community/photos/
the Allow will win out and nothing will be blocked. Please note that I haven't used the Allow directive myself, but I looked into it in depth when I studied robots.txt for my own sites. It would be good to hear from someone who has experience with this directive. Hope this helps.
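The length rule described above can be sketched in a few lines. This is a simplified illustration that assumes plain prefix matching only (no `*` or `$` wildcards) and that an Allow wins a tie with a Disallow of equal length; `is_allowed` is a hypothetical helper, not part of any library.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under one group's rules.

    rules is a list of (directive, prefix) pairs. The longest matching
    prefix wins; on a tie in length, "allow" beats "disallow". A path
    that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The longer Allow path overrides the shorter Disallow for that subfolder only.
rules = [
    ("disallow", "/community/photos/"),
    ("allow", "/community/photos/~username~/"),
]
print(is_allowed(rules, "/community/photos/~username~/picture111111.aspx"))  # True
print(is_allowed(rules, "/community/photos/someone-else/picture.aspx"))      # False

# Equal-length paths: the Allow wins, so nothing in the folder is blocked.
tie = [
    ("disallow", "/community/photos/"),
    ("allow", "/community/photos/"),
]
print(is_allowed(tie, "/community/photos/~username~/picture111111.aspx"))    # True
```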