Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Stop google indexing CDN pages
-
Just when I thought I'd seen it all, google hits me with another nasty surprise!
I have a CDN to deliver images, js and css to visitors around the world. I have no links to static HTML pages on the site, as far as I can tell, but someone else may have - perhaps a scraper site?
Google has decided the static pages they were able to access through the CDN have more value than my real pages, and they seem to be slowly replacing my pages in the index with the static pages.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there.
It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?
Have you seen this problem and beat it?
(Of course the next thing is Roger might look at google results and start crawling them too, LOL)
P.S. The reason I am not asking this question in the google forums is that others have asked this question many times and nobody at google has bothered to answer, over the past 5 months, and nobody who did try, gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.
-
Thank you Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up to be Origin Pull.
That means there is no need to FTP - the system just fetches content as requested.
- you should check that out if you have to ftp everything.
But what you said that helped me is this - that I should have had one CNAME for images and anotehr CNAME for content and the content should be limited to a folder called content, so I can put the CSS files and the JS files in it and that way, the plain HTML pages at teh root level will never be affected.
I also realized, while checking the system, that I wasn't using a canonical tag in the intermediate pages, as I was in the story pages. So I just added code to add canonical tags for all the intermediate pages and the front page.
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and change over when the action is not too busy, in case it takes a while to propagate.
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few like you have I realised that I was actually making a complete duplicate of the site rather than just the images or assets
I imagine you have your origin directory for the CDN in the public html folder.
Create a subdomain, set that as the origin.
Eg.. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The cdn content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the ip address on them.
It does make uploading images by ftp a bit of a faff, but does make your site better
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site appearing and disappearing from google serps.
Hi, My website is normally on page 2-3 on google consistently. Over the past month it has been appearing and then completely disappearing from the serps. One day it will be on page 2, then the next day completely missing from the serps. When i check the index it seems to be indexed correctly when doing site:mysite.com. I don't understand why this keeps happening, any experience with this issue? It doesn't seem to be a google dance as far as I can tell. When my other sites dance they typically just go up or down a few ranks for a couple weeks until they stabilize. Not completely fall off the search engine.
Algorithm Updates | | Chris_www0 -
US domain pages showing up in Google UK SERP
Hi, Our website which was predominantly for UK market was setup with a .com extension and only two years ago other domains were added - US (.us) , IE (.ie), EU (.eu) & AU (.com.au) Last year in July, we noticed that few .us domain urls were showing up in UK SERPs and we realized the sitemap for .us site was incorrectly referring to UK (.com) so we corrected that and the .us domain urls stopped appearing in the SERP. Not sure if this actually fixed the issue or was such coincidental. However in last couple of weeks more than 3 .us domain urls are showing for each brand search made on Google UK and sometimes it replaces the .com results all together. I have double checked the PA for US pages, they are far below the UK ones. Has anyone noticed similar behaviour &/or could anyone please help me troubleshoot this issue? Thanks in advance, R
Algorithm Updates | | RaksG0 -
Ecommerce SEO: Is it bad to link to product/category pages directly from content pages?
Hi ! In Moz' Whiteboard friday video Headline Writing and Title Tag SEO in a Clickbait World, Rand is talking about (among other things) best practices related to linking between search, clickbait and conversion pages. For a client of ours, a cosmetics and make-up retailer, we are planning to build content pages around related keywords, for example video, pictures and text about make-up and fashion in order to best target and capture search traffic related to make-up that is prevalent earlier in the costumer journey. Among other things, we plan to use these content pages to link directly to some of the products. For example a content piece about how to achieve full lashes will to link to particular mascaras and/or the mascara category) Things is, in the Whiteboard video Rand Says:
Algorithm Updates | | Inevo
_"..So your click-bait piece, a lot of times with click-bait pieces they're going to perform worse if you go over and try and link directly to your conversion page, because it looks like you're trying to sell people something. That's not what plays on Facebook, on Twitter, on social media in general. What plays is, "Hey, this is just entertainment, and I can just visit this piece and it's fun and funny and interesting." _ Does this mean linking directly to products pages (or category pages) from content pages is bad? Will Google think that, since we are also trying to sell something with the same piece of content, we do not deserve to rank that well on the content, and won't be considered that relevant for a search query where people are looking for make-up tips and make-up guides? Also.. is there any difference between linking from content to categories vs. products? ..I mean, a category page is not a conversion page the same way a products page is. Looking forward to your answers 🙂0 -
Deindexed from Google images Sep17th
We have a travel website that has been ranked in Google for 12-14years. The site produces original images with branding on them and have been for years ranking well. There's been no site changes. We have a Moz spamscore 1/17 and Domain Authority 59. Sep 17th all our images just disappeared from Google Image Search. Even searching for our domain with keyword photo results in nothing. I've checked our Search console and no email from Google and I see no postings on Moz and others relating to search algo changes with Images. I'm at a loss here.. does anyone have some advice?
Algorithm Updates | | danta2 -
Homepage Index vs Home vs Default?
Should your home page be www.yoursite.com/index.htm or home.htm or default.htm on an apache server? Someone asked me this, and I have no idea. On our wordpress site, I have never even seen this come up, but according to my friend, every homepage HAS to be one of those three. So my question is which one is best for an apache server site AND does it actually have to be one of those three? Thanks, Ruben
Algorithm Updates | | KempRugeLawGroup0 -
Does a KML file have to be indexed by Google?
I'm currently using the Yoast Local SEO plugin for WordPress to generate my KML file which is linked to from the GeoSitemap. Check it out http://www.holycitycatering.com/sitemap_index.xml. A competitor of mine just told me that this isn't correct and that the link to the KML should be a downloadable file that's indexed in Google. This is the opposite of what Yoast is saying... "He's wrong. 🙂 And the KML isn't a file, it's being rendered. You wouldn't want it to be indexed anyway, you just want Google to find the information in there. What is the best way to create a KML? Should it be indexed?
Algorithm Updates | | projectassistant1 -
Google is forcing a 301 by truncating our URLs
Just recently we noticed that google has indexed truncated urls for many of our pages that get 301'd to the correct page. For example, we have:
Algorithm Updates | | mmac
http://www.eventective.com/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html as the url linked everywhere and that's the only version of that page that we use. Google somehow figured out that it would still go to the right place via 301 if they removed the html filename from the end, so they indexed just: http://www.eventective.com/USA/Massachusetts/Bedford/107/ The 301 is not new. It used to 404, but (probably 5 years ago) we saw a few links come in with the html file missing on similar urls so we decided to 301 them instead thinking it would be helpful. We've preferred the longer version because it has the name in it and users that pay attention to the url can feel more confident they are going to the right place. We've always used the full (longer) url and google used to index them all that way, but just recently we noticed about 1/2 of our urls have been converted to the shorter version in the SERPs. These shortened urls take the user to the right page via 301, so it isn't a case of the user landing in the wrong place, but over 100,000 301s may not be so good. You can look at: site:www.eventective.com/usa/massachusetts/bedford/ and you'll noticed all of the urls to businesses at the top of the listings go to the truncated version, but toward the bottom they have the full url. Can you explain to me why google would index a page that is 301'd to the right page and has been for years? I have a lot of thoughts on why they would do this and even more ideas on how we could build our urls better, but I'd really like to hear from some people that aren't quite as close to it as I am. One small detail that shouldn't affect this, but I'll mention it anyway, is that we have a mobile site with the same url pattern. http://m.eventective.com/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html We did not have the proper 301 in place on the m. site until the end of last week. I'm pretty sure it will be asked, so I'll also mention we have the rel=alternate/canonical set up between the www and m sites. I'm also interested in any thoughts on how this may affect rankings since we seem to have been hit by something toward the end of last week. Don't hesitate to mention anything else you see that may have triggered whatever may have hit us. Thank you,
Michael0 -
Does google index non-public pages ie. members logged in page
hi, I was trying to locate resources on the topics regarding how much the google bot indexes in order to qualify a 'good' site on their engine. For example, our site has many pages that are associated with logged in users and not available to the public until they acquire a login username and password. Although those pages show up in google analytics, they should not be made public in the google index which is what happens. In light of Google trying to qualify a site according to how 'engaged' a user is on the site, I would feel that the activities on those member pages are very important. Can anyone offer suggestions on how Google treats those pages since we are planning to do further SEO optimization of those pages. Thanks
Algorithm Updates | | jumpdates0