Using the Google Remove URL Tool to remove https pages

sparrowdog

I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week.

I have noticed that when I put URLs starting with https:// into the removal tool, it adds the http:// main URL at the front.

For example, I add to the removal tool:-

https://www.mydomain.com/blah.html?search_garbage_url_addition

On the confirmation page, the URL actually shows as:-

http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition

I don't want to accidentally remove my main URL or cause problems. Is this the right way this should look?

AND PART 2 OF MY QUESTION

If you see the search description in Google for a page you want removed that says the following in the SERP results, should I still go to the trouble of putting in the removal request?

www.domain.com/url.html?xsearch_...

A description for this result is not available because of this site's robots.txt – learn more.

sparrowdog

Thanks so much for taking the time to respond.

I think I will add the https to WMT and remove them that way.

I will take a look through the .htaccess file and the creation of the SSL robots file. A while back, it seemed that Google was indexing a lot of my site as https, and then they dropped it and went mainly back to http. I will get that sorted to make it clear.

TomRayner

Hi there

I'll start with question 2 first, as it's a bit easier to answer. Robots.txt blocks the crawling of a page, but not necessarily its indexing. If you're getting that description for one of your URLs, Google has not been able to access it because of your robots.txt, and a page that cannot be crawled will often drop out of the index eventually. So that is usually enough, although if you want to remove it as well, you can by all means.

For question 1 - GWT is a bit awkward in the sense that it treats the http and https versions of your site as different webmaster properties. Furthermore, when you remove a URL, the tool always prefixes whatever you enter with the address of the property you are working in (http or https), no matter how you enter it. That is why your https URL ended up appended to the http:// address on the confirmation page.

If you added another WMT property for https://www.yourdomain.com, you would be able to manage that version of the site as well, and thus remove any URLs under that prefix.
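To make that a little more concrete, here is a rough Python sketch that sorts a list of unwanted URLs by the property (scheme plus host) they would fall under, and keeps only the part after the property root. The function name `group_by_property` and the sample URLs are made up for illustration, and the script only organises your list; it does not talk to any Google tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_property(urls):
    """Group URLs by scheme + host, keeping the path and query after the root."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # The property is the scheme and host, e.g. https://www.mydomain.com/
        prop = f"{parts.scheme}://{parts.netloc}/"
        # Everything after the property root: path plus optional query string
        rest = parts.path.lstrip("/")
        if parts.query:
            rest += "?" + parts.query
        groups[prop].append(rest)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "https://www.mydomain.com/blah.html?search_garbage_url_addition",
        "http://www.mydomain.com/blah.html?search_garbage_url_addition",
    ]
    for prop, paths in group_by_property(sample).items():
        print(prop, paths)
```

Each group would then be handled under the matching http or https property, so that the prefix the tool adds lines up with the URL you actually mean.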

Incidentally, if you want to block all HTTPS pages from being accessed, you can do that with a special instruction in your .htaccess file and a second robots.txt file. You can instruct Googlebot and other bots to read a specific robots.txt file if they visit an HTTPS URL. To do that, you would first add this to your .htaccess file:

# Serve robots_ssl.txt in place of robots.txt when the request is over HTTPS
RewriteEngine On
RewriteCond %{HTTPS} ^on$
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule ^(.*)$ /robots_ssl.txt [L]

This rule basically says "if robots.txt is requested over https, serve the robots_ssl.txt file instead". You then upload a file called robots_ssl.txt to the root of your site. In that file you just add:

User-agent: *
Disallow: /

So now, if a bot requests robots.txt over https, it is served the robots_ssl.txt file, which tells it not to crawl anything. That should stop well-behaved bots from crawling your https URLs, which in turn should help keep them out of the index.
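If you want to sanity-check the rules before uploading them, a minimal sketch using Python's standard `urllib.robotparser` module is below. The helper name `is_allowed` and the sample URL are only for illustration, and the parser's verdict is its own reading of the rules rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The contents intended for robots_ssl.txt
ROBOTS_SSL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.mydomain.com/blah.html?search_garbage_url_addition"
    print(is_allowed(ROBOTS_SSL, "Googlebot", url))
```

With the two-line file above, the check should report that the URL is not allowed for any user agent.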

That might be useful to you, but if you go ahead and use it, please take care to back up all your files in case anything goes wrong - your .htaccess file is very important!
