Robots.txt Disallow Backslash - Is it the right command?

Modi

I'm a bit skeptical about this. Due to dynamic URLs and some other linking issues, Google has crawled URLs containing backslash and double-quote characters.

ex - www.xyz.com/\/index.php?option=com_product

www.xyz.com/\"/index.php?option=com_product

Now, %5C is the encoded version of \ (backslash) and %22 is the encoded version of " (double quote).

I need to know about this rule:

User-agent: *
Disallow: \

As I am disallowing all backslash URLs through this, will it remove only the backslash URLs, which are duplicates, or the entire site?

Modi

Thanks, you seem to have been lucky for me. After almost 2 months I have finally got the code that makes all these encoded URLs redirect correctly. Now, if one types

http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10

then they are redirected via a 301 to the correct URL

http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
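For anyone wanting to reproduce this kind of redirect, here is a minimal Python sketch of the cleanup logic. It is not the code actually used on the site (that was not posted). The names `redirect_target` and `STRAY_SEGMENT` are made up for illustration, and it assumes the only problem is a stray `\"` path segment, whether percent-encoded or not.

```python
from urllib.parse import urlsplit, urlunsplit, unquote

# Hypothetical constant: the unwanted path segment, a backslash
# followed by a double quote (seen encoded as %5C%22).
STRAY_SEGMENT = '\\"'


def redirect_target(url):
    """Return the clean URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Drop any segment that decodes to the stray backslash-quote pair.
    cleaned = [s for s in segments if unquote(s) != STRAY_SEGMENT]
    if cleaned == segments:
        return None
    return urlunsplit(parts._replace(path="/".join(cleaned)))


bad = ("http://www.mycarhelpline.com/%5C%22/index.php"
       "?option=com_latestnews&view=list&Itemid=10")
good = ("http://www.mycarhelpline.com/index.php"
        "?option=com_latestnews&view=list&Itemid=10")
target = redirect_target(bad)
```

The returned URL would go in the `Location` header of a 301 response. A clean URL returns `None`, so it is served normally and no redirect loop is created.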

Everett

Hello Gagan,

I think the best way to handle this would be using the rel canonical tag or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly.

The rel canonical tag would be the easier of those two. I notice the version without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10 ) has a rel canonical tag that correctly references itself as the canonical version. However, the encoded URLs (e.g. http://www.mycarhelpline.com/%5C%22/index.php?option=com_latestnews&view=list&Itemid=10 , which is actually http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10 ) do NOT have a rel canonical tag.

If the version with the backslash had a rel canonical tag stating that the following URL is canonical it would solve your issue, I think.
Canonical URL:
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
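
As a rough illustration of what that tag would look like in the page head, here is a small Python sketch that builds it. The helper name `canonical_link_tag` is hypothetical; in practice the CMS template or an SEO extension would normally emit the tag.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build a rel=canonical <link> element for the given URL."""
    # escape() turns & into &amp; so the attribute value is valid HTML.
    return '<link rel="canonical" href="{}">'.format(
        escape(canonical_url, quote=True)
    )


tag = canonical_link_tag(
    "http://www.mycarhelpline.com/index.php"
    "?option=com_latestnews&view=list&Itemid=10"
)
```

The same tag, pointing at the clean URL, would need to appear on both the clean page and the backslash version of it.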

Modi

Sure. Here are some of the URLs as they were crawled:

Sample incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too

http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10

http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2

Correct URLs

http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10

http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2

What we found online

Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces.

%22 represents " (double quote) and %5C represents \ (backslash)
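
A quick way to check these encodings is Python's standard `urllib.parse` module. This is only a sanity check of the percent-encoding, not anything specific to the site; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# With safe="" everything except letters, digits and _ . - ~ is encoded.
encoded_backslash = quote("\\", safe="")  # backslash
encoded_quote = quote('"', safe="")       # double quote
encoded_asterisk = quote("*", safe="")    # asterisk, for comparison

# Decoding the path prefix seen in the crawled URLs.
decoded = unquote("/%5C%22/index.php")

print(encoded_backslash, encoded_quote, encoded_asterisk)
# prints: %5C %22 %2A
```

So a path starting with /%5C%22/ decodes to /\"/, which matches the duplicate URLs being reported.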

We intend to remove these duplicates, which were created with %22 and %5C in them.

Many thanks

Everett

I am not entirely sure I understood your question as intended, but I will do my best to answer.

I would not put this in my robots.txt file because it could possibly be misunderstood as a forward slash, in which case your entire domain would be blocked:

Disallow: \
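
To show why a rule that ends up being read as a forward slash is so risky, here is a deliberately simplified Python sketch of robots.txt prefix matching. It is not how Googlebot or any real crawler is implemented; `is_blocked` is a hypothetical helper, wildcards and Allow rules are ignored, and the `/%5C` rule is only an assumed example of a narrower prefix.

```python
def is_blocked(path, disallow_rules):
    """Simplified check: a path is blocked if it starts with any rule.

    Empty rules are skipped, since an empty Disallow blocks nothing.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


paths = [
    "/index.php?option=com_product",         # the real page
    "/%5C%22/index.php?option=com_product",  # the duplicate
]

# "Disallow: /" is a prefix of every path, so the whole site is blocked.
blocked_by_slash = [p for p in paths if is_blocked(p, ["/"])]

# A narrower prefix only matches paths that begin with the encoded backslash.
blocked_by_encoded = [p for p in paths if is_blocked(p, ["/%5C"])]
```

With the `/` rule both paths are blocked; with the narrower prefix only the duplicate is. Either way, robots.txt only controls crawling and does nothing to consolidate the duplicates.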

We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples.

It may be best to rewrite or redirect those URLs instead, since they don't seem to be the canonical versions you intend to present to the user.
