dev401has

Robots.txt is not available at the domain root

Hi,

I have set up a site at http://nbcuni.force.com/commops

My Force.com domain name is nbcuni.force.com

I prepared a robots.txt file (a VF page that allows all bots) and set it in the site's Site Robots.txt field.

That robots.txt is served at http://nbcuni.force.com/commops/robots.txt, but in fact it should be served at http://nbcuni.force.com/robots.txt (the domain root).

In Google Webmaster Tools, the bot requests http://nbcuni.force.com/robots.txt, cannot find the file, and reports a crawl error (503).

How can I get robots.txt served at http://nbcuni.force.com/robots.txt?

Best Answer chosen by Admin (Salesforce Developers) 
paul-lmi

Create another site, but leave the Default Web Address field empty, and add the robots file there. The same goes if you want to properly support favicon.ico.

All Answers

dev401has

Hi Paul,

Thanks for the reply.

I thought the file would be served from the domain root directly, without creating a new site.

I am doing this in a sandbox, so that will work: I will create a new site and add the robots file there. But I am curious what to do in Developer Edition, where we can't create more than one site.

-Has

NBC Commercial Guidelines

paul-lmi

As far as dev orgs go, if you didn't create the site the same way as the new one you created in sandbox (no path), you won't be able to properly support robots.txt in it.

emilyjones

But you can't create a site that has the same default web address, even if you remove the custom web address.

Did anyone solve this problem yet?

 

dev401has

I didn't create another site. I kept it as it is. My robots.txt is served under the site path instead of the main domain, but it works fine: the Google crawler started crawling the site.

Just make sure that the original Salesforce robots.txt, which blocks all crawlers, is removed and that you add your own robots.txt to your site that allows bots.

rgds

dev401has

NBCU Commercial Guidelines
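To illustrate the difference between a robots.txt that blocks all crawlers and one that allows bots, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The two file bodies below are assumptions for illustration only; the thread does not show the exact contents of the default Salesforce file or of the custom VF page. The helper name `is_crawlable` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents (not taken from the thread):
# a block-everything file and an allow-everything file.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

ALLOW_ALL = """User-agent: *
Allow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


SITE_URL = "http://nbcuni.force.com/commops/"
print(is_crawlable(BLOCK_ALL, SITE_URL))  # False: every path is disallowed
print(is_crawlable(ALLOW_ALL, SITE_URL))  # True: every path is allowed
```

Running the same check against the text actually served by your site is a quick way to confirm that the blocking file has been replaced.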

emilyjones

How do you remove their robots.txt file?

dev401has

I just set my robots VF page in the Site Robots.txt field in the site settings. The default file was removed automatically.

rgds

dev401has

NBCU Commercial Guidelines

emilyjones

Do you know if it happened right away or if it took a while?

Bulent

Search engines look for robots.txt at the root level.

So if you are not masking your Force.com site URL with a custom URL, then you need to set up a site with no path to serve your robots.txt.

Also, it can take up to 24 hours for the cache to clear and reflect your robots.txt and favicon.ico, because these files are cached for 24 hours.
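The root-level lookup can be sketched in Python as follows. This is only an illustration of why a site with a path prefix does not satisfy crawlers; the function names `robots_url_for` and `served_robots_url` are hypothetical, and the URLs are the ones from the original question.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """URL a crawler requests: always /robots.txt at the root of the host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def served_robots_url(site_base_url: str) -> str:
    """URL where a site serves its robots file: directly under its base URL."""
    return site_base_url.rstrip("/") + "/robots.txt"


site_with_path = "http://nbcuni.force.com/commops"
site_without_path = "http://nbcuni.force.com/"

# The crawler ignores the /commops prefix, so the two URLs differ.
print(robots_url_for(site_with_path))        # http://nbcuni.force.com/robots.txt
print(served_robots_url(site_with_path))     # http://nbcuni.force.com/commops/robots.txt

# For a site with no path, the served URL matches what the crawler requests.
print(served_robots_url(site_without_path))  # http://nbcuni.force.com/robots.txt
```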

emilyjones

When you say "masking your force.com site url with your custom url", do you mean just adding the custom URL to the site settings? Or are there other DNS settings I need to change?

Bulent

Let's say your site URL is mysite.force.com/partners/

and you want to mask this URL with your custom domain or subdomain, like partners.mydomain.com

First you need to create a CNAME (at your domain name provider's portal) to point partners.mydomain.com to mysite.force.com (note: all the CNAMEs will point to the same force.com subdomain, not the actual site URL).

This might take up to 24 hours to propagate globally. Once the CNAME is in place, navigate to the specific site's detail page and update the Custom Web Address value of your site with your custom domain (partners.mydomain.com).

Now you can access your site via either http://partners.mydomain.com or http://mysite.force.com/partners/ or via the secure URL https://mysite.secure.force.com/partners/
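As a rough illustration of the three addresses listed above, the following Python sketch builds them from the Force.com domain, the site path, and the custom domain. The function name `site_urls` is hypothetical, and the ".secure" naming rule is inferred only from the example URLs in this post, so treat it as an assumption rather than a documented rule.

```python
from typing import List


def site_urls(force_domain: str, path: str, custom_domain: str) -> List[str]:
    """Build the addresses under which the site in the example is reachable."""
    # Assumption: the secure host inserts ".secure" after the first label,
    # as in mysite.force.com -> mysite.secure.force.com.
    subdomain, _, rest = force_domain.partition(".")
    secure_domain = f"{subdomain}.secure.{rest}"
    site_path = "/" + path.strip("/") + "/"
    return [
        f"http://{custom_domain}",            # custom domain, mapped via CNAME
        f"http://{force_domain}{site_path}",  # Force.com domain plus site path
        f"https://{secure_domain}{site_path}",  # secure Force.com URL
    ]


for url in site_urls("mysite.force.com", "partners", "partners.mydomain.com"):
    print(url)
```

Note that the CNAME target is the bare Force.com domain (mysite.force.com), not the site URL with its path.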

emilyjones

Thank you for your help. I created a CNAME about two weeks ago, so I know it has propagated, but I am still having issues with the following:

http://boards.developerforce.com/t5/Force-com-Sites/robots-txt-vs-robots/td-p/208567