Crawling for links via API

For my app, I am trying to mimic this: "Our links are automatically crawled via API provided by hosts below." Is it possible for me to use Force to "crawl" for links based on metadata I provide? I know this is a complex question -- I don't expect a how-to. Just point me in the right direction. Thanks in advance.

November 26, 2011

Ispita
Hi Inspector,

The default behavior is for the robots.txt file to disallow search engines from crawling your pages. If you do not specify anything in the site's robots.txt file, search engines will not be able to access your site's pages.

November 27, 2011

Inspector
You misunderstood -- I don't want my site to be crawled -- I'm interested in crawling the web based on certain metadata to pull in links relating to that metadata.

November 27, 2011

mikefitz
You would have to use HTTP callouts to fetch or crawl the HTML pages from your metadata list, but be aware of the callout limits if you plan on grabbing a lot of data. Also, if I remember correctly, you have to add every domain you call out to in the Remote Site Settings. Hopefully your metadata list is all contained within a couple of domains.
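
To make the shape of that concrete, here is a rough sketch in Python rather than Apex, just to show the logic: pull the links out of a fetched HTML page, then sort them into "fetch now", "defer", and "blocked" based on a host allowlist and a callout budget. The names `ALLOWED_HOSTS`, `MAX_CALLOUTS`, `extract_links`, and `plan_callouts` are made up for illustration, `video-site.example` is a placeholder domain, and the budget of 10 is an arbitrary number, not the platform's real limit. On the platform itself the allowlist would be your Remote Site Settings and the fetch would be an HTTP callout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the domains registered in Remote Site Settings.
ALLOWED_HOSTS = {"video-site.example"}

# Hypothetical per-run budget; the real callout limit is set by the platform.
MAX_CALLOUTS = 10


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def plan_callouts(urls, allowed_hosts=ALLOWED_HOSTS, budget=MAX_CALLOUTS):
    """Split URLs into those to fetch now, those deferred, and those blocked."""
    to_fetch, deferred, blocked = [], [], []
    for url in urls:
        host = urlparse(url).hostname
        if host not in allowed_hosts:
            blocked.append(url)      # host is not on the allowlist
        elif len(to_fetch) < budget:
            to_fetch.append(url)     # still within the callout budget
        else:
            deferred.append(url)     # over budget; leave for a later run
    return to_fetch, deferred, blocked
```

Anything that lands in `blocked` points at a domain you would need to register first, and anything in `deferred` would have to wait for a later run or batch.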

Good luck.

November 28, 2011

Inspector
Thank you for the answer; yep, just a few domains. My app is IP-related and is meant to track content distribution on a handful of video sites. So rather than going to Ustream, for example, and typing keywords into the search engine on their site, I am attempting to create an object where the keywords are entered into a field or fields in my Force app and an action triggers the search in the Ustream engine. At this point I am not necessarily looking to pull the links back into SFDC, though I'd eventually like to explore that.
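
Roughly, the keyword-to-search step I have in mind looks like the sketch below. It is written in Python only to show the idea; the endpoint `https://video-site.example/search`, the parameter name `q`, and the function `build_search_url` are all placeholders, since I have not confirmed what the real site's search URL looks like.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and parameter name; the real site's
# search URL would need to be checked and substituted here.
SEARCH_BASE = "https://video-site.example/search"
QUERY_PARAM = "q"


def build_search_url(keywords, base=SEARCH_BASE, param=QUERY_PARAM):
    """Join the non-empty keywords into one query and URL-encode it."""
    cleaned = [k.strip() for k in keywords if k and k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    return base + "?" + urlencode({param: " ".join(cleaned)})
```

The keywords would come from the field or fields on the record, and the resulting URL could either be opened in the browser or used as the target of a callout.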

November 29, 2011