Inspector

Crawling for links via API

For my app, I am trying to mimic this: "Our links are automatically crawled via API provided by hosts below." Is it possible for me to use Force to "crawl" for links based on metadata I provide? I know this is a complex question -- I don't expect a how-to. Just point me in the right direction. Thanks in advance. 

Ispita

Hi Inspector,

By default, the robots.txt file disallows search engines from crawling your pages. If you do not specify anything in the site's robots.txt file, search engines will not be able to access your site pages.

Inspector

You misunderstood -- I don't want my site to be crawled. I'm interested in crawling the web based on certain metadata to pull in links relating to that metadata.

mikefitz

You would have to use HTTP callouts to grab or crawl HTML pages from your metadata list, but be aware of the callout limits if you plan on grabbing tons of data. Also, if I remember correctly, you have to add every domain you call out to in the Remote Site Settings. Hopefully your metadata list is all contained within a couple of domains.
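
To make the idea above concrete, here is a minimal, language-neutral sketch in Python of the crawl loop described: fetch a page, pull out its links, and skip any host that is not on an allowlist. It is not Apex and not the platform's API. The host `video-site.example`, the names `ALLOWED_HOSTS`, `MAX_CALLOUTS`, `extract_links`, `is_allowed`, and `crawl`, and the cap of 10 are all assumptions made for illustration; the allowlist only stands in for the remote site settings and the cap for the callout limits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical allowlist, standing in for the platform's remote site settings.
ALLOWED_HOSTS = {"video-site.example"}
# Hypothetical cap, standing in for the platform's callout limits.
MAX_CALLOUTS = 10


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute ones.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all anchor links found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def is_allowed(url):
    """True if the URL's host is on the allowlist."""
    return urlparse(url).hostname in ALLOWED_HOSTS


def crawl(start_urls):
    """Fetch at most MAX_CALLOUTS pages and collect the links they contain."""
    found = []
    for url in start_urls[:MAX_CALLOUTS]:
        if not is_allowed(url):
            continue  # never call out to a host that was not registered
        with urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        found.extend(extract_links(html, url))
    return found
```

Keeping the fetch (`crawl`) separate from the parsing (`extract_links`) means the link extraction can be tried on a saved HTML string without making any callouts at all.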

Good luck.

Inspector

Thank you for the answer; yep, just a few domains. My app is IP related and is meant to track content distribution on a handful of video sites. Therefore, rather than going to Ustream, for example, and typing keywords into the search engine on their site, I am attempting to create an object wherein the keywords are entered into a field or fields in my Force app, and an action triggers the search in the Ustream engine. At this point I am not necessarily looking at actually pulling the links back into SFDC, though I'd eventually like to explore that.
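
A rough sketch of the "keywords in a field trigger a search" step, written in Python purely to illustrate the logic: it turns a comma-separated keyword field into a search URL that an action could open or call. The base URL `https://video-site.example/search` and the query parameter name `q` are placeholders, not any real site's search endpoint, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder endpoint; a real video site's search URL and parameter will differ.
SEARCH_BASE = "https://video-site.example/search"


def build_search_url(keywords, base=SEARCH_BASE, param="q"):
    """Build a search URL from a comma-separated keyword field."""
    # Split the field on commas and drop empty or whitespace-only entries.
    terms = [k.strip() for k in keywords.split(",") if k.strip()]
    # urlencode escapes spaces and special characters for the query string.
    return base + "?" + urlencode({param: " ".join(terms)})


print(build_search_url("my show, episode 1"))
# prints https://video-site.example/search?q=my+show+episode+1
```

If the goal is only to launch the search and not to pull results back, a URL built this way could simply be opened in the browser, which would avoid callouts entirely.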