...

Code Block
// Parse the API response held in "text". eval runs any code contained in the response,
// so prefer JSON.parse(text) if the script engine supports it, and only eval trusted sources.
var json = eval('(' + text + ')');
var retval = [];
// For each "result" in the array, extract the URL and title, e.g. for the Flickr blogs API
// (http://www.flickr.com/services/api/response.json.html)
for (var i = 0; i < json.blogs.blog.length; i++) {
    var blog = json.blogs.blog[i];
    var retobj = { url: blog.url, title: blog.name };
    // Optionally set retobj.fullText to supply the document content directly from the API response
    retval.push(retobj);
}
// Set spiderOut: true on an object to run this script on the corresponding URL,
// e.g. to follow pagination (nextPageUrl is illustrative; use whatever field your API provides):
if (json.nextPageUrl != null) {
    retval.push({ url: json.nextPageUrl, spiderOut: true });
}
// The engine has no top-level "return": the script's result is the value of the
// last expression evaluated, so finish by evaluating retval.
retval;
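Outside the harvester, the same logic can be exercised by wrapping it in a function. The sketch below, runnable under Node.js, also shows the two optional fields mentioned in the script's comments: fullText (content taken directly from the API response) and spiderOut (run the script on that URL). The search API it parses, with its "results", "link", "summary" and "next" fields, is hypothetical, as is the function name extractResults; it uses JSON.parse rather than eval because Node.js provides it.

```javascript
// Sketch, runnable under Node.js: the harvester script's logic wrapped in a
// function, for a HYPOTHETICAL search API whose JSON looks like
//   { "results": [ { "link": ..., "title": ..., "summary": ... } ], "next": ... }
function extractResults(text) {
    // JSON.parse is used here because Node.js provides it (the harvester script uses eval)
    var json = JSON.parse(text);
    var retval = [];
    for (var i = 0; i < json.results.length; i++) {
        var r = json.results[i];
        // fullText supplies the document content directly from the API response
        retval.push({ url: r.link, title: r.title, fullText: r.summary });
    }
    // spiderOut: true asks the harvester to run the script on this URL as well
    if (json.next != null) {
        retval.push({ url: json.next, spiderOut: true });
    }
    return retval; // inside the harvester, end with "retval;" instead of return
}

// Made-up sample response in the hypothetical format
var sample = JSON.stringify({
    results: [{ link: "http://example.com/a", title: "A", summary: "Text of A" }],
    next: "http://api.example.com/search?page=2"
});
console.log(JSON.stringify(extractResults(sample)));
// prints [{"url":"http://example.com/a","title":"A","fullText":"Text of A"},{"url":"http://api.example.com/search?page=2","spiderOut":true}]
```

When the response has no "next" field, no spiderOut entry is added, so the harvester stops paging.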

Note

For XML APIs the basic principle is the same, but the XML must be parsed using embedded Java calls, because the Rhino JavaScript engine currently in use does not support E4X (upgrading to a version that does is on our roadmap).
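As an illustration of parsing with embedded Java calls, the sketch below uses Rhino's Packages object to reach the standard JDK DOM parser (javax.xml.parsers.DocumentBuilderFactory). It assumes, for illustration only, a response in which each <blog> element carries "url" and "name" attributes; the function name parseBlogsXml is hypothetical. It requires a Rhino-style engine with Java access and will not run under plain Node.js.

```javascript
// Sketch for a Rhino engine with Java access (LiveConnect): parse an XML API
// response with the JDK DOM parser, since E4X is unavailable.
// ASSUMPTION: each <blog> element has "url" and "name" attributes.
function parseBlogsXml(text) {
    var factory = Packages.javax.xml.parsers.DocumentBuilderFactory.newInstance();
    var builder = factory.newDocumentBuilder();
    // Wrap the response string so the Java parser can read it
    var source = new Packages.org.xml.sax.InputSource(new Packages.java.io.StringReader(text));
    var doc = builder.parse(source);
    var blogs = doc.getElementsByTagName("blog");
    var retval = [];
    for (var i = 0; i < blogs.getLength(); i++) {
        var blog = blogs.item(i);
        // getAttribute returns a java.lang.String; String() converts it to a JavaScript string
        retval.push({ url: String(blog.getAttribute("url")), title: String(blog.getAttribute("name")) });
    }
    return retval;
}
```

Inside the harvester, the script would then end by evaluating parseBlogsXml(text); as its last expression, in the same way the JSON example ends with retval;.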

...

It is likely that standard web-crawling measures, such as custom user agents and per-page wait times, will be needed. Because these settings may differ between the search engine and the pages themselves, "Follow Web Links" has its own waitTimeBetweenPages_ms and userAgent fields (if not specified, these are inherited from the parent object).
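The inheritance rule can be sketched as a small resolution function: a field left unspecified on the "Follow Web Links" object falls back to the parent's value. This is an illustration only; the function name effectiveCrawlSettings and the example values are hypothetical, and it assumes the parent object uses the same two field names.

```javascript
// Sketch of the inheritance rule: each field not set on the "Follow Web Links"
// object falls back to the parent object's value.
// ASSUMPTION: the parent uses the same field names; the real harvester may differ.
function effectiveCrawlSettings(parentSettings, followWebLinks) {
    var result = {};
    ["waitTimeBetweenPages_ms", "userAgent"].forEach(function (field) {
        var own = followWebLinks ? followWebLinks[field] : undefined;
        result[field] = (own != null) ? own : parentSettings[field];
    });
    return result;
}

// Made-up values: the search engine is queried slowly, the linked pages faster
var searchSettings = { waitTimeBetweenPages_ms: 10000, userAgent: "ExampleBot/1.0" };
var linkSettings = { waitTimeBetweenPages_ms: 2000 }; // no userAgent, so it is inherited
console.log(JSON.stringify(effectiveCrawlSettings(searchSettings, linkSettings)));
// prints {"waitTimeBetweenPages_ms":2000,"userAgent":"ExampleBot/1.0"}
```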

IN PROGRESS

Legacy documentation:

...