...

// "text" holds the raw API response; it is evaluated as a JavaScript object literal
// (only do this for APIs you trust)
var json = eval('(' + text + ')');
var retval = [];
// For each "result" in the array, extract the URL, title and (where available) description,
// e.g. for the Flickr blogs API (http://www.flickr.com/services/api/response.json.html)
var blogs = json.blogs.blog;
for (var i = 0; i < blogs.length; i++) {
    var blog = blogs[i];
    var retobj = { url: blog.url, title: blog.name };
    // Alternatively, set retobj.fullText to specify the content directly from the API response
    retval.push(retobj);
}
// In addition, set spiderOut: true to run this script on the corresponding URL, e.g.:
if (null != json.nextPageUrl) {
    retval.push({ url: json.nextPageUrl, spiderOut: true });
}
// The engine does not support "return" here: the script's result is the value of the
// last evaluated expression, so finish by evaluating the variable to return
retval;
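To check the logic of a script like this outside the harvester, it can be wrapped in a plain function and run against a sample response. This is a minimal sketch, not part of the harvester: it assumes JSON.parse is available (true of current JavaScript engines, but older Rhino builds may lack it), and the function name extractLinks and the sample data, including the example.com URLs, are hypothetical.

```javascript
// Hypothetical helper mirroring the harvester script above, but returning its result
function extractLinks(text) {
  var json = JSON.parse(text);
  var retval = [];
  var blogs = json.blogs.blog;
  for (var i = 0; i < blogs.length; i++) {
    retval.push({ url: blogs[i].url, title: blogs[i].name });
  }
  // "!= null" also catches a missing (undefined) nextPageUrl
  if (json.nextPageUrl != null) {
    retval.push({ url: json.nextPageUrl, spiderOut: true });
  }
  return retval;
}

// Hypothetical sample response in the shape of the Flickr blogs API (JSON format)
var sample = JSON.stringify({
  blogs: { blog: [
    { id: "1", name: "First blog", url: "http://blog1.example.com/" },
    { id: "2", name: "Second blog", url: "http://blog2.example.com/" }
  ] },
  nextPageUrl: "http://api.example.com/blogs?page=2"
});

var links = extractLinks(sample);
// links holds two blog entries plus one spiderOut entry for the next page
```

Keeping the sketch in ES5 style (var, no arrow functions) means the function body can be pasted back into a Rhino script with little change.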

Note
For XML APIs the basic principle is the same, but the XML response must be parsed using embedded Java calls, since the Rhino JavaScript engine currently in use does not support E4X (upgrading to a version that does is on our roadmap).
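As an illustration of parsing XML through embedded Java calls, the sketch below uses Rhino's "Packages" bridge to the JDK's standard DOM parser. It is an assumption-laden example, not the harvester's own code: it only runs under Rhino (not in a browser or Node), the helper name parseBlogsXml is hypothetical, and it assumes a Flickr-style XML layout in which each blog is a <blog> element carrying name and url attributes.

```javascript
// Rhino only: relies on the "Packages" bridge to the JDK's DOM parser.
// Hypothetical helper; assumes XML shaped like <blogs><blog name="..." url="..."/></blogs>.
function parseBlogsXml(text) {
    var factory = Packages.javax.xml.parsers.DocumentBuilderFactory.newInstance();
    var builder = factory.newDocumentBuilder();
    // Wrap the response string so the Java parser can read it
    var source = new Packages.org.xml.sax.InputSource(new Packages.java.io.StringReader(text));
    var doc = builder.parse(source);
    var nodes = doc.getElementsByTagName("blog");
    var retval = [];
    for (var i = 0; i < nodes.getLength(); i++) {
        var el = nodes.item(i);
        // getAttribute returns a java.lang.String; convert it to a JavaScript string
        retval.push({ url: String(el.getAttribute("url")), title: String(el.getAttribute("name")) });
    }
    return retval;
}
// Inside the harvester script, the last evaluated expression is the result, e.g.:
// parseBlogsXml(text);
```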

...

it is likely that standard web-crawling measures, such as custom user agents and per-page wait times, will be needed. Because these settings may differ between the search engine and the pages it links to, "Follow Web Links" has its own waitTimeBetweenPages_ms and userAgent fields (if not specified, these are inherited from the parent object).
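The inheritance rule for these two fields can be sketched as follows. This only illustrates the behaviour described above and is not the harvester's implementation: the function name effectiveCrawlSettings, the object shapes "parent" and "links", and all values are hypothetical; only the field names waitTimeBetweenPages_ms and userAgent come from this page.

```javascript
// Hypothetical: "parent" stands for the enclosing config object, "links" for its
// "Follow Web Links" sub-object. Unset fields fall back to the parent's values.
function effectiveCrawlSettings(parent, links) {
  var own = links || {};
  return {
    userAgent: own.userAgent != null ? own.userAgent : parent.userAgent,
    waitTimeBetweenPages_ms: own.waitTimeBetweenPages_ms != null
      ? own.waitTimeBetweenPages_ms
      : parent.waitTimeBetweenPages_ms
  };
}

// Hypothetical values: slower polling for the search engine, faster for linked pages
var parentConfig = { userAgent: "Mozilla/5.0 (compatible; ExampleBot/1.0)", waitTimeBetweenPages_ms: 10000 };
var followWebLinks = { waitTimeBetweenPages_ms: 2000 }; // no userAgent, so it is inherited
var settings = effectiveCrawlSettings(parentConfig, followWebLinks);
```

Testing against null rather than using "||" keeps an explicit wait time of 0 from being silently replaced by the parent's value.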

IN PROGRESS

Legacy documentation:

...

Versions compared: Version 8 (old) and Version 9 (new), both saved by andrew johnston on Apr 15, 2014.
