...
```javascript
var json = eval('(' + text + ')');
var retval = [];
// For each "result" in the array
// Extract URL, title, description, eg for the flickr blogs API
// (http://www.flickr.com/services/api/response.json.html)
for (x in json.blogs.blog) {
    var blog = json.blogs.blog[x];
    var retobj = { url: blog.url, title: blog.name };
    retval.push(retobj);
}
// Alternatively set retobj.fullText to specify the content from the API response
// In addition set retobj.spiderOut: true, to run this script on the corresponding URL, eg:
if (null != json.nextPageUrl)
    retval.push({url: json.nextPageUrl, spiderOut: true});
retval; // annoying feature of our javascript engine, instead of returning you just evaluate the var to return
```
Noteinfo |
---|
For XML APIs the basic principle is the same, but the XML object needs to be parsed using embedded Java calls (the Rhino JavaScript engine currently in use does not support E4X; upgrading to a version that does is on our roadmap). |
...
it is likely that standard web-crawling measures are needed, such as custom user agents and per-page wait times. Because the appropriate values may well differ between the search engine and the pages themselves, "Follow Web Links" has its own waitTimeBetweenPages_ms and userAgent fields (if not specified, these are inherited from the parent object).
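
The inheritance rule can be sketched in a few lines of JavaScript. This is only an illustration, not the platform's implementation: the helper name resolveCrawlSettings is hypothetical, the example values are made up, and it assumes for simplicity that the parent object exposes the same two field names.

```javascript
// Hypothetical helper, shown only to illustrate the fallback rule described above.
// Assumption: the parent object uses the same field names as "Follow Web Links".
function resolveCrawlSettings(linkSettings, parentSettings) {
    var links = linkSettings || {};
    var parent = parentSettings || {};
    return {
        // A value set on "Follow Web Links" wins; otherwise the parent's value is used.
        waitTimeBetweenPages_ms: (links.waitTimeBetweenPages_ms != null)
            ? links.waitTimeBetweenPages_ms
            : parent.waitTimeBetweenPages_ms,
        userAgent: (links.userAgent != null)
            ? links.userAgent
            : parent.userAgent
    };
}

// Made-up example values: the parent defines both fields,
// "Follow Web Links" overrides only the wait time.
var exampleParent = { waitTimeBetweenPages_ms: 10000, userAgent: "ExampleCrawler/1.0" };
var exampleLinks = { waitTimeBetweenPages_ms: 2000 };
var effective = resolveCrawlSettings(exampleLinks, exampleParent);
```

Here the search-engine requests would use the overridden wait time, while the user agent falls back to the parent's value because none was specified.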
IN PROGRESS
Legacy documentation:
...