...

[Diagram: Follow Web Links 2]

API Parsing

...

For each document, the script populates an array of objects with the following fields.

Code Block

{
    "url": string, // Mandatory - this URL is copied into the "URL" field of the generated document,
                   // and is used to fetch the content unless "fullText" is set.
    "title": string, // Mandatory (unless "spiderOut" is set, see below) - used to generate the document's title.
    "description": string, // Optional - if set, used to generate the document's description.
    "publishedDate": string, // Optional - if not set, the current date is used instead.

    "fullText": string, // Optional - if set, this is the content of the generated document, ie "url" is not followed.

    "spiderOut": boolean // Optional - if set to true, searchConfig.script is applied to the resulting document,
                         // to a depth of up to "searchConfig.maxDepth"
                         // Note: spiderOut only works if rss.extraUrls is non-empty (ie use that instead of url)
}
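
The mandatory/optional rules above can be sketched as a small check. This is only an illustration: checkEntry is a hypothetical helper, not a platform function, and the example.com URLs are made up.

```javascript
// Hypothetical helper (not part of the platform): returns a list of problems
// found in one entry, following the field rules listed above
function checkEntry(entry) {
    var problems = [];
    // "url" is always mandatory
    if (typeof entry.url !== "string" || entry.url.length === 0) {
        problems.push("url is mandatory");
    }
    // "title" is mandatory unless "spiderOut" is set
    if (!entry.spiderOut && typeof entry.title !== "string") {
        problems.push("title is mandatory unless spiderOut is set");
    }
    return problems;
}

// An ordinary entry: no problems
var ok = checkEntry({ url: "http://www.example.com/a", title: "A" });
// A spiderOut entry may omit the title: no problems
var spider = checkEntry({ url: "http://www.example.com/next", spiderOut: true });
// Neither url nor title: two problems
var bad = checkEntry({ description: "missing url and title" });
```

A check like this could be run over the array before it is returned, to catch entries that would otherwise produce documents without a title.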

spiderOut can be used to apply the Follow Web Links script to the resulting document as well. If the newly generated document itself contains additional URLs, the script runs again on it and returns a further array, from which additional documents are generated.

When spiderOut is enabled, additional URLs (if present) continue to be followed until the depth set by maxDepth has been reached.

[Diagram: Max Depth]
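
The depth limit can be illustrated with a small stand-alone simulation. It is a sketch under stated assumptions, not the platform's implementation: fakeResponses replaces real HTTP fetches, script stands in for searchConfig.script, the first application of the script is counted as depth 1, and spiderOut entries are assumed to be followed rather than turned into documents themselves.

```javascript
// Hypothetical in-memory "web": URL -> parsed response (stands in for real fetches)
var fakeResponses = {
    "page1": { docs: ["a", "b"], next: "page2" },
    "page2": { docs: ["c"], next: "page3" },
    "page3": { docs: ["d"], next: null }
};

// Stand-in for searchConfig.script: returns entries in the documented format
function script(response) {
    var out = [];
    for (var i = 0; i < response.docs.length; i++) {
        out.push({ url: response.docs[i], title: "Doc " + response.docs[i] });
    }
    if (null != response.next) {
        out.push({ url: response.next, spiderOut: true });
    }
    return out;
}

// Applies the script to startUrl, then re-applies it to each spiderOut entry,
// stopping once maxDepth applications have been chained (assumed counting)
function follow(startUrl, maxDepth) {
    var docs = [];
    var queue = [{ url: startUrl, depth: 1 }];
    while (queue.length > 0) {
        var item = queue.shift();
        var entries = script(fakeResponses[item.url]);
        for (var i = 0; i < entries.length; i++) {
            var entry = entries[i];
            if (entry.spiderOut) {
                // Only follow the link if the depth limit has not been reached
                if (item.depth < maxDepth) {
                    queue.push({ url: entry.url, depth: item.depth + 1 });
                }
            } else {
                docs.push(entry.url);
            }
        }
    }
    return docs;
}
```

With these fake responses, follow("page1", 1) collects a and b only, follow("page1", 2) also collects c, and follow("page1", 3) reaches d; larger values of maxDepth change nothing because page3 has no further link.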

Example using JSON

Code Block

// Parse the API response (held in "text") as JSON
var json = eval('(' + text + ')');
var retval = [];
// For each "result" in the array
// Extract URL, title, description, eg for the flickr blogs API
// (http://www.flickr.com/services/api/response.json.html)
var blogs = json.blogs.blog;
for (var i = 0; i < blogs.length; i++) {
    var blog = blogs[i];
    var retobj = { url: blog.url, title: blog.name };
    retval.push(retobj);
}
// Alternatively set retobj.fullText to specify the content from the API response
// In addition set spiderOut: true to run this script on the corresponding URL, eg:
if (null != json.nextPageUrl) {
    retval.push({ url: json.nextPageUrl, spiderOut: true });
}
retval; // Our javascript engine has no "return" here: evaluate the variable as the last statement instead
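
The fullText alternative mentioned in the comments above might look like the following sketch. The response layout (items, link, headline, body) is invented for illustration, and text is defined inline only so that the snippet is self-contained; in a real script it would already hold the API response.

```javascript
// Hypothetical API response; the field names are invented for this example
var text = '{"items":[{"link":"http://www.example.com/1","headline":"One","body":"Full text of one"}]}';

var json = eval('(' + text + ')');
var retval = [];
for (var i = 0; i < json.items.length; i++) {
    var item = json.items[i];
    // fullText is set, so the generated document takes its content from the
    // API response and "url" is not followed
    retval.push({ url: item.link, title: item.headline, fullText: item.body });
}
retval; // evaluated last, as in the example above
```

This form is useful when the API already returns the complete content, since it avoids one extra fetch per generated document.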

Note
For XML APIs the basic principle is the same, but the XML needs to be parsed using embedded Java calls, since the Rhino JavaScript engine currently in use does not support E4X (upgrading to a version that does is on our roadmap).
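
As a rough sketch of what such embedded Java calls could look like, the function below uses the standard Java DOM parser through Rhino's Packages object. It only runs inside Rhino (or another engine exposing Java classes), and the element names item, link and title are assumptions about a hypothetical XML response, not part of any particular API.

```javascript
// Sketch only: relies on Rhino's "Packages" bridge to reach Java classes
function extractXmlEntries(xmlText) {
    var factory = Packages.javax.xml.parsers.DocumentBuilderFactory.newInstance();
    var builder = factory.newDocumentBuilder();
    var reader = new Packages.java.io.StringReader(xmlText);
    var doc = builder.parse(new Packages.org.xml.sax.InputSource(reader));

    // "item", "link" and "title" are hypothetical element names
    var items = doc.getElementsByTagName("item");
    var out = [];
    for (var i = 0; i < items.getLength(); i++) {
        var entry = {};
        var children = items.item(i).getChildNodes();
        for (var j = 0; j < children.getLength(); j++) {
            var child = children.item(j);
            var name = String(child.getNodeName());
            if (name == "link") {
                entry.url = String(child.getTextContent());
            } else if (name == "title") {
                entry.title = String(child.getTextContent());
            }
        }
        out.push(entry);
    }
    return out;
}

// Inside a script the last statement would then be, for example:
// extractXmlEntries(text);
```

Converting the Java strings with String(...) keeps the returned entries as plain JavaScript values, in the same shape as in the JSON example.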

HTML Parsing

IN PROGRESS

...
