...
The following table describes the parameters of the Follow Web Links configuration.
Parameter | Description |
---|---|
userAgent | (Optional) Overrides the "parent" (rss) setting for "search" operations (see usage guide). |
proxyOverride | (Optional) "direct" to bypass the proxy (the default), or a proxy specification, "(http\|socks)://host:port". |
script | (Mandatory) Script; must "return" (i.e. the last statement evaluated must be) an array of the following format: `[ { "url": string, "title": string /* optional-ish */, "description": string /* optional */, "publishedDate": string /* optional */, "spiderOut": string /* optional */ } ]` |
scriptlang | (Mandatory) Only "javascript" is supported; use extraMeta for different script types. |
scriptflags | (Optional) The flags to apply to the above script; see "unstructuredAnalysis.meta" for more details. |
extraMeta | (Optional) A pipeline of metadata extraction operations that are applied prior to "script"; see the "Using The Feed Harvester" overview. |
pageChangeRegex | (Optional) If non-null, this regex should be used to match the pagination URL parameter (which will be replaced by pageChangeReplace). Also, group 1 should be the start offset, to allow any offsets specified in the URL to be respected. |
pageChangeReplace | (Optional) Mandatory if pageChangeRegex is non-null; must be a replace string in which $1 is the page*numResultsPerPage. |
numPages | (Optional) Mandatory if pageChangeRegex is non-null; controls the number of pages deep the search will go. |
stopPaginatingOnDuplicate | (Ignored unless pageChangeRegex is non-null) If true (default: false), harvesting stops as soon as an already-harvested link is encountered. For APIs that return documents in time order, this ensures that no time is wasted harvesting and then discarding duplicate links. |
numResultsPerPage | (Optional) Mandatory if pageChangeRegex is non-null; controls the number of results per page. |
waitTimeBetweenPages_ms | (Optional) Only used if pageChangeRegex is non-null; if set, controls a wait between successive pages. |
maxDepth | (Optional, defaults to 2) If spidering out (returning "spiderOut": "true" from the script), the maximum depth to go. |
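To illustrate how pageChangeRegex, pageChangeReplace, and numResultsPerPage combine, here is a minimal sketch of the documented substitution: group 1 of the regex matches the offset in the URL, and $1 in the replace string becomes page*numResultsPerPage. The URL, regex, and "start" parameter name are invented for this example, not taken from the source.

```javascript
// Hypothetical pagination settings (parameter names from the table above;
// the regex and URL format are assumptions for illustration only).
var pageChangeRegex = "start=(\\d+)";   // group 1 captures the current offset
var pageChangeReplace = "start=$1";     // $1 is replaced by page*numResultsPerPage
var numResultsPerPage = 10;

// Rewrites the pagination parameter in a URL for the given page number.
function buildPageUrl(url, page) {
    var offset = page * numResultsPerPage;
    return url.replace(new RegExp(pageChangeRegex),
                       pageChangeReplace.replace("$1", String(offset)));
}

var page2 = buildPageUrl("http://api.example.com/search?q=test&start=0", 2);
// page2 is "http://api.example.com/search?q=test&start=20"
```

With numPages set, the harvester would generate one such URL per page, optionally waiting waitTimeBetweenPages_ms between requests.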
Examples
API-Style Parsing
scriptlang is set to "javascript", enabling "Follow Web Links" to parse the additional URLs. The script field is passed a variable called "text", which contains the response from the specified URL (the original document).
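A minimal sketch of such a script, assuming a hypothetical JSON API response in "text" (the field names items, link, name, and date are invented for illustration and are not part of any real API):

```javascript
// "text" holds the response from the specified URL; here a made-up JSON body.
var text = '{"items":[' +
    '{"link":"http://example.com/1","name":"First","date":"2014-01-01T00:00:00Z"},' +
    '{"link":"http://example.com/2","name":"Second","date":"2014-01-02T00:00:00Z"}]}';

// Parse the response and map each item onto the documented output format.
var json = JSON.parse(text);
var links = [];
for (var i = 0; i < json.items.length; i++) {
    links.push({
        "url": json.items[i].link,
        "title": json.items[i].name,
        "publishedDate": json.items[i].date
    });
}
links; // last statement evaluated: the array "returned" to the harvester
```

Note that the array is not returned with a return statement; per the script parameter's description, the last statement evaluated is taken as the result.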
...