The following table describes the parameters of the Follow Web Links configuration.

Parameter	Description
userAgent	(Optional) Overrides the "parent" (rss) setting for "search" operations (see usage guide)
proxyOverride	(Optional) "direct" to bypass the proxy (the default), or a proxy specification "(http|socks)://host:port"
script	(Mandatory) Script that must "return" (as the last statement evaluated) an array of the following format: [ { "url": string, "title": string /* optional-ish */, "description": string /* optional */, "publishedDate": string /* optional */, "spiderOut": string /* optional */ } ]
scriptlang	(Mandatory) Only "javascript" is supported; use extraMeta for different script types
scriptflags	(Optional) The flags to apply to the above script, see "unstructuredAnalysis.meta" for more details
extraMeta	(Optional) A pipeline of metadata extraction operations that are applied prior to "script", see "Using The Feed Harvester" overview
pageChangeRegex	(Optional) If non-null, this regex is used to match the pagination URL parameter (which will be replaced by pageChangeReplace). Group 1 should be the start value, so that any offset specified in the URL is respected
pageChangeReplace	(Mandatory if pageChangeRegex is non-null) A replace string in which $1 is page*numResultsPerPage
numPages	(Mandatory if pageChangeRegex is non-null) Controls how many pages deep the search will go
stopPaginatingOnDuplicate	(Ignored unless pageChangeRegex is non-null) If true (default: false), harvesting stops as soon as an already harvested link is encountered. For APIs that return docs in time order, this ensures that no time is wasted harvesting and then discarding duplicate links
numResultsPerPage	(Mandatory if pageChangeRegex is non-null) Controls the number of results per page
waitTimeBetweenPages_ms	(Optional) Only used if pageChangeRegex is non-null. If set, controls the wait between successive pages
maxDepth	(Optional, defaults to 2) If spidering out (returning "spiderOut": "true" from the script) the maximum depth to go
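
The pagination parameters in the table can be easier to follow with a small worked example. The sketch below is only an illustration of how pageChangeRegex, pageChangeReplace, numPages and numResultsPerPage might combine; the helper name buildPageUrls, the example URL and the exact offset arithmetic are assumptions, not the harvester's actual implementation (which runs server-side and may differ in detail).

```javascript
// Hypothetical helper (not part of the product): shows how the pagination
// parameters could be combined to produce the URL for each page.
function buildPageUrls(url, cfg) {
  var regex = new RegExp(cfg.pageChangeRegex);
  var match = regex.exec(url);
  if (!match) {
    return [url]; // no pagination parameter found: only the original URL
  }
  // Group 1 is the start offset already present in the URL (assumed numeric).
  var start = parseInt(match[1], 10) || 0;
  var urls = [];
  for (var page = 0; page < cfg.numPages; page++) {
    // Assumption: $1 in pageChangeReplace becomes start + page*numResultsPerPage.
    var offset = start + page * cfg.numResultsPerPage;
    var replacement = cfg.pageChangeReplace.replace("$1", String(offset));
    urls.push(url.replace(regex, replacement));
  }
  return urls;
}

// Example values, chosen purely for illustration.
var exampleConfig = {
  pageChangeRegex: "start=(\\d+)",
  pageChangeReplace: "start=$1",
  numPages: 3,
  numResultsPerPage: 10
};
var pages = buildPageUrls("http://example.com/search?q=test&start=0", exampleConfig);
```

With these example values, pages holds three URLs whose start parameter is 0, 10 and 20 respectively.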

Examples

API-Style Parsing

The scriptlang field is set to "javascript" so that "Follow Web Links" can parse the additional URLs. The script is passed a variable called "text", which contains the content returned from the specified URL (the original document).
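
As a minimal sketch of such a script, the example below parses "text" as JSON and builds the array of link objects described in the parameter table. The response layout (a "results" array with "link", "headline", "summary" and "date" fields) and the function name extractLinks are invented for illustration; a real API will use its own field names. In the harvester, "text" is supplied automatically; it is defined here only so the sketch runs on its own.

```javascript
// Assumption: stand-in for the "text" variable that the harvester provides.
var text = JSON.stringify({
  results: [
    { link: "http://example.com/a", headline: "First", summary: "First item", date: "2013-01-01T00:00:00Z" },
    { link: "http://example.com/b", headline: "Second" }
  ]
});

// Hypothetical helper: maps each API result to a { url, title, ... } object.
function extractLinks(text) {
  var json = JSON.parse(text);
  var links = [];
  for (var i = 0; i < json.results.length; i++) {
    var item = json.results[i];
    var link = { url: item.link, title: item.headline };
    // Optional fields are only set when the source item has them.
    if (item.summary) { link.description = item.summary; }
    if (item.date) { link.publishedDate = item.date; }
    links.push(link);
  }
  return links;
}

var links = extractLinks(text);
links; // the last statement evaluated is what the script "returns"
```

Wrapping the logic in a function keeps the final statement a plain expression, which matches the "last statement evaluated" convention for the script field.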

...
