...

If it is used on web/feed sources, it can be called using "links". If it is used on file/database sources, it can be called using either "links" or "split".

Follow Web Links has the following two major use cases:

...

It takes as its input the documents that have been generated by an extractor, and then creates new documents based on the URL links. The original documents can then be retained or discarded.

[Gliffy diagram: Follow Web Links 2]

 

...

Examples

When Follow Web Links is used for API-style parsing, scriptlang must be set to "javascript". You can then specify a JavaScript script in the script field. The script is passed a variable called "text", which contains the response to the specified URL (the original document). The script must output an array of objects in the following format:

 

Code Block
    "url": string, // Mandatory - this URL is copied into the "URL" field of the generated document,
                   // and is used to fetch the content unless "fullText" is set.
    "title": string, // Mandatory (unless "spiderOut" set, see below) - this is used to generate the document's title.
    "description": string, // Optional, if set then used to generate the document's description.
    "publishedDate": string, // Optional, if not set then the current date is used instead.

    "fullText": string, // Optional, if set then this is the content for the generated document, ie "url" is not followed.

    "spiderOut": integer // Optional, if set to true then the Follow Web Links script is applied to the resulting document,
                         // for a depth of up to "maxDepth" times
                         // Note spiderOut only works if rss.extraUrls is non-empty (ie use that instead of url)
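
As a rough illustration of the format above, the following sketch shows how a script might turn the "text" variable into the expected array. It is not taken from the product documentation: the response shape (an "items" array with "link", "headline", "summary", "published" and "body" fields) and the helper name buildLinks are hypothetical, and it assumes JSON.parse is available in the script engine.

```javascript
// Minimal sketch, under stated assumptions: the API response is JSON of the
// form { "items": [ { "link", "headline", "summary", "published", "body" } ] }.
// Those field names and the function name buildLinks are hypothetical.
function buildLinks(text) {
    var response = JSON.parse(text); // assumes the engine provides JSON.parse
    var items = response.items || [];
    var links = [];
    for (var i = 0; i < items.length; i++) {
        var item = items[i];
        // "url" and "title" are mandatory, so skip entries missing either one
        if (!item.link || !item.headline) {
            continue;
        }
        var link = { url: item.link, title: item.headline };
        // Optional fields are only set when the response supplies them
        if (item.summary) { link.description = item.summary; }
        if (item.published) { link.publishedDate = item.published; }
        if (item.body) { link.fullText = item.body; } // "url" is then not followed
        links.push(link);
    }
    return links;
}

// Example input standing in for the "text" variable (made-up data)
var sampleText = JSON.stringify({
    items: [
        { link: "http://example.com/a", headline: "First", summary: "Short summary" },
        { link: "http://example.com/b" }, // no title, so it is skipped
        { link: "http://example.com/c", headline: "Third", body: "Inline content" }
    ]
});
var sampleLinks = buildLinks(sampleText);
```

In a real script field the function would be called with the engine-supplied "text" variable rather than sampleText. How the resulting array is handed back to Follow Web Links is not covered here, so treat that step as an open assumption.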

...

In the example, the links object has a complex parameter, extrameta, which is configured to call an XPath script that parses the XML input and converts it to JSON output.

 

...

When Follow Web Links is used to ingest HTML pages, some additional considerations apply.

...