Description

The Feed Harvester connects to an RSS feed and extracts data from it.

The feedType field specifies that the data source is RSS. The harvester connects to the specified URLs and can include or exclude URLs by regular expression, using regexInclude or regexExclude.
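A minimal Python sketch of these two steps: read the item links from an RSS feed, then apply include/exclude patterns. It assumes an RSS 2.0 feed, that regexInclude keeps only URLs matching its pattern, and that regexExclude drops URLs matching its pattern (using a substring search rather than a full match). The function names and the sample feed are hypothetical; the Feed Harvester's own matching rules may differ.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional


def extract_item_links(rss_xml: str) -> List[str]:
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def filter_urls(urls: List[str],
                regex_include: Optional[str] = None,
                regex_exclude: Optional[str] = None) -> List[str]:
    """Hypothetical filter: keep URLs that match regex_include (if given)
    and do not match regex_exclude (if given)."""
    inc = re.compile(regex_include) if regex_include else None
    exc = re.compile(regex_exclude) if regex_exclude else None
    kept = []
    for url in urls:
        if inc is not None and not inc.search(url):
            continue  # not matched by the include pattern
        if exc is not None and exc.search(url):
            continue  # matched by the exclude pattern
        kept.append(url)
    return kept


# Hypothetical sample feed for illustration
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><link>http://example.com/news/1</link></item>
<item><title>B</title><link>http://example.com/blog/2</link></item>
<item><title>C</title><link>http://example.com/news/3?print=1</link></item>
</channel></rss>"""

if __name__ == "__main__":
    links = extract_item_links(SAMPLE_RSS)
    print(filter_urls(links, regex_include=r"/news/", regex_exclude=r"print=1"))
```

With both patterns set, only the first news link survives: the blog link fails the include pattern and the print link hits the exclude pattern.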

extraUrls

A complex type that allows URLs to be specified manually, overriding the settings that the RSS feed would otherwise provide.

Example:

"extraUrls": [ // This array allows manually specified URLs to be harvested once
    {
        "url": string, // The URL
        "title": string, // The title that the document will be given (i.e. the equivalent of the RSS title)
        "description": string, // (Optional) The description that the document will be given (i.e. the equivalent of the RSS description)
        "publishedData": string, // (Optional) The date that will be assigned to the document (default: now); this can be overridden from "structuredAnalysis"
        "fullText": string // (Optional) If present and "useTextExtractor" is "none", this string is used instead of the URL contents (mainly for debugging)
    }
]
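The following Python sketch shows how a harvester might turn one extraUrls entry into a document, following the field comments above: url and title are required, publishedData defaults to the current time, and fullText replaces the fetched page contents only when useTextExtractor is "none". The function extra_url_to_doc, its fetched_text parameter, and the shape of the returned dictionary are hypothetical illustrations, not the Feed Harvester's actual implementation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def extra_url_to_doc(entry: Dict[str, Any],
                     use_text_extractor: str,
                     fetched_text: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical mapping of one extraUrls entry to a harvested document."""
    # url and title are the only non-optional fields in the schema
    if "url" not in entry or "title" not in entry:
        raise ValueError("extraUrls entries need 'url' and 'title'")
    # publishedData is optional; fall back to "now" (UTC, ISO 8601)
    published = entry.get("publishedData") or datetime.now(timezone.utc).isoformat()
    # fullText overrides the fetched contents only when useTextExtractor is "none"
    if use_text_extractor == "none" and entry.get("fullText") is not None:
        text = entry["fullText"]
    else:
        text = fetched_text
    return {
        "url": entry["url"],
        "title": entry["title"],
        "description": entry.get("description"),
        "publishedData": published,
        "fullText": text,
    }


# Hypothetical source fragment containing one manually specified URL
CONFIG = json.loads(
    '{"extraUrls": [{"url": "http://example.com/a", "title": "A", '
    '"fullText": "debug text"}]}'
)

if __name__ == "__main__":
    doc = extra_url_to_doc(CONFIG["extraUrls"][0], "none", fetched_text="page body")
    print(doc["fullText"])  # the fullText override is used
```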

Legacy documentation:

Using the Feed Harvester

...

Version	Old Version 4	New Version 5
Changes made by	AlexI	andrew johnston
Saved on	Oct 11, 2013	Apr 12, 2014
