...
```
{
  "display": string,
  "web": {
    "feedType": string, // Currently not used; will allow for RSS vs Atom in future releases (currently only RSS is supported)
    "waitTimeOverride_ms": integer, // (Optional) If specified, controls the amount of time between successive reads to a site (default: 10000ms).
                                    // If a site is timing out, it may be limiting the number of accesses from a given IP: set the number higher.
                                    // For large sites, you can increase the performance of the harvester by setting this number lower.
    "updateCycle_secs": integer, // (Optional) If present, harvested URLs may be replaced if they are older than this time and are encountered in the RSS or in "extraUrls"
    "regexInclude": string, // (Optional) If specified, only URLs matching the regex will be harvested
    "regexExclude": string, // (Optional) If specified, any URLs matching the regex will not be harvested
    "extraUrls": [ // This array allows manually specified URLs to be harvested once
      {
        "url": string, // The URL
        "title": string, // The title that the document will be given (i.e. the equivalent of the RSS title)
        "description": string, // (Optional) The description that the document will be given (i.e. the equivalent of the RSS description)
        "publishedData": string, // (Optional) The date that will be assigned to the document (default: now); this can be overridden from "structuredAnalysis"
        "fullText": string // (Optional) If present and "useTextExtractor" is "none", the specified string is used instead of the URL contents (mainly for debugging)
      }
    ],
    "userAgent": string, // (Optional) If present, overrides the system default user agent string
    "proxyOverride": string, // (Optional) "direct" to bypass the proxy (the default), or a proxy specification "(http|socks)://host:port"
    "httpFields": { // (Optional) Additional HTTP fields to be applied to the request headers
      "field": "value" // e.g. "cookie": "sessionkey=346547657687"
    }
  }
}
```
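The `regexInclude`, `regexExclude` and `waitTimeOverride_ms` options above can be illustrated with a small Python sketch. It is not the harvester's own code: the helper names `filter_urls` and `wait_time_ms` are hypothetical, and it assumes that a pattern "matches" if it is found anywhere in the URL (`re.search`) and that, when both options are present, a URL must pass the include filter and must not match the exclude filter.

```python
import re
from typing import Iterable, List, Optional

# Default documented for "waitTimeOverride_ms" when it is not specified.
DEFAULT_WAIT_TIME_MS = 10000


def wait_time_ms(web_config: dict) -> int:
    """Return the delay between successive reads to a site, in milliseconds."""
    return int(web_config.get("waitTimeOverride_ms", DEFAULT_WAIT_TIME_MS))


def filter_urls(web_config: dict, urls: Iterable[str]) -> List[str]:
    """Keep only the URLs allowed by the optional regexInclude/regexExclude settings.

    Assumption (hypothetical semantics): patterns are matched anywhere in the URL
    with re.search; include is checked first, then exclude.
    """
    include: Optional[str] = web_config.get("regexInclude")
    exclude: Optional[str] = web_config.get("regexExclude")
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None

    kept = []
    for url in urls:
        if include_re is not None and not include_re.search(url):
            continue  # not matched by regexInclude: not harvested
        if exclude_re is not None and exclude_re.search(url):
            continue  # matched by regexExclude: not harvested
        kept.append(url)
    return kept


if __name__ == "__main__":
    # Hypothetical configuration and candidate URLs for illustration only.
    config = {
        "regexInclude": r"^https?://example\.com/news/",
        "regexExclude": r"\.pdf$",
    }
    candidates = [
        "http://example.com/news/story1.html",
        "http://example.com/news/report.pdf",
        "http://example.com/about.html",
    ]
    print(filter_urls(config, candidates))  # ['http://example.com/news/story1.html']
    print(wait_time_ms(config))  # 10000
```

With no filters configured, every URL is kept, and with no override the sketch falls back to the documented 10000ms delay.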
Legacy documentation:
...
Description
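In the following example, the web extractor harvests the document at the URL listed in the extraUrls parameter. Because updateCycle_secs is set to 86400 (one day), the harvested document may be replaced if it is older than one day when the URL is encountered again.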
...
```
{
  "description": "wiy",
  "isPublic": true,
  "mediaType": "News",
  "tags": [ "tag1" ],
  "title": "aaa xml test",
  "processingPipeline": [
    {
      "feed": {
        "extraUrls": [
          { "url": "http://www.w3schools.com/xml/simple.xml" }
        ],
        "updateCycle_secs": 86400
      }
    },
    ...
  ]
}
```
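The effect of `updateCycle_secs` in this example can be sketched in Python. The helper `is_eligible_for_reharvest` is hypothetical and only models the documented rule: a previously harvested URL becomes a candidate for replacement once it is older than `updateCycle_secs`, and if the setting is absent it is never considered stale. It takes the harvest time as an input because how the harvester actually records that time is not described here.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional

# The "feed" element from the example source above (only the fields used here).
FEED_ELEMENT = json.loads("""
{
  "extraUrls": [ { "url": "http://www.w3schools.com/xml/simple.xml" } ],
  "updateCycle_secs": 86400
}
""")


def is_eligible_for_reharvest(feed: dict, harvested_at: datetime,
                              now: Optional[datetime] = None) -> bool:
    """Return True if a previously harvested URL is older than updateCycle_secs.

    Hypothetical helper: without updateCycle_secs, URLs are never replaced.
    """
    cycle = feed.get("updateCycle_secs")
    if cycle is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - harvested_at > timedelta(seconds=cycle)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    for entry in FEED_ELEMENT["extraUrls"]:
        print(entry["url"])
    # Harvested 30 hours earlier: older than 86400 s (24 h), so eligible.
    print(is_eligible_for_reharvest(FEED_ELEMENT, datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), now))  # True
    # Harvested 12 hours earlier: still within the update cycle.
    print(is_eligible_for_reharvest(FEED_ELEMENT, datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc), now))  # False
```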
Legacy documentation: Feed object
Legacy documentation: Using the Feed Harvester