...
Field | Description | ||
---|---|---|---|
feedType | Currently not used - will allow for RSS vs Atom in future releases (currently only RSS is supported) | ||
waitTimeOverride_ms | Optional - if specified, controls the amount of time between successive reads to a site (default: 10000ms). If a site is timing out, it may be limiting the number of accesses from a given IP, so set the number higher. For large sites, you can increase the performance of the harvester by setting the number lower. | ||
updateCycle_secs | Optional - if present, harvested URLs may be replaced if they are older than this time and are encountered again in the RSS feed or in "extraUrls" | ||
regexInclude | Optional - if specified, only URLs matching the regex will be harvested | ||
regexExclude | Optional - if specified, any URLs matching the regex will not be harvested | ||
extraUrls | Complex type with the following fields: "url" (string): the URL; "title" (string): the title that the document will be given (i.e. the equivalent of the RSS title), see below; "description" (string, optional): the description that the document will be given (i.e. the equivalent of the RSS description); "publishedData" (string, optional): the date that will be assigned to the document (default: now), which can be overridden from "structuredAnalysis"; "fullText" (string, optional): if present and "useTextExtractor" is "none", this string is used instead of the URL contents, so it can pre-populate content (mainly useful for debugging) | ||
userAgent | (Optional) If present, overrides the system default user agent string | ||
proxyOverride | (Optional) "direct" to bypass proxy (the default), or a proxy specification "(http|socks)://host:port" | ||
httpFields | (Optional) Additional HTTP fields to be applied to the request headers. Can contain the special field "Content", which will POST the associated value. | ||
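
To illustrate how "regexInclude" and "regexExclude" from the table above might interact, here is a minimal Python sketch. It is not the harvester's actual code: the function name `should_harvest`, the example URLs, and the patterns are all hypothetical. It also assumes that a regex "matches" when it is found anywhere in the URL, and that the exclude pattern takes precedence when both are set; the real harvester may differ on either point.

```python
import re
from typing import Optional


def should_harvest(url: str,
                   regex_include: Optional[str] = None,
                   regex_exclude: Optional[str] = None) -> bool:
    """Illustrative URL filter (hypothetical helper, not the harvester's code)."""
    # Assumption: a pattern "matches" if it is found anywhere in the URL.
    if regex_include is not None and not re.search(regex_include, url):
        return False  # an include pattern is set and the URL does not match it
    if regex_exclude is not None and re.search(regex_exclude, url):
        return False  # assumption: exclusion wins when both patterns match
    return True


# Hypothetical URLs, as might come from a feed or from "extraUrls".
urls = [
    "http://example.com/news/story1.html",
    "http://example.com/news/archive/old.html",
    "http://example.com/sports/match.html",
]

kept = [u for u in urls if should_harvest(u, r"/news/", r"/archive/")]
print(kept)
```

With neither pattern specified, the sketch accepts every URL, which mirrors both fields being optional.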
...