...
For a description of supported engines, see Automated text extraction.
The following table describes the parameters of the feature extraction configuration.
Field | Description |
---|---|
criteria | A JavaScript expression that is passed the document as _doc. If it returns false, this pipeline element is bypassed. |
engineName | The name of the engine to use. This can be fully qualified (eg "com.ikanow.infinit.e.harvest.boilerpipe"), or just the name (eg "boilerpipe") if the engine is registered in the Infinit.e system configuration. |
engineConfig | The configuration object to be passed to the engine |
entityFilter | A regex applied to entity indexes. It starts with "+" or "-" to indicate inclusion or exclusion (defaults to include-only). |
assocFilter | A regex applied to new-line separated association indexes. It starts with "+" or "-" to indicate inclusion or exclusion (defaults to include-only). |
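The "+"/"-" prefix convention used by entityFilter and assocFilter can be illustrated with a minimal JavaScript sketch. This is not the platform's implementation: the helper names (parseFilter, keepIndex), the "name/type" index strings, and the choice of partial rather than whole-string matching are all assumptions made for illustration. A filter with no prefix is treated here as include-only, which is one reading of the documented default.

```javascript
// Hypothetical helpers: NOT part of Infinit.e. They only illustrate the
// "+" (include) / "-" (exclude) prefix convention described in the table.
function parseFilter(filter) {
  let include = true; // assumption: no prefix means include-only
  let pattern = filter;
  if (filter.startsWith("+")) {
    pattern = filter.slice(1);
  } else if (filter.startsWith("-")) {
    include = false;
    pattern = filter.slice(1);
  }
  return { include: include, regex: new RegExp(pattern) };
}

// Returns true if an index should be kept under the given filter.
// Assumption: the regex may match anywhere in the index (partial match).
function keepIndex(index, filter) {
  const parsed = parseFilter(filter);
  const matched = parsed.regex.test(index);
  return parsed.include ? matched : !matched;
}

// Assumed index strings, for illustration only.
const entityIndexes = [
  "mayo clinic/organization",
  "influenza/medicalcondition",
  "london/city"
];

// Keep only indexes ending in "/organization".
const included = entityIndexes.filter(function (i) {
  return keepIndex(i, "+/organization$");
});

// Drop indexes ending in "/city", keep everything else.
const excluded = entityIndexes.filter(function (i) {
  return keepIndex(i, "-/city$");
});

console.log(included);
console.log(excluded);
```

In this sketch an include filter keeps only the matching indexes, while an exclude filter keeps everything except the matches.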
Examples
Specifying the Feature Engine
...
Code Block |
---|
{ "description": "Article on Medical Issues", "harvestBadSource": false, "isApproved": true, "isPublic": true, "key": "http.www.mayoclinic.com.rss.blog.xml", "mediaType": "News", "modified": "Oct 19, 2010 11:31:59 AM", "tags": [ "topic:healthcare", "industry:healthcare", "mayo clinic", "health" ], "title": "MayoClinic: General Topics", "processingPipeline": [ { "feed": { "extraUrls": [ { "url": "http://www.mayoclinic.com/rss/blog.xml" } ] } }, { "textEngine": { "engineName": "AlchemyAPI" } }, { "featureEngine": { "engineName": "OpenCalais" } } ] } |
...
engineConfig Example
You can use the engineConfig object to pass configuration parameters along to the feature engine.
...