...

For a description of supported engines, see Automated text extraction.

The following table describes the parameters of the feature extraction configuration.

Field | Description
criteria | A JavaScript expression that is passed the document as _doc. If it returns false, this pipeline element is bypassed.
engineName | The name of the engine to use. This can be fully qualified (e.g. "com.ikanow.infinit.e.harvest.boilerpipe"), or just the name (e.g. "boilerpipe") if the engine is registered in the Infinit.e system configuration.
engineConfig | The configuration object to be passed to the engine.
entityFilter | A regex applied to entity indexes. It starts with "+" or "-" to indicate inclusion or exclusion, and defaults to include-only.
assocFilter | A regex applied to new-line separated association indexes. It starts with "+" or "-" to indicate inclusion or exclusion, and defaults to include-only.
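
As a rough illustration of how these fields might combine in a single pipeline element, the sketch below runs a feature engine only on documents that have a title and keeps only some of the extracted entities. The criteria expression (which assumes _doc exposes a title field) and the entity index pattern (which assumes indexes end in "/person") are assumptions made for this example, not values taken from the product documentation.

```json
{
    "featureEngine": {
        "criteria": "_doc.title != null",
        "engineName": "OpenCalais",
        "entityFilter": "+.*/person"
    }
}
```

The leading "+" marks the regex as an inclusion filter; replacing it with "-" would exclude matching entities instead. An assocFilter would follow the same "+"/"-" convention.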

Examples

Specifying the Feature Engine

...

{
    "description": "Article on Medical Issues",
    "harvestBadSource": false,
    "isApproved": true,
    "isPublic": true,
    "key": "http.www.mayoclinic.com.rss.blog.xml",
    "mediaType": "News",
    "modified": "Oct 19, 2010 11:31:59 AM",
    "tags": [
        "topic:healthcare",
        "industry:healthcare",
        "mayo clinic",
        "health"
    ],
    "title": "MayoClinic: General Topics",
    "processingPipeline": [
        {
            "feed": {
                "extraUrls": [
                    {
                        "url": "http://www.mayoclinic.com/rss/blog.xml"
                    }
                ]
            }
        },
        {
            "textEngine": {
                "engineName": "AlchemyAPI"
            }
        },
        {
            "featureEngine": {
                "engineName": "OpenCalais"
            }
        }
    ]
}

...

engineConfig Example

You can use the engineConfig object to pass configuration parameters along to the feature engine.
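
A minimal sketch of the shape such an element could take is shown here. The keys inside engineConfig are placeholders invented for illustration (JSON has no comments, so they are named "hypothetical..." to make that clear); the keys a given engine actually accepts depend on that engine and are not specified here.

```json
{
    "featureEngine": {
        "engineName": "OpenCalais",
        "engineConfig": {
            "hypotheticalOption": "value",
            "hypotheticalFlag": true
        }
    }
}
```

The engineConfig object sits alongside engineName inside the featureEngine element, and is handed to the engine as-is.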

...