...

This toolkit element passes the document's full text to an external (or embedded) extraction engine, which returns entities and associations (and optionally metadata).

...

Format

Code Block
{
	"display": string,
	"featureEngine": {
		"criteria": string, // A JavaScript expression that is passed the document as _doc; if it returns false then this pipeline element is bypassed
		"enginename": string, // The name of the text engine to use (can be fully qualified (eg "com.ikanow.infinit.e.harvest.boilerpipe"), or just the name (eg "boilerpipe") if the engine is registered in the Infinit.e system configuration)
		"engineConfig": {"config_param_name": string, ...}, // The configuration object to be passed to the engine
		"entityFilter": string, // (regex applied to entity indexes, starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only)
		"assocFilter": string // (regex applied to new-line separated association indexes, starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only)
}
}
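
The format above can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the platform's actual behaviour: the engine name "example_engine" and the configuration parameter are placeholders, the entity index strings are made-up samples, and the helper functions applyFilter and passesCriteria are hypothetical. It shows one plausible reading of the "+"/"-" filter prefix (a partial regex match, with no prefix treated as include-only) and of a criteria expression that receives the document as _doc.

```javascript
// Hypothetical featureEngine element; every value is illustrative only.
const element = {
  display: "Extract entities from full text",
  featureEngine: {
    // Bypass documents with no full text (assumes _doc exposes "fullText")
    criteria: "_doc.fullText != null && _doc.fullText.length > 0",
    enginename: "example_engine", // placeholder, not a real engine name
    engineConfig: { config_param_name: "value" }, // placeholder parameter
    entityFilter: "-.*/quantity$" // exclude entity indexes ending in "/quantity"
  }
};

// Sketch of the prefix convention: "-" excludes matching indexes,
// "+" (or no prefix) keeps only matching indexes.
function applyFilter(filter, indexes) {
  const exclude = filter.startsWith("-");
  const body = /^[+-]/.test(filter) ? filter.slice(1) : filter;
  const regex = new RegExp(body);
  return indexes.filter((index) => regex.test(index) !== exclude);
}

// Sketch of criteria evaluation: the expression sees the document as _doc.
function passesCriteria(criteria, doc) {
  return Boolean(new Function("_doc", "return (" + criteria + ");")(doc));
}

// Made-up sample entity indexes
const entities = ["barack obama/person", "42 percent/quantity"];
const kept = applyFilter(element.featureEngine.entityFilter, entities);
console.log(kept); // [ 'barack obama/person' ]
```

Whether the real engine applies the regex as a partial or a full match is not stated here, so test any filter against a few sample indexes before relying on it.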

...

Feature extraction uses the text produced by the text extraction stage to generate entities, associations, and potentially metadata. Text extraction is a separate stage in the pipeline, with its own extraction engines.

Warning

Most feature extractors require text to have been extracted by a "textEngine" or "text" element earlier in the pipeline, unless the data comes from the file extractor (which automatically fills in a document's "fullText" field).

For a list of supported text extractors, see Automated text extraction.

For example, Alchemy API can perform both text extraction, using the Alchemy API, and feature extraction, using the Alchemy metadata API.
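
The ordering requirement in the warning above can be sketched as a simple check. This is a minimal illustration under stated assumptions: the pipeline fragment is hypothetical, the field layout of the "textEngine" element is assumed to mirror the "featureEngine" format, "example_engine" is a placeholder name, and featureEngineLacksText is a made-up helper, not part of the platform. Sources fed by the file extractor would not need the text element, so a warning from this check is only a hint.

```javascript
// Hypothetical pipeline fragment: text extraction precedes feature extraction.
// Engine names and the textEngine field layout are assumptions.
const pipeline = [
  { display: "Extract text", textEngine: { enginename: "boilerpipe" } },
  {
    display: "Extract entities and associations",
    featureEngine: { enginename: "example_engine" } // placeholder name
  }
];

// Returns true if a featureEngine element appears before any
// "textEngine" or "text" element in the pipeline.
function featureEngineLacksText(elements) {
  let textSeen = false;
  for (const el of elements) {
    if ("textEngine" in el || "text" in el) textSeen = true;
    if ("featureEngine" in el && !textSeen) return true;
  }
  return false;
}

console.log(featureEngineLacksText(pipeline)); // false
```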

...