Overview

This toolkit element passes the document text to an external or embedded extraction engine, which returns entities and associations and, optionally, metadata.

TODO

Format

TODO convert to JSON

{
	"display": string,
	"featureEngine": {} // see AutomatedEntityExtractionSpecPojo below
}
//////////////////////////////////
 
	// Excerpt of a nested class; requires: import java.util.LinkedHashMap;
	public static class AutomatedEntityExtractionSpecPojo {
		public String criteria; // A JavaScript expression that is passed the document as _doc; if it returns false, this pipeline element is bypassed
		public String engineName; // The name of the text engine to use: either fully qualified (e.g. "com.ikanow.infinit.e.harvest.boilerpipe") or just the name (e.g. "boilerpipe") if the engine is registered in the Infinit.e system configuration
		public LinkedHashMap<String, String> engineConfig; // The configuration object to be passed to the engine
		public String entityFilter; // A regex applied to entity indexes; starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only
		public String assocFilter; // A regex applied to new-line separated association indexes; starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only
	}

...

This element replaces "useExtractor" in the Source object.
(Note: "criteria" above is not currently supported; support is coming soon.)
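
The "+" / "-" prefix convention described for entityFilter above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Infinit.e implementation: the class name EntityFilterSketch and its apply method are hypothetical, the sample entity indexes are invented for the example, the regex is assumed to be a partial (find-style) match, and a filter with no prefix is assumed to behave as an include filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "+"/"-" filter prefix convention.
public class EntityFilterSketch {

    /** Returns the entity indexes that survive the given filter string. */
    public static List<String> apply(String filter, List<String> entityIndexes) {
        if (filter == null || filter.isEmpty()) {
            return new ArrayList<>(entityIndexes); // no filter: keep everything
        }
        boolean include = true; // assumption: no prefix means include-only
        String regex = filter;
        if (filter.startsWith("+")) {
            regex = filter.substring(1);
        } else if (filter.startsWith("-")) {
            include = false;
            regex = filter.substring(1);
        }
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String index : entityIndexes) {
            // Assumption: a partial match anywhere in the index counts as a match.
            boolean matches = pattern.matcher(index).find();
            if (matches == include) {
                kept.add(index);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented sample indexes, for illustration only.
        List<String> indexes = Arrays.asList(
                "acme corp/company", "jane doe/person", "paris/city");
        System.out.println(apply("+/person$", indexes));         // include only people
        System.out.println(apply("-/(city|company)$", indexes)); // exclude cities and companies
        System.out.println(apply("/company$", indexes));         // no prefix: treated as include
    }
}
```

With these sample inputs the three calls print [jane doe/person], [jane doe/person] and [acme corp/company] respectively. The assocFilter field uses the same prefix convention, applied to the new-line separated association indexes instead.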

TODO

Description

Legacy documentation:

Enrichment engines

TODO

Examples

TODO
