Warning

Most feature extractors require text to have been extracted by a "textEngine" or "text" object earlier in the pipeline, unless the data comes from a file (which automatically fills in the document's "fullText" field). AlchemyAPI is an exception for URLs because it can perform both steps. Other custom extractors may not require text, e.g. because they operate on existing metadata fields or entities.

Format

TODO convert to JSON

Code Block
{
	"display": string,
	"featureEngine": {} // see AutomatedEntityExtractionSpecPojo below
}
//////////////////////////////////
// Each field is preceded by its JSON equivalent. Uses java.util.LinkedHashMap.
	public static class AutomatedEntityExtractionSpecPojo {
		// "criteria": string
		// A JavaScript expression that is passed the document as _doc; if it returns false then this pipeline element is bypassed
		public String criteria;
		// "engineName": string
		// The name of the text engine to use (can be fully qualified (e.g. "com.ikanow.infinit.e.harvest.boilerpipe"), or just the name (e.g. "boilerpipe") if the engine is registered in the Infinit.e system configuration)
		public String engineName;
		// "engineConfig": { "config_param_name": string, ... }
		// The configuration object to be passed to the engine
		public LinkedHashMap<String, String> engineConfig;
		// "entityFilter": string
		// Regex applied to entity indexes; starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only
		public String entityFilter;
		// "assocFilter": string
		// Regex applied to newline-separated association indexes; starts with "+" or "-" to indicate inclusion/exclusion, defaults to include-only
		public String assocFilter;
	}
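
The "+"/"-" prefix convention used by "entityFilter" and "assocFilter" can be illustrated with a small standalone sketch. This is not the platform's actual implementation: the class name EntityFilterSketch, the sample index strings, the use of a partial (find) match, and the treatment of an empty filter as "keep everything" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Infinit.e: shows one way to read a
// filter string whose optional first character is "+" (include) or "-" (exclude).
public class EntityFilterSketch {

    // Returns true if the given index should be kept under the filter spec.
    public static boolean accepts(String filterSpec, String index) {
        if (filterSpec == null || filterSpec.isEmpty()) {
            return true; // assumption: no filter means nothing is dropped
        }
        boolean include = true; // documented default: include-only
        String regex = filterSpec;
        if (filterSpec.startsWith("+")) {
            regex = filterSpec.substring(1);
        } else if (filterSpec.startsWith("-")) {
            include = false;
            regex = filterSpec.substring(1);
        }
        // Assumption: a partial match is enough; the real engine may differ.
        boolean matched = Pattern.compile(regex).matcher(index).find();
        // Include mode keeps matches; exclude mode keeps non-matches.
        return include == matched;
    }

    // Applies the filter to a list of indexes and returns the survivors.
    public static List<String> filter(String filterSpec, List<String> indexes) {
        List<String> kept = new ArrayList<>();
        for (String index : indexes) {
            if (accepts(filterSpec, index)) {
                kept.add(index);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up index strings, for illustration only.
        List<String> indexes = Arrays.asList(
                "barack obama/person", "paris/location", "acme corp/company");
        System.out.println(filter("+/person$", indexes));
        System.out.println(filter("-/location$", indexes));
    }
}
```

Under these assumptions, a filter with no prefix behaves the same as one starting with "+", which matches the "defaults to include-only" wording in the field descriptions.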

 

Description

Many of the automated text extraction tools can also create entities and associations.

...