Overview

Starting with either the raw content or the content transformed by a preceding manual or automated text pipeline element, this element applies a JavaScript, regex, or XPath transformation and writes the output to the document's full text, description, title, or one of its textual metadata fields.
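The core flow (take an input text, apply a transformation, write the result to a target field) can be illustrated with the regex case. This is a minimal sketch, not the element's actual implementation: the class `RegexTextTransform`, its `apply` method, and the modeling of the document as a `Map` from field name to text are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: the document is modeled as a map of field name -> text.
class RegexTextTransform {
    // Applies a regex with a replacement string (capturing groups as $1, $2, ...)
    // to the input text (raw content or the output of a preceding element)
    // and writes the result to the target field.
    static void apply(Map<String, String> doc, String input, String fieldName,
                      String regex, String replacement) {
        String output = Pattern.compile(regex).matcher(input).replaceAll(replacement);
        doc.put(fieldName, output);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        String raw = "Posted by: alice | Title: Quarterly report";
        // Extract the title into the "title" field using capture group 1.
        apply(doc, raw, "title", "^.*Title: (.*)$", "$1");
        System.out.println(doc.get("title")); // prints "Quarterly report"
    }
}
```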

TODO

Format

TODO

Legacy documentation:

...

TODO convert to JSON


{
	"display": string,
	"text": [
	{} // see ManualTextExtractionSpecPojo below
	]
}
//////////////////////////////////
	public static class ManualTextExtractionSpecPojo {
		public String fieldName; // One of "fullText", "description", "title"
		public String script; // The regex, XPath, or JavaScript expression (see scriptlang below)
		public String flags; // Standard Java regex flags (regex/xpath only), plus "H" to decode HTML
		public String replacement; // Replacement string for regex/xpath+regex matches, can include capturing groups as $1 etc
		public String scriptlang; // One of "javascript", "regex", "xpath"
	}
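The `flags` and `replacement` fields above could be interpreted roughly as follows. This is a sketch under stated assumptions, not the documented behavior: it assumes the flag letters follow Java's embedded-flag characters (`i`, `d`, `m`, `s`, `u`, `x`), and that `H` decodes HTML entities in the output after the replacement is applied. The class `RegexFlags` is hypothetical, and its decoder only handles a handful of common entities.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of interpreting the "flags" string of a regex spec.
// ASSUMPTION: letters follow Java's embedded-flag characters (i, d, m, s, u, x).
// ASSUMPTION: "H" means "decode HTML entities in the output"; only a few
// common entities are handled here, not the full HTML entity set.
class RegexFlags {
    static int toPatternFlags(String flags) {
        int result = 0;
        if (flags == null) return result;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'i': result |= Pattern.CASE_INSENSITIVE; break;
                case 'd': result |= Pattern.UNIX_LINES; break;
                case 'm': result |= Pattern.MULTILINE; break;
                case 's': result |= Pattern.DOTALL; break;
                case 'u': result |= Pattern.UNICODE_CASE; break;
                case 'x': result |= Pattern.COMMENTS; break;
                default: break; // 'H' and unknown letters do not map to Pattern flags
            }
        }
        return result;
    }

    static String decodeBasicHtml(String s) {
        // "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<"
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    // script = regex, replacement may use $1 etc. for capturing groups
    static String transform(String input, String script, String flags, String replacement) {
        Pattern p = Pattern.compile(script, toPatternFlags(flags));
        String out = p.matcher(input).replaceAll(replacement);
        return (flags != null && flags.indexOf('H') >= 0) ? decodeBasicHtml(out) : out;
    }
}
```

For example, `RegexFlags.transform("<B>Fish &amp; Chips</B>", "</?b>", "iH", "")` strips the bold tags case-insensitively and then decodes the entity, giving `Fish & Chips`.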

Legacy documentation:

See the "simpleTextCleanser object" section.
- Note: headers and footers are no longer supported; remove them manually with a transformation instead.
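Removing a header and footer manually amounts to regex replacements with an empty replacement string. The sketch below is illustrative only: the class `HeaderFooterStrip` is hypothetical, and it assumes the header ends at a `---` line and the footer starts at a `Copyright` line; real sources need their own patterns. In the element itself, each pattern would presumably be a separate spec with `scriptlang` set to "regex" and an empty `replacement`, which is also an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: strips a boilerplate header and footer using two
// regex replacements with an empty replacement string.
// ASSUMPTION: the header ends at a "---" line and the footer starts at a
// "Copyright" line; adapt both patterns to the actual source.
class HeaderFooterStrip {
    // (?s) turns on DOTALL so ".*?" and ".*" can span line breaks.
    // \A anchors to the start of the input, \z to the very end.
    private static final Pattern HEADER = Pattern.compile("(?s)\\A.*?---\\n");
    private static final Pattern FOOTER = Pattern.compile("(?s)\\nCopyright.*\\z");

    static String strip(String text) {
        String noHeader = HEADER.matcher(text).replaceFirst("");
        return FOOTER.matcher(noHeader).replaceFirst("");
    }

    public static void main(String[] args) {
        String page = "Site nav\nLogin\n---\nArticle body.\nCopyright 2013 Example";
        System.out.println(strip(page)); // prints "Article body."
    }
}
```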

TODO

Description

Legacy documentation:

TODO (see Unstructured Analysis - Overview)

TODO

Examples

TODO
