Overview
This toolkit element uses regex, JavaScript, or XPath to create metadata objects, which subsequent pipeline elements can then use to generate entities or associations.
TODO
Format
...
TODO
Legacy documentation:
{
    "display": string,
    "contentMetadata": [
        {} // see MetadataSpecPojo below
    ]
}
//////////////////////////////////
public static class MetadataSpecPojo {
    public String fieldName; // Any string; the key for the generated array in "doc.metadata"
    public String scriptlang; // One of "javascript", "regex", "xpath"
    public String script; // The script that generates the array in "doc.metadata" (under fieldName)
    public String flags; // Standard Java regex flags (regex/xpath only), plus "H" to decode HTML and "D" to allow duplicate strings (by default they are de-duplicated), plus the following custom flags:
        // For javascript (defaults to "t" if none specified): "t": the script receives the doc fullText ("text"); "d": the script receives the entire doc (_doc); "m": the script receives the doc.metadata (_metadata)
        // For xpath: "o": if the XPath expression points to an HTML (/XML) object, this object is converted to JSON and stored as an object in the corresponding metadata field array. (Can also be done via the deprecated "groupNum":-1)
    public String replace; // Replacement string for regex/xpath+regex matches; can include capturing groups as $1 etc
    public Boolean store; // Whether this field should be stored in the DB or discarded after the harvest processing
    public Boolean index; // Whether this field should be full-text indexed or just stored in the DB
}
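To make the "regex" case of the spec above more concrete, here is a minimal sketch of how "script", "replace", and the "D" flag might interact. It is not the platform's implementation: the class `RegexMetadataSketch`, the method `extract`, and the boolean `allowDuplicates` (standing in for the "D" flag) are hypothetical names introduced for illustration, and only the standard `java.util.regex` API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the toolkit.
final class RegexMetadataSketch {

    // script:          the regex (the "script" field when scriptlang is "regex")
    // replace:         optional replacement string with $1-style group references, or null
    // allowDuplicates: assumed equivalent of the "D" flag (default is de-duplication)
    static List<String> extract(String text, String script, String replace, boolean allowDuplicates) {
        Matcher m = Pattern.compile(script).matcher(text);
        List<String> values = new ArrayList<>();
        int appendPos = 0; // mirrors the matcher's internal append position
        while (m.find()) {
            if (replace == null) {
                values.add(m.group()); // no replacement: keep the whole match
            } else {
                // appendReplacement writes the text skipped since the last match,
                // then the expanded replacement; drop the skipped prefix.
                StringBuffer sb = new StringBuffer();
                m.appendReplacement(sb, replace);
                values.add(sb.substring(m.start() - appendPos));
                appendPos = m.end();
            }
        }
        if (allowDuplicates) {
            return values;
        }
        // Default behaviour described above: de-duplicate, keeping first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(values));
    }
}
```

For example, calling `extract` on the text `alice@example.com wrote to bob@example.org and alice@example.com` with the script `(\w+)@(\w+\.\w+)` and the replacement `$2` yields `example.com` and `example.org` once each; with `allowDuplicates` set to true, `example.com` appears twice.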
Legacy documentation:
- UnstructuredAnalysis object (under "Meta object")
TODO
Description
Legacy documentation:
- TODO: Unstructured Analysis - Overview (under "Specifying data as metadata", "Specifying metadata using javascript", "Using XPath to generate metadata")
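As a rough illustration of the XPath case, the sketch below evaluates an XPath expression against a document and collects the text of each matching node into an array, much as a metadata field would hold it. This is an assumption-laden sketch rather than the toolkit's code: it uses the JDK's `javax.xml.xpath` API on well-formed XML (real HTML would need cleaning first), and the names `XPathMetadataSketch` and `extract` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of the toolkit.
final class XPathMetadataSketch {

    // xml:    a well-formed XML (or XHTML) document, assumed already cleaned
    // script: the XPath expression (the "script" field when scriptlang is "xpath")
    static List<String> extract(String xml, String script) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    script, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Store the trimmed text content of each matching node.
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Covers both a malformed expression and an unparseable document.
            throw new IllegalArgumentException("XPath evaluation failed: " + script, e);
        }
    }
}
```

Here the expression `//div[@class='author']` applied to a page containing two such `div` elements returns the text of both, in document order.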
TODO
Examples
TODO