...
The following parameters are used to configure manual text transformation:
Parameter | Description | Note | Data Type |
---|---|---|---|
fieldName | Specifies the data source that the script will execute against: "fullText," "description," or "title." | ||
script | Specify your script | ||
flags | Standard Java regex flags. Can have different values, based on the script language. See below. | ||
javascript: There are a few flags that provide additional variables in the javascript:
| |||
xpath (and regex, except for "O"):
| |||
replacement | For example, you could find the instance C/M or C/F in a document and extract from it that the Race is Caucasian. The same can be done to extract M or F as a Sex, meaning Male or Female. | ||
scriptlang | Specifies the language of the script that will be provided. One of "javascript," "regex," or "xpath." |
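The C/M and C/F case described in the replacement row can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: the function name `extractRaceAndSex`, the lookup tables, and the pattern are hypothetical, and the sketch uses the JavaScript regex engine rather than the Java regex engine the platform's regex scripts rely on.

```javascript
// Hypothetical lookup tables. Only "C" -> Caucasian, "M" -> Male and
// "F" -> Female come from the description above; nothing else is assumed.
const RACE_CODES = { C: "Caucasian" };
const SEX_CODES = { M: "Male", F: "Female" };

// Matches codes such as "C/M" or "C/F": one capital letter, a slash,
// then "M" or "F".
const CODE_PATTERN = /\b([A-Z])\/([MF])\b/g;

// Returns one { race, sex } object per recognised code found in the text.
function extractRaceAndSex(text) {
  const results = [];
  for (const match of text.matchAll(CODE_PATTERN)) {
    const race = RACE_CODES[match[1]];
    if (race === undefined) continue; // skip race codes not in the table
    results.push({ race: race, sex: SEX_CODES[match[2]] });
  }
  return results;
}

console.log(extractRaceAndSex("Suspect described as C/M, witness as C/F."));
```

Keeping the code-to-label mapping in small tables means further codes could be added without touching the pattern.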
Examples
...
Supported Script Languages
You can program manual text extraction using the following supported languages:
- Javascript
- Regex
- Xpath
Javascript
For power users, metadata can be generated from the content using javascript. This gives a huge amount of flexibility to apply site/source-specific knowledge to pull out metadata that can be turned into entities or associations.
Log File From File Share:
In the following example, manual text transformation is used to parse a log file over the web, with a script of type javascript.
...
Javascript can also return more complex objects, arrays of objects, or arrays of primitives.
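A minimal sketch of those return shapes is shown below. The log format ("date level message", one entry per line), the function name `parseLog`, and the field names are all hypothetical, and how the platform hands the text to the script is not shown here; the function simply takes it as an argument.

```javascript
// Hypothetical log format: "<date> <level> <message>", one entry per line.
function parseLog(text) {
  const entries = []; // array of objects, one per well-formed line
  const levels = [];  // array of primitives: each distinct level seen
  for (const line of text.split("\n")) {
    const match = /^(\S+)\s+(\S+)\s+(.*)$/.exec(line.trim());
    if (match === null) continue; // ignore blank or malformed lines
    entries.push({ date: match[1], level: match[2], message: match[3] });
    if (levels.indexOf(match[2]) === -1) levels.push(match[2]);
  }
  // A more complex object that wraps both arrays.
  return { entries: entries, levels: levels };
}

const sample = "2012-01-01 ERROR disk full\n\n2012-01-02 INFO started";
console.log(JSON.stringify(parseLog(sample), null, 2));
```

Returning a single wrapper object keeps related values together; a script could equally return just `entries` or just `levels`.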
Regex
XML:
The following example shows how a regex script can be used to manually parse the text of the ingested data:
...
Code Block |
---|
}], "multipledays": ["No"], "organization": ["No group"], "perpetrator": [{ "characteristic": "Islamic Extremist (Sunni)", "nationality": "Unknown" }], |
...
Xpath
Neither regex nor javascript is well suited for extracting fields from HTML and XML.
...