...

This page has been broken down into the following sections for ease of localization:

Table of Contents

Format

TODO convert to JSON

Code Block
{
	"display": string,
	"text": [
		{} // see ManualTextExtractionSpecPojo below
	]
}
//////////////////////////////////
// ManualTextExtractionSpecPojo
{
	"fieldName": string, // One of "fullText", "description", "title"
	"script": string, // The script/xpath/javascript expression (see scriptlang below)
	"flags": string, // Standard Java regex field (regex/xpath only), plus "H" to decode HTML
	"replacement": string, // Replacement string for regex/xpath+regex matches, can include capturing groups as $1 etc
	"scriptlang": string // One of "javascript", "regex", "xpath"
}
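
To make the format more concrete, the following is a minimal sketch of one "text" entry written as a Javascript object, with a small check of the "scriptlang" value against the three languages listed in the comments above. The helper name checkSpec and all sample values are assumptions made for illustration; they are not part of Infinit.e.

```javascript
// Allowed script languages, taken from the comments in the format above.
const SCRIPT_LANGS = ["javascript", "regex", "xpath"];

// Hypothetical helper: returns a list of problems found in one spec entry.
function checkSpec(spec) {
  const problems = [];
  if (!SCRIPT_LANGS.includes(spec.scriptlang)) {
    problems.push("unknown scriptlang: " + spec.scriptlang);
  }
  return problems;
}

// Illustrative entry; the values are made up for this sketch.
const exampleSpec = {
  fieldName: "description",
  script: "<[^>]+>",
  flags: "H",
  replacement: "",
  scriptlang: "regex"
};

console.log(checkSpec(exampleSpec).length === 0 ? "ok" : "invalid");
```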

Description

With manual text transformation, you specify which data from the source your script works on. The script enriches that data so that it can be output as metadata and used to create advanced entities and associations.

...

You can program manual text extraction using the following supported languages:

  • Javascript
  • Regex
  • XPath

...

Javascript

For power users, metadata can be generated from the content using javascript. This gives a huge amount of flexibility to apply site/source-specific knowledge to pull out metadata that can be turned into entities or associations.

...

Log File From File Share

In the following example, manual text transformation is used to parse a log file over the web, with a script of type javascript.

...

Code Block
 ],
    "fullText": "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States",
    "mediaType": ["Log"],
    "metadata": {"info": [{
        "alert": "DUMMY_ALERT_TYPE_1 ",
        "country": "United States",
        "date": "2012-01-01T13:43:00",
        "device": "SCANNER_1 ",
        "dstIP": "66.66.66.66",
        "srcIP": " 10.0.0.1"
    }]},
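
The script that produced this output is not shown here. As a rough sketch of the idea, the following Javascript splits the sample log line on commas and maps each column to the field names that appear under "metadata.info" above. The function name parseLogLine and the column order are assumptions inferred from the sample, not the actual source script.

```javascript
// Column order inferred from the sample log line; the field names are the
// ones that appear under "metadata.info" in the sample output.
const COLUMNS = ["device", "date", "srcIP", "dstIP", "alert", "country"];

// Hypothetical helper: turns one comma-separated log line into an object.
function parseLogLine(text) {
  const parts = text.split(",").map(function (part) {
    return part.trim(); // drop the padding around each column
  });
  const info = {};
  COLUMNS.forEach(function (name, i) {
    info[name] = parts[i];
  });
  return info;
}

const line = "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States";
console.log(JSON.stringify(parseLogLine(line)));
```

Unlike the sample output, this sketch trims the whitespace around every value, so "device" and "alert" carry no trailing space and "srcIP" no leading space.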

 

The Javascript can also return more complex objects, arrays of objects, or arrays of primitives.

 

...

Regex

...

XML

The following example shows how a regex script can be used to manually parse the text of the ingested data:

...

In the example code snippet, the manual text transformation defines a field called "organization" and uses Regex to search the input XML data for matches. In this example, the XML data is an incident report.
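
The matching step can be illustrated with a short sketch. The platform itself applies a Java regex (see "flags" and "replacement" in the format above); the Javascript below only mimics the idea of capturing a value with a group, where match[1] plays the role of $1. The XML fragment, the element name and the function name extractOrganization are all assumptions made for this illustration.

```javascript
// Hypothetical incident-report fragment; the element names are assumptions.
const xml = "<incident><organization>Example Org</organization><summary>Sample</summary></incident>";

// Capture the text inside the first <organization> element (group 1).
const pattern = /<organization>([^<]*)<\/organization>/;

// Hypothetical helper: returns the captured value, or null if nothing matches.
function extractOrganization(text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

console.log(extractOrganization(xml));
```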

...

As a result, Infinit.e supports XPath 1.0 (with one minor extension to allow combined XPath regex).

In this example, an XPath script is used as part of manual text extraction to convert a sample XML document into JSON.

...