Format
TODO convert to JSON
Code Block |
---|
{
    "display": string,
    "text": [
        {
            "fieldName": string, // One of "fullText", "description", "title"
            "script": string, // The script/xpath/javascript expression (see scriptlang below)
            "flags": string, // Standard Java regex field (regex/xpath only), plus "H" to decode HTML
            "replacement": string, // Replacement string for regex/xpath+regex matches, can include capturing groups as $1 etc
            "scriptlang": string, // One of "javascript", "regex", "xpath"
        }
        //..
    ]
} |
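The constraints stated in the format comments can be checked mechanically. The following is a minimal sketch, not part of the platform: `checkTextEntry` is a hypothetical helper, and the allowed values are taken only from the comments in the format block.

```javascript
// Allowed values, copied from the comments in the format block.
var FIELD_NAMES = ["fullText", "description", "title"];
var SCRIPT_LANGS = ["javascript", "regex", "xpath"];

// Hypothetical helper (not a platform API): returns a list of problems found
// in one entry of the "text" array. An empty list means no problem was found.
function checkTextEntry(entry) {
    var problems = [];
    if (FIELD_NAMES.indexOf(entry.fieldName) === -1) {
        problems.push("fieldName must be one of: " + FIELD_NAMES.join(", "));
    }
    if (SCRIPT_LANGS.indexOf(entry.scriptlang) === -1) {
        problems.push("scriptlang must be one of: " + SCRIPT_LANGS.join(", "));
    }
    if (typeof entry.script !== "string") {
        problems.push("script must be a string");
    }
    return problems;
}

// Usage: an entry shaped like the regex example in this section.
var sampleEntry = { fieldName: "fullText", script: ",", scriptlang: "regex", flags: "md", replacement: " , " };
console.log(checkTextEntry(sampleEntry).length); // 0
```

A check like this only covers the three fields whose allowed values are documented; "flags" and "replacement" are left unchecked because their full value ranges are not listed here.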
- javascript
- regex
- xpath
javascript
...
Examples
For power users, metadata can be generated from the content using javascript. This gives a huge amount of flexibility to apply site/source-specific knowledge to pull out metadata that can be turned into entities or associations.
...
Log File From File Share
In the following example, manual text transformation is used to parse a log file over the web, with a script of type javascript.
"globals" is used to define a function called "decode", which splits each input record into its fields and returns them as an object. The "contentMetadata" script stores that object in a metadata field called "info", which exposes the following values:
- info.device
- info.date
- info.srcIP
- info.dstIP
- info.alert
- info.country
Code Block |
---|
{
    "globals": {
        "scripts": [
            "function decode(x)\n{\n var info = {}; \n var rec = x.split(','); \n info.device = rec[0];\n info.date = rec[1];\n info.srcIP = rec[2];\n info.dstIP = rec[3];\n info.alert = rec[4];\n info.country = rec[5];\n return info;\n}"
        ]
    }
},
{
    "harvest": {
        "searchCycle_secs": 3600
    }
},
{
    "docMetadata": {
        "title": "$metadata.info.alert @ $metadata.info.date [$metadata.info.device]: $metadata.info.dstIP -> $metadata.info.srcIP",
        "publishedDate": "$SCRIPT( return _doc.metadata.info[0].date; )"
    }
},
{
    "contentMetadata": [
        {
            "fieldName": "info",
            "script": "var info = decode(text); info;",
            "scriptlang": "javascript"
        }
    ]
} |
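Because the "decode" function is stored as an escaped one-line string, it can be easier to read when unescaped. The sketch below reproduces the same logic as plain JavaScript and runs it on one comma-separated record; the `sample` variable and the `console.log` call are illustrative additions and are not part of the source configuration.

```javascript
// Same logic as the "decode" function in globals.scripts, unescaped for readability.
function decode(x) {
    var info = {};
    var rec = x.split(',');   // one array element per comma-separated field
    info.device = rec[0];
    info.date = rec[1];
    info.srcIP = rec[2];
    info.dstIP = rec[3];
    info.alert = rec[4];
    info.country = rec[5];
    return info;
}

// Illustrative input: one record in the device,date,srcIP,dstIP,alert,country layout.
var sample = "SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States";
var info = decode(sample);
console.log(info.alert + " @ " + info.date); // DUMMY_ALERT_TYPE_1 @ 2012-01-01T13:43:00
```

Note that "decode" does not trim whitespace, so any spaces around the commas in the input end up inside the field values.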
Metadata:
The metadata captured from the sample input data then appears in the "metadata" field of the output document:
Code Block |
---|
], "fullText": "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States",
"mediaType": ["Log"],
"metadata": {"info": [{
"alert": "DUMMY_ALERT_TYPE_1 ",
"country": "United States",
"date": "2012-01-01T13:43:00",
"device": "SCANNER_1 ",
"dstIP": "66.66.66.66",
"srcIP": " 10.0.0.1"
}]}, |
...
JavaScript can also return more complex objects, arrays of objects, or arrays of primitives.
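As a rough illustration of those return shapes, the sketch below returns an array of objects and an array of primitives from the same kind of comma-separated input. `decodeAll` and `listDevices` are hypothetical names introduced here, and how the platform stores each shape in the metadata field is not shown.

```javascript
// Hypothetical variant of "decode": returns one object per non-empty input line.
function decodeAll(text) {
    var records = [];
    var lines = text.split('\n');
    for (var i = 0; i < lines.length; i++) {
        if (/^\s*$/.test(lines[i])) {
            continue; // skip blank lines
        }
        var rec = lines[i].split(',');
        records.push({ device: rec[0], date: rec[1], alert: rec[4] });
    }
    return records; // array of objects
}

// Hypothetical helper: returns only the device names.
function listDevices(text) {
    var devices = [];
    var records = decodeAll(text);
    for (var i = 0; i < records.length; i++) {
        devices.push(records[i].device);
    }
    return devices; // array of primitives (strings)
}
```

In a "contentMetadata" entry, the last expression of the script would be the returned value, for example `decodeAll(text);`, following the same pattern as `var info = decode(text); info;`.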
regex
The following example shows how a regex script can be used to manually parse the text of the ingested data.
Log File
Source:
Consider the following alarm logs, which record device alerts along with their network and physical locations.
Code Block |
---|
Device,Date,SrcIP,dstIP,Alert,Country
SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States
SCANNER_2,2012-02-01T14:21:00,10.0.0.2,66.66.66.66,DUMMY_ALERT_TYPE_2,United Kingdom
SCANNER_3,2012-03-01T15:17:00,10.0.0.1,99.66.99.66,DUMMY_ALERT_TYPE_3,Netherlands |
Source Configuration:
In the source configuration, a regex script is used to extract data to make up the "fullText" and "description" of the resulting document.
Code Block |
---|
},
{
    "text": [
        {
            "fieldName": "fullText",
            "script": ",",
            "scriptlang": "regex",
            "flags": "md",
            "replacement": " , "
        },
        {
            "fieldName": "description",
            "script": ",",
            "scriptlang": "regex",
            "flags": "md",
            "replacement": " , "
        }
    ]
}, |
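The effect of each "text" entry above can be approximated in plain JavaScript. This is only a sketch: the platform uses Java regex, whereas here a JavaScript global regex stands in for it, the "md" flags are not modelled, and `applyRegexEntry` is a hypothetical helper. Replacing every match (rather than only the first) is an assumption based on the sample output, where every comma is rewritten.

```javascript
// Hypothetical helper: applies the "script" pattern and "replacement" of one
// regex "text" entry to a string, replacing every match.
function applyRegexEntry(text, entry) {
    var re = new RegExp(entry.script, "g"); // "g": replace all matches (assumed behaviour)
    return text.replace(re, entry.replacement);
}

var entry = { fieldName: "fullText", script: ",", scriptlang: "regex", flags: "md", replacement: " , " };
var line = "SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States";

console.log(applyRegexEntry(line, entry));
// SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States
```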
Output:
The example output includes the "fullText" that results from the regex script.
Code Block |
---|
}
],
"fullText": "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States",
"mediaType": ["Log"],
"metadata": {"info": [{
"alert": "DUMMY_ALERT_TYPE_1 ",
"country": "United States",
"date": "2012-01-01T13:43:00",
"device": "SCANNER_1 ",
"dstIP": "66.66.66.66",
"srcIP": " 10.0.0.1"
}]},
"modified": "Jun 4, 2013 12:54:34 AM UTC",
"publishedDate": "January 1, 2012 13:43:00 PM UTC",
"source": ["Cyber Logs Test"],
"sourceKey": ["INFINITE_ENDPOINT.api.share.get.51ad28a440b4a4f0f757824c.25.26"],
"tags": [
"cyber",
"structured"
],
"title": "DUMMY_ALERT_TYPE_1 @ 2012-01-01T13:43:00 [SCANNER_1 ]: 66.66.66.66 -> 10.0.0.1",
"url": "http://INFINITE_ENDPOINT/api/share/get/51ad28a440b4a4f0f757824c#1"
} |
xpath
Neither regex nor javascript is well suited for extracting fields from HTML and XML (particularly since the current Javascript engine, the Java version of Rhino, does not support DOM).
As a result, Infinit.e supports XPath 1.0 (with one minor extension to allow combined XPath regex).
In this example, an XPath script is used as part of manual text extraction, in order to convert a sample XML document into JSON.
XML
Source Input:
Consider the following XML file, which includes a price list for several food items.
Code Block |
---|
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
    <food>
        <name>Belgian Waffles</name>
        <price>$5.95</price>
        <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
        <calories>650</calories>
    </food>
    <food>
        <name>Strawberry Belgian Waffles</name>
        <price>$7.95</price>
        <description>light Belgian waffles covered with strawberries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <name>Berry-Berry Belgian Waffles</name>
        <price>$8.95</price>
        <description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <name>French Toast</name>
        <price>$4.50</price>
        <description>thick slices made from our homemade sourdough bread</description>
        <calories>600</calories>
    </food>
    <food>
        <name>Homestyle Breakfast</name>
        <price>$6.95</price>
        <description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
        <calories>950</calories>
    </food>
</breakfast_menu> |
Source Configuration:
In the source configuration example below, an XPath script selects each "food" element, and a javascript script then turns each selected element into its own JSON document.
Code Block |
---|
{
    "links": {
        "extraMeta": [
            {
                "context": "First",
                "fieldName": "convert_to_json",
                "flags": "o",
                "script": "//breakfast_menu/food[*]",
                "scriptlang": "xpath"
            }
        ],
        "script": "function convert_to_docs(jsonarray, url)\n{\n var docs = [];\n for (var docIt in jsonarray) {\n var predoc = jsonarray[docIt];\n delete predoc.content;\n var doc = {};\n doc.url = _doc.url.replace(/[?].*/,\"\") + '#' + docIt;\n doc.fullText = predoc;\n doc.title = \"TBD\";\n doc.description = \"TBD\";\n docs.push(doc);\n }\n return docs;\n}\nvar docs = convert_to_docs(_doc.metadata['convert_to_json'], _doc.url);\ndocs;",
        "scriptflags": "d"
    }
} |
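The "script" value in the configuration above is an escaped one-line string. Unescaped, it reads as follows. To make the sketch runnable outside the platform, a stand-in `_doc` object is defined first; its shape, the query string on the URL, and the "content" values are assumptions made for illustration only.

```javascript
// Stand-in for the document object supplied to the script (assumed shape).
var _doc = {
    url: "http://www.w3schools.com/xml/simple.xml?v=1", // "?v=1" is a made-up query string
    metadata: {
        convert_to_json: [
            { name: "Belgian Waffles", price: "$5.95", calories: "650", content: "raw text" },
            { name: "French Toast", price: "$4.50", calories: "600", content: "raw text" }
        ]
    }
};

// Same logic as the "script" string in the source configuration, unescaped.
// As in the original, the "url" parameter is unused and _doc.url is read directly.
function convert_to_docs(jsonarray, url) {
    var docs = [];
    for (var docIt in jsonarray) {
        var predoc = jsonarray[docIt];
        delete predoc.content;                                  // drop the "content" field
        var doc = {};
        doc.url = _doc.url.replace(/[?].*/, "") + '#' + docIt; // strip any query string, append the index
        doc.fullText = predoc;
        doc.title = "TBD";
        doc.description = "TBD";
        docs.push(doc);
    }
    return docs;
}

var docs = convert_to_docs(_doc.metadata['convert_to_json'], _doc.url);
console.log(docs[1].url); // http://www.w3schools.com/xml/simple.xml#1
```

Each element selected by the XPath expression becomes one document, and the `#` suffix gives every document a distinct URL.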
Output:
The output returns an array of JSON-formatted responses:
Code Block |
---|
{
"communityId": ["4d38b72c054548f038a0414a"],
"created": "Jun 5, 2013 09:12:15 PM UTC",
"description": "TBD",
"fullText": "{
\"calories\" : \"650\" , \"description\" : \"two of our famous Belgian
Waffles with plenty of real maple syrup\" , \"price\" : \"$5.95\" ,
\"name\" : \"Belgian Waffles\"}",
"mediaType": ["News"],
"metadata": {"json": [{
"calories": "650",
"description": "two of our famous Belgian Waffles with plenty of real maple syrup",
"name": "Belgian Waffles",
"price": "$5.95"
}]},
"modified": "Jun 5, 2013 09:12:15 PM UTC",
"publishedDate": "Jun 5, 2013 09:12:15 PM UTC",
"source": ["aaa xml test"],
"sourceKey": ["www.w3schools.com.xml.simple.xml"],
"tags": ["tag1"],
"title": "TBD",
"url": "http://www.w3schools.com/xml/simple.xml#0"
}
{
"communityId": ["4d38b72c054548f038a0414a"],
"created": "Jun 5, 2013 09:12:15 PM UTC",
"description": "TBD",
"fullText": "{
\"calories\" : \"900\" , \"description\" : \"light Belgian waffles
covered with strawberries and whipped cream\" , \"price\" : \"$7.95\" ,
\"name\" : \"Strawberry Belgian Waffles\"}",
"mediaType": ["News"],
"metadata": {"json": [{
"calories": "900",
"description": "light Belgian waffles covered with strawberries and whipped cream",
"name": "Strawberry Belgian Waffles",
"price": "$7.95"
}]},
"modified": "Jun 5, 2013 09:12:15 PM UTC",
"publishedDate": "Jun 5, 2013 09:12:15 PM UTC",
"source": ["aaa xml test"],
"sourceKey": ["www.w3schools.com.xml.simple.xml"],
"tags": ["tag1"],
"title": "TBD",
"url": "http://www.w3schools.com/xml/simple.xml#1"
} |