...
...
Format
Code Block |
---|
{
    "display": string,
    "text": [
        {
            "fieldName": string, // One of "fullText", "description", "title"
            "script": string, // The script/xpath/javascript expression (see scriptlang below)
            "flags": string, // Standard Java regex field (regex/xpath only), plus "H" to decode HTML
            "replacement": string, // Replacement string for regex/xpath+regex matches, can include capturing groups as $1 etc
            "scriptlang": string // One of "javascript", "regex", "xpath"
        }
        //..
    ]
} |
...
...
...
...
...
...
Examples
For power users, metadata can be generated from the content using JavaScript. This gives a great deal of flexibility to apply site-specific or source-specific knowledge when pulling out metadata that can be turned into entities or associations.
Log File From File Share
In the following example, manual text transformation is used to parse a log file over the web, with
...
a script
...
of type javascript.
The "globals" block defines a function called "decode", which is then used to capture the metadata for the sample input data in a variable called "info".
The "info" object holds one field per column of the log record:
- info.device
- info.date
- info.srcIP
- info.dstIP
- info.alert
- info.country
Code Block |
---|
{
"globals": {
"scripts": [
"function decode(x)\n{\n var info = {}; \n var rec = x.split(','); \n info.device = rec[0];\n info.date = rec[1];\n info.srcIP = rec[2];\n info.dstIP = rec[3];\n info.alert = rec[4];\n info.country = rec[5];\n return info;\n}"
]
}
},
{
"harvest": {
"searchCycle_secs": 3600
}
},
{
"docMetadata": {
"title": "$metadata.info.alert @ $metadata.info.date [$metadata.info.device]: $metadata.info.dstIP -> $metadata.info.srcIP",
"publishedDate": "$SCRIPT( return _doc.metadata.info[0].date; )"
}
},
{
"contentMetadata": [
{
"fieldName": "info",
"script": "var info = decode(text); info;",
"scriptlang": "javascript"
}
]
} |
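For readability, the "decode" function from the "globals" block can be written out unescaped and run against the sample line outside the platform. This is a minimal sketch for checking the parsing logic only. The `trim()` call is an addition that is not in the original script; it assumes the padding around the " , " separators should not end up in the field values.

```javascript
// Unescaped copy of the decode function from "globals", with one change:
// each field is trimmed (an assumption, not part of the original script).
function decode(x) {
    var info = {};
    var rec = x.split(',').map(function (field) { return field.trim(); });
    info.device = rec[0];
    info.date = rec[1];
    info.srcIP = rec[2];
    info.dstIP = rec[3];
    info.alert = rec[4];
    info.country = rec[5];
    return info;
}

// Sample line in the form it takes after the regex text transformation.
var sample = "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States";
var info = decode(sample);
console.log(info.alert + " @ " + info.date + " [" + info.device + "]");
```

Without the trim, the values keep whatever whitespace surrounds each comma, which would explain the stray spaces in some of the sample metadata values.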
Metadata:
The metadata captured from the sample input data is then stored in the resulting document under "metadata.info", as in the following excerpt.
Code Block |
---|
], "fullText": "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States",
"mediaType": ["Log"],
"metadata": {"info": [{
"alert": "DUMMY_ALERT_TYPE_1 ",
"country": "United States",
"date": "2012-01-01T13:43:00",
"device": "SCANNER_1 ",
"dstIP": "66.66.66.66",
"srcIP": " 10.0.0.1"
}]}, |
JavaScript can also return more complex objects, arrays of objects, or arrays of primitives.
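As a sketch of these richer return types, the two functions below parse several log lines at once: one returns an array of objects, the other an array of primitive strings. The names `decodeAll` and `listCountries` are hypothetical and do not appear in the original source.

```javascript
// Hypothetical helpers: illustrative names, not from the original source.

// Returns an array of objects, one per non-empty log line.
function decodeAll(text) {
    return text.split('\n')
        .filter(function (line) { return line.trim().length > 0; })
        .map(function (line) {
            var rec = line.split(',');
            return {
                device: rec[0],
                date: rec[1],
                srcIP: rec[2],
                dstIP: rec[3],
                alert: rec[4],
                country: rec[5]
            };
        });
}

// Returns an array of primitives (the country of each record).
function listCountries(text) {
    return decodeAll(text).map(function (record) { return record.country; });
}

var text =
    "SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States\n" +
    "SCANNER_3,2012-03-01T15:17:00,10.0.0.1,99.66.99.66,DUMMY_ALERT_TYPE_3,Netherlands";

var records = decodeAll(text);
var countries = listCountries(text);
console.log(records.length, countries);
```

Following the pattern of the "contentMetadata" script in the example above, a script using such a helper would presumably end with the expression to return, for example `decodeAll(text);`.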
Log File
Source:
Consider the following alarm logs, which record device alerts together with their network and physical locations.
Code Block |
---|
Device,Date,SrcIP,dstIP,Alert,Country
SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States
SCANNER_2,2012-02-01T14:21:00,10.0.0.2,66.66.66.66,DUMMY_ALERT_TYPE_2,United Kingdom
SCANNER_3,2012-03-01T15:17:00,10.0.0.1,99.66.99.66,DUMMY_ALERT_TYPE_3,Netherlands |
Source Configuration:
In the source configuration, a regex script is used to extract data to make up the "fullText" and "description" of the resulting document.
Code Block |
---|
},
{
"text": [
{
"fieldName": "fullText",
"script": ",",
"scriptlang": "regex",
"flags": "md",
"replacement": " , "
},
{
"fieldName": "description",
"script": ",",
"scriptlang": "regex",
"flags": "md",
"replacement": " , "
}
]
}, |
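The effect of these two regex entries can be approximated outside the platform with JavaScript's `String.prototype.replace`. This is only a sketch: the platform applies Java regex semantics and its own handling of "flags", neither of which is reproduced here, and the `g` flag is used on the assumption that every match is replaced.

```javascript
// Approximation of "script": ",", "replacement": " , " using a JavaScript regex.
// Assumption: all matches are replaced, hence the global flag.
var line = "SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States";
var fullText = line.replace(/,/g, " , ");
console.log(fullText);
```

Applied to the first data row, this yields the padded form that appears as "fullText" in the example output.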
Output:
The example output includes the "fullText" that results from the regex script.
Code Block |
---|
}
],
"fullText": "SCANNER_1 , 2012-01-01T13:43:00 , 10.0.0.1 , 66.66.66.66 , DUMMY_ALERT_TYPE_1 , United States",
"mediaType": ["Log"],
"metadata": {"info": [{
"alert": "DUMMY_ALERT_TYPE_1 ",
"country": "United States",
"date": "2012-01-01T13:43:00",
"device": "SCANNER_1 ",
"dstIP": "66.66.66.66",
"srcIP": " 10.0.0.1"
}]},
"modified": "Jun 4, 2013 12:54:34 AM UTC",
"publishedDate": "January 1, 2012 13:43:00 PM UTC",
"source": ["Cyber Logs Test"],
"sourceKey": ["INFINITE_ENDPOINT.api.share.get.51ad28a440b4a4f0f757824c.25.26"],
"tags": [
"cyber",
"structured"
],
"title": "DUMMY_ALERT_TYPE_1 @ 2012-01-01T13:43:00 [SCANNER_1 ]: 66.66.66.66 -> 10.0.0.1",
"url": "http://INFINITE_ENDPOINT/api/share/get/51ad28a440b4a4f0f757824c#1"
} |
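The "title" in this output follows from substituting metadata fields into the template given in "docMetadata". The sketch below imitates that substitution so the mapping from template to title is easier to trace. `resolveTemplate` is a hypothetical function: it handles only `$metadata.<field>.<subfield>` references and always takes the first array element, which is an assumption about the platform's behavior rather than a description of it.

```javascript
// Hypothetical imitation of the "$metadata.x.y" substitution (not the platform's code).
// Assumption: each reference resolves to the named subfield of the first array element.
function resolveTemplate(template, doc) {
    return template.replace(/\$metadata\.(\w+)\.(\w+)/g, function (match, field, sub) {
        var values = doc.metadata[field];
        if (values && values.length > 0 && values[0][sub] !== undefined) {
            return String(values[0][sub]);
        }
        return match; // leave unresolved references untouched
    });
}

var doc = {
    metadata: {
        info: [{
            alert: "DUMMY_ALERT_TYPE_1 ",
            country: "United States",
            date: "2012-01-01T13:43:00",
            device: "SCANNER_1 ",
            dstIP: "66.66.66.66",
            srcIP: " 10.0.0.1"
        }]
    }
};

var template = "$metadata.info.alert @ $metadata.info.date [$metadata.info.device]: " +
               "$metadata.info.dstIP -> $metadata.info.srcIP";
var title = resolveTemplate(template, doc);
console.log(title);
```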
Neither regex nor JavaScript is well suited to extracting fields from HTML and XML.
As a result, Infinit.e
...
supports XPath 1.0 (with one minor extension to allow combined XPath regex).
In this example, an XPath script is used as part of manual text extraction to convert a sample XML document into JSON.
XML
Source Input:
Consider the following XML file, which contains a price list for several food items.
Code Block |
---|
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu> |
Source Configuration:
In the source configuration example below, an XPath script is specified to perform the JSON conversion.
Code Block |
---|
{
"links": {
"extraMeta": [
{
"context": "First",
"fieldName": "convert_to_json",
"flags": "o",
"script": "//breakfast_menu/food[*]",
"scriptlang": "xpath"
}
],
"script": "function convert_to_docs(jsonarray, url)\n{\n var docs = [];\n for (var docIt in jsonarray) {\n var predoc = jsonarray[docIt];\n delete predoc.content;\n var doc = {};\n doc.url = _doc.url.replace(/[?].*/,\"\") + '#' + docIt;\n doc.fullText = predoc;\n doc.title = \"TBD\";\n doc.description = \"TBD\";\n docs.push(doc);\n }\n return docs;\n}\nvar docs = convert_to_docs(_doc.metadata['convert_to_json'], _doc.url);\ndocs;",
"scriptflags": "d"
} |
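The "script" value above is easier to follow when unescaped. The sketch below reproduces it and runs it against a mock `_doc` object. In the platform `_doc` is supplied by the harvester; the mock, its abbreviated records, and the placeholder "content" field are assumptions made so that the function can be run on its own.

```javascript
// Mock of the platform-provided _doc global (assumption: only the fields used here).
var _doc = {
    url: "http://www.w3schools.com/xml/simple.xml",
    metadata: {
        // Stand-in for what the "convert_to_json" XPath entry would produce.
        convert_to_json: [
            { name: "Belgian Waffles", price: "$5.95", calories: "650", content: "placeholder" },
            { name: "Strawberry Belgian Waffles", price: "$7.95", calories: "900", content: "placeholder" }
        ]
    }
};

// Unescaped copy of the function from the "script" field.
function convert_to_docs(jsonarray, url) {
    var docs = [];
    for (var docIt in jsonarray) {
        var predoc = jsonarray[docIt];
        delete predoc.content;                                  // drop the "content" field
        var doc = {};
        doc.url = _doc.url.replace(/[?].*/, "") + '#' + docIt;  // strip query string, add index
        doc.fullText = predoc;
        doc.title = "TBD";
        doc.description = "TBD";
        docs.push(doc);
    }
    return docs;
}

var docs = convert_to_docs(_doc.metadata['convert_to_json'], _doc.url);
console.log(docs.map(function (d) { return d.url; }));
```

Each matched element becomes its own document, distinguished by the "#" index appended to the source URL.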
Output:
The output is a set of documents, one per matched food element, each carrying the record converted to JSON:
Code Block |
---|
{
"communityId": ["4d38b72c054548f038a0414a"],
"created": "Jun 5, 2013 09:12:15 PM UTC",
"description": "TBD",
"fullText": "{
\"calories\" : \"650\" , \"description\" : \"two of our famous Belgian
Waffles with plenty of real maple syrup\" , \"price\" : \"$5.95\" ,
\"name\" : \"Belgian Waffles\"}",
"mediaType": ["News"],
"metadata": {"json": [{
"calories": "650",
"description": "two of our famous Belgian Waffles with plenty of real maple syrup",
"name": "Belgian Waffles",
"price": "$5.95"
}]},
"modified": "Jun 5, 2013 09:12:15 PM UTC",
"publishedDate": "Jun 5, 2013 09:12:15 PM UTC",
"source": ["aaa xml test"],
"sourceKey": ["www.w3schools.com.xml.simple.xml"],
"tags": ["tag1"],
"title": "TBD",
"url": "http://www.w3schools.com/xml/simple.xml#0"
}
{
"communityId": ["4d38b72c054548f038a0414a"],
"created": "Jun 5, 2013 09:12:15 PM UTC",
"description": "TBD",
"fullText": "{
\"calories\" : \"900\" , \"description\" : \"light Belgian waffles
covered with strawberries and whipped cream\" , \"price\" : \"$7.95\" ,
\"name\" : \"Strawberry Belgian Waffles\"}",
"mediaType": ["News"],
"metadata": {"json": [{
"calories": "900",
"description": "light Belgian waffles covered with strawberries and whipped cream",
"name": "Strawberry Belgian Waffles",
"price": "$7.95"
}]},
"modified": "Jun 5, 2013 09:12:15 PM UTC",
"publishedDate": "Jun 5, 2013 09:12:15 PM UTC",
"source": ["aaa xml test"],
"sourceKey": ["www.w3schools.com.xml.simple.xml"],
"tags": ["tag1"],
"title": "TBD",
"url": "http://www.w3schools.com/xml/simple.xml#1"
} |
...
...