Format
{
    "display": string,
    "docMetadata": {
        "title": string, // The string expression or $SCRIPT(...) specifying the document title
        "description": string, // The string expression or $SCRIPT(...) specifying the document description
        "publishedDate": string, // The string expression or $SCRIPT(...) specifying the document publishedDate
        "mediaType": string, // The string expression or $SCRIPT(...) specifying the document mediaType (otherwise taken from top-level source field)
        "tags": string, // A ,-separated list of string expressions or $SCRIPT(...) - returning a ,-separated list, the result of each will be added to the tags
        "fullText": string, // The string expression or $SCRIPT(...) specifying the document fullText
        "displayUrl": string, // The string expression or $SCRIPT(...) specifying the document displayUrl
        "appendTagsToDocs": Boolean, // If true, source tags are appended to the document. Default value is false.
        "geotag": {config_param_name"}, // Specify a document level geo-tag
    }
}
Examples
Setting Metadata Values
When document metadata is extracted from a source (via the File, Database, or other technique), each extracted field is captured in the Feed.metadata object. In the docMetadata block, data stored in the metadata object can be accessed using the $ operator, which signifies that we are retrieving data from a field in the document.
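To make the idea of a $ field reference concrete, here is a toy model of how a reference such as "$metadata.json.body" might be followed through a document. It is not the platform's actual resolver: resolveFieldRef is a hypothetical helper, the sample document is made up, and the rule of stepping into the first element of an array is an assumption made only for this sketch.

```javascript
// Toy model only: this is NOT the platform's resolver.
// resolveFieldRef is a hypothetical helper introduced for illustration.
function resolveFieldRef(doc, ref) {
    // Anything that does not start with '$' is treated as a plain literal
    if (typeof ref !== 'string' || ref.charAt(0) !== '$') {
        return ref;
    }
    var parts = ref.slice(1).split('.');
    var current = doc;
    for (var i = 0; i < parts.length; i++) {
        // Assumption: when a field holds an array, follow its first element
        if (Array.isArray(current)) {
            current = current[0];
        }
        if (current == null || typeof current !== 'object' || !(parts[i] in current)) {
            return null; // field not present in this document
        }
        current = current[parts[i]];
    }
    return current;
}

// Made-up sample document
var sampleDoc = { metadata: { json: [ { body: 'Hello world' } ] } };
var resolvedTitle = resolveFieldRef(sampleDoc, '$metadata.json.body');
```

In this sketch a missing field resolves to null rather than throwing, which is a design choice of the example, not documented platform behavior.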
Web Feed Example (Twitter)
In the example, the docMetadata block references metadata objects using the $ operator. $SCRIPT is used to return variables which can then be transformed further.
The example defines the following document metadata fields:
- title
- description
- fullText
- publishedDate
- geotag
},
{
    "docMetadata": {
        "title": "$metadata.json.body",
        "description": "$metadata.json.body",
        "fullText": "$metadata.json.body",
        "publishedDate": "$SCRIPT(return _doc.metadata.json[0].postedTime.replace(/.[0-9]{3}Z/,'Z');)",
        "geotag": {
            "lat": "$SCRIPT( try {return _doc.metadata.json[0].geo.coordinates[0];} catch (err) {return '';})",
            "lon": "$SCRIPT( try {return _doc.metadata.json[0].geo.coordinates[1];} catch (err) {return '';})"
        }
    }
},
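The two kinds of script in this example can be tried outside the platform. The sketch below assumes a made-up postedTime value and made-up coordinates, and wraps the expressions in two hypothetical helpers (normalizePostedTime and safeCoordinate). It escapes the dot in the regular expression, which is slightly stricter than the pattern in the example, where an unescaped dot matches any character.

```javascript
// Made-up sample record shaped like the metadata used in the example
var tweetDoc = {
    metadata: {
        json: [ {
            postedTime: '2012-09-06T18:42:01.000Z',
            geo: { coordinates: [38.9, -77.03] }
        } ]
    }
};

// Hypothetical helper: strip the millisecond part before the trailing 'Z'
function normalizePostedTime(postedTime) {
    return postedTime.replace(/\.[0-9]{3}Z/, 'Z');
}

// Hypothetical helper: return one coordinate, or '' when the record has no geo block
function safeCoordinate(doc, index) {
    try {
        return doc.metadata.json[0].geo.coordinates[index];
    } catch (err) {
        return '';
    }
}

var publishedDate = normalizePostedTime(tweetDoc.metadata.json[0].postedTime);
var lat = safeCoordinate(tweetDoc, 0);
var lon = safeCoordinate(tweetDoc, 1);

// A record without geo data falls through to the catch branch
var noGeoDoc = { metadata: { json: [ { postedTime: '2012-09-06T18:42:01.000Z' } ] } };
var missingLat = safeCoordinate(noGeoDoc, 0);
```

The try/catch matters because many records carry no geo block at all; without it, reading coordinates from an undefined field would raise an error instead of returning an empty value.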
"Office" Documents Example
In this example, the subject line of an email is extracted from the file metadata and set as the title of the resulting document.
},
{
    "docMetadata": {
        "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
    }
},
In the sample output we can see the "title" that was set using the docMetadata script.
{
    "_id": "5048efb0e4b01fd6455420ee",
    "title": "RE: Testing Preschedule workspace",
    "url": "smb://modus:139/enron/testing/semperger-c/deleted_items/37QTKE~3",
    "created": "Sep 6, 2012 06:42:01 PM UTC",
    "modified": "Jul 24, 2012 01:13:02 AM UTC",
    "publishedDate": "Jul 9, 2001 06:33:32 PM UTC",
    "source": [ "Enron Emails (TextRank)" ],
    "sourceKey": [ "modus.139.enron.testing.." ],
    "mediaType": [ "Email" ],
    "description": "I am trying to pull it up now, it's taking a long time\r\n\r\n \r\nFrom: \tSmith, Will \r\nSent:\tMonday, July 09, 2001 11:28 AM\r\nTo:\tSemperger, Cara\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nYes, but Vish made the changes in Table Edit. : - )\r\n\r\nWill\r\n\r\n \r\nFrom: \tSemperger, Cara \r\nSent:\tMonday, July 09, 2001 1:20 PM\r\nTo:\tSmith, Will\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nSo, this table edit that Brett is asking me to test is really from ",
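The title script in this example reads several nested fields in a row, so it fails if any of them is absent. A minimal sketch of a more defensive variant follows; getEmailSubject and the fallback title are hypothetical and not part of the source configuration, and the sample document contains only the fields the script reads.

```javascript
// Hypothetical helper: return the email subject, or a fallback when it is missing
function getEmailSubject(doc, fallback) {
    try {
        var subject = doc.metadata._FILE_METADATA_[0].metadata.subject[0];
        return (subject != null) ? subject : fallback;
    } catch (err) {
        // Any missing level in the path ends up here
        return fallback;
    }
}

// Sample shaped like the file metadata referenced by the title script
var emailDoc = {
    metadata: {
        _FILE_METADATA_: [ { metadata: { subject: [ 'RE: Testing Preschedule workspace' ] } } ]
    }
};

var emailTitle = getEmailSubject(emailDoc, '(no subject)');
var fallbackTitle = getEmailSubject({ metadata: {} }, '(no subject)');
```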
Setting Metadata Values for Location
In this example, $SCRIPT is used to set the values of the geotag elements city, country, and stateProvince. The source also imports functions and variables through globals.
},
{
    "docMetadata": {
        "title": "$metadata.subject",
        "description": "$metadata.summary",
        "publishedDate": "$metadata.incidentdate",
        "geotag": {
            "city": "$SCRIPT( return _doc.metadata.location[0].citystateprovince.city; )",
            "country": "$SCRIPT( return _doc.metadata.location[0].country; )",
            "stateProvince": "$SCRIPT( return _doc.metadata.location[0].citystateprovince.stateprovince; )"
        }
    }
},
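As a quick check of what the three geotag expressions return, the sketch below evaluates them in plain JavaScript against the location record from this source's sample output. The _doc object here is a hand-built stand-in containing only that record; the real document carries many more fields.

```javascript
// Hand-built stand-in for the document, holding only the location record
var _doc = {
    metadata: {
        location: [ {
            citystateprovince: { city: 'Manugay', stateprovince: 'Kunar' },
            country: 'Afghanistan',
            region: 'South Asia'
        } ]
    }
};

// The same expressions as in the geotag block, evaluated directly
var geotag = {
    city: _doc.metadata.location[0].citystateprovince.city,
    country: _doc.metadata.location[0].country,
    stateProvince: _doc.metadata.location[0].citystateprovince.stateprovince
};
```

Note the difference in case: the metadata field is stateprovince, while the geotag element it feeds is stateProvince.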
Globals:
{ "globals": { "scripts": [ "function getLocationEntity() { var s = (_iterator.citystateprovince.city != null) ? _iterator.citystateprovince.city : ''; s+= (s.length > 0) ? ',' : ''; s+= (_iterator.citystateprovince.stateprovince != null) ? _iterator.citystateprovince.stateprovince : ''; s+= (s.length > 0) ? ',' : ''; s+= (_iterator.country != null) ? _iterator.country : ''; return s; } function getVictim() { var indicator = (_iterator.indicator != 'Unknown') ? _iterator.indicator : ''; var victimType = (_iterator.victimtype != 'Unknown') ? _iterator.victimtype : ''; var child = (_iterator.child == 'Yes') ? 'Child' : 'Adult'; var combatant = (_iterator.combatant == 'Yes') ? 'Combatant' : ''; var targeted = (_iterator.targetedcharacteristic != 'None' && _iterator.targetedcharacteristic != 'Unknown') ? _iterator.targetedcharacteristic : ''; var defining = (_iterator.definingcharacteristic != 'None' && _iterator.definingcharacteristic != 'Unknown') ? _iterator.definingcharacteristic : ''; var s = indicator; if (victimType.length > 0) { if (s.length > 0) { s += ', '; } s += victimType; } if (s.length > 0) { s += ', '; } s += child; if (combatant.length > 0) { if (s.length > 0) { s += ', '; } s += combatant; } if (targeted.length > 0) { if (s.length > 0) { s += ', '; } s += targeted; } if (defining.length > 0) { if (s.length > 0) { s += ', '; } s += defining; } if (s.length > 0) { s += ' from '; } s += _iterator.nationality; return s; } function getVictimCount() { var count = parseInt(_iterator.deadcount, 10) + parseInt(_iterator.woundedcount, 10); return count; } function getEventType() { var s = _value; if (_doc.metadata.assassination[0] == 'Yes') s += ', Assassination'; if (_doc.metadata.suicide[0] == 'Yes') s += ', Suicide'; if (_doc.metadata.ied[0] == 'Yes') s += ', IED'; return s; } function getEventTypeFull() { var s = _doc.metadata.eventtype[0]; if (_doc.metadata.assassination[0] == 'Yes') s += ', Assassination'; if (_doc.metadata.suicide[0] == 'Yes') s += ', Suicide'; if (_doc.metadata.ied[0] == 'Yes') s += ', IED'; return s; } function isOrganizationSpecified() { if (_doc.metadata.organization != null && _doc.metadata.organization[0].toString().toLowerCase() == 'no group') { return false; } else { return true; } } function getOrganizationName() { if (_doc.metadata.organization != null && _doc.metadata.organization[0].toString().toLowerCase() != 'no group') { return _doc.metadata.organization[0]; } }" ] }
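Because the globals are stored as one long string, it can help to run a couple of the functions on their own. The sketch below copies getLocationEntity and getVictimCount from the globals and calls them with a hand-set _iterator. The location values come from the sample output of this source; the deadcount and woundedcount values are made up, and assigning _iterator by hand only imitates what the platform presumably does while iterating.

```javascript
// Copied from the globals above, only reformatted across lines
function getLocationEntity() {
    var s = (_iterator.citystateprovince.city != null) ? _iterator.citystateprovince.city : '';
    s += (s.length > 0) ? ',' : '';
    s += (_iterator.citystateprovince.stateprovince != null) ? _iterator.citystateprovince.stateprovince : '';
    s += (s.length > 0) ? ',' : '';
    s += (_iterator.country != null) ? _iterator.country : '';
    return s;
}

function getVictimCount() {
    var count = parseInt(_iterator.deadcount, 10) + parseInt(_iterator.woundedcount, 10);
    return count;
}

// Stand-in for the platform-provided iterator: a location record
var _iterator = {
    citystateprovince: { city: 'Manugay', stateprovince: 'Kunar' },
    country: 'Afghanistan'
};
var locationEntity = getLocationEntity(); // 'Manugay,Kunar,Afghanistan'

// Stand-in victim record with made-up counts (stored as strings)
_iterator = { deadcount: '2', woundedcount: '5' };
var victimCount = getVictimCount(); // 7
```

As written, getLocationEntity appends a separator whenever the string built so far is non-empty, so a record with a city and province but no country yields a trailing comma ('Manugay,Kunar,').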
Output:
The output of the example source contains the location information from the source data.
"location": [{ "citystateprovince": { "city": "Manugay", "stateprovince": "Kunar" }, "country": "Afghanistan", "region": "South Asia"