...
Code Block |
---|
{
"description": "wits test",
"isPublic": true,
"mediaType": "Report",
"searchCycle_secs": -1,
"tags": [
"incidents",
"nctc",
"terrorism",
"wits",
"events",
"worldwide"
],
"title": "wits test",
"processingPipeline": [
{
"file": {
"XmlIgnoreValues": [
"DefiningCharacteristicList",
"TargetedCharacteristicList",
"WeaponTypeList",
"PerpetratorList",
"VictimList",
"EventTypeList",
"CityStateProvinceList",
"FacilityList"
],
"XmlPrimaryKey": "icn",
"XmlRootLevelValues": [
"Incident"
],
"XmlSourceName": "https://wits.nctc.gov/FederalDiscoverWITS/index.do?N=0&Ntk=ICN&Ntx=mode%20match&Ntt=",
"domain": "XXX",
"password": "XXX",
"username": "XXX",
"url": "smb://modus:139/wits/allfiles/"
}
}, |
Configuring CSV/SV
There are two options for configuring CSV:
- Specify the field names manually
- Derive the field names from the header
These are described below.
Specifying the field names manually
You can use XmlRootLevelValues to set the field names for CSV/SV file parsing.
When you do this, CSV parsing occurs automatically and the records are mapped into a metadata object called "csv" with the field names corresponding to the values of this array.
In the following sample code, the file extractor is configured to act on .csv content (via "type": "csv"). XmlRootLevelValues is left empty here; it is the array that would hold the field names.
Code Block |
---|
{
"description": "For cyber demo",
"isPublic": false,
"mediaType": "Log",
"searchCycle_secs": 3600,
"tags": [
"cyber",
"structured"
],
"title": "Cyber Logs Test",
"processingPipeline": [
{
"file": {
"XmlRootLevelValues": [],
"domain": "DOMAIN",
"password": "PASSWORD",
"type": "csv",
"username": "USER",
"url": "smb://FILESHARE:139/cyber_logs/"
}
}, |
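As a hedged sketch of the manual option, the "file" element below lists three field names in XmlRootLevelValues. The names field1, field2, and field3 are illustrative placeholders, and the share, domain, and credential values simply reuse the placeholders from the sample above.

```json
{
    "file": {
        "XmlRootLevelValues": [
            "field1",
            "field2",
            "field3"
        ],
        "domain": "DOMAIN",
        "password": "PASSWORD",
        "type": "csv",
        "username": "USER",
        "url": "smb://FILESHARE:139/cyber_logs/"
    }
}
```

With a configuration along these lines, each record would be expected to appear in the "csv" metadata object under the keys field1, field2, and field3, in the order the columns occur in the file.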
Using XmlIgnore Values to Derive Field Names Automatically
For "*sv" files, the field names can also be derived automatically from the header by setting XmlIgnoreValues. In this case, XmlRootLevelValues need not be set.
The field "XmlIgnoreValues" is used to identify the headers: the start of each line is compared to each string in this array, and if one matches, that line is designated as a header and does not generate a document.
Furthermore, the first header line that consists of more than one token-separated field is used to generate the field names used in the "csv" object.
Example:
If "XmlIgnoreValues": [ "#" ], and the first three lines are "#", "#header", and "#field1,field2,field3", then the processing will assume the 3 fields are field1, field2, and field3.
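A configuration matching this example might look like the sketch below. It assumes the same placeholder share and credentials as the earlier cyber logs sample; only XmlIgnoreValues and "type" bear on header detection, and XmlRootLevelValues is omitted because the field names would come from the "#field1,field2,field3" line.

```json
{
    "file": {
        "XmlIgnoreValues": [
            "#"
        ],
        "domain": "DOMAIN",
        "password": "PASSWORD",
        "type": "csv",
        "username": "USER",
        "url": "smb://FILESHARE:139/cyber_logs/"
    }
}
```

Under this sketch, the lines "#" and "#header" would be skipped as headers with a single field each, while "#field1,fiel2,field3"-style lines with several comma-separated fields would supply the names, so a header written as "#field1,field2,field3" yields field1, field2, and field3.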
...