...
In addition, you can use XmlSourceName
to build the document URL. If specified, the document URL is build built as "XmlSourceName" + xml("XmlPrimaryKey").
You can use the parameter XmlPrimaryKey
to help identify whether a record is new or previously harvested.
Info |
---|
For JSON files, the parameter XmlIgnoreValues is not applicable. |
...
You can use XmlRootLevelValues
to set the field names.
When you do this, CSV parsing occurs automatically and the records are mapped into a metadata object called "csv" with the field names corresponding to the values of this array.
TODO In the source example
Deriving Field Names Automatically
The field names can also be derived automatically from the headers.
The field "XmlIgnoreValues" is used to identify the headers - the start of each line is compared to each element in "XmlIgnoreValues", if it matches then that line is designated as a header and does not generate a document.
Furthermore, if the header line contains the right number of fields, then it is used to generate the field names used in the "csv" object.
TODO explanation of using the quote char
Example:
below, the field names will correspond to the included array: "device","date", "srcIP" etc.
Code Block |
---|
"processingPipeline": [ {
"file": {
"XmlRootLevelValues": [
"device",
"date",
"srcIP",
"dstIP",
"alert",
"country"
],
"XmlIgnoreValues": [
"device,date,srcIP"
],
"domain": "DOMAIN",
"password": "PASSWORD",
"type": "csv",
"username": "USER",
"url": "smb://FILESHARE:139/cyber_logs/"
}
}, |
Deriving Field Names Automatically
The field names can also be derived automatically from the headers.
The field "XmlIgnoreValues" is used to identify the headers - the start of each line is compared to each element in "XmlIgnoreValues", if it matches then that line is designated as a header and does not generate a document.
Furthermore, if the header line contains the right number of fields, then it is used to generate the field names used in the "csv" object.
For the purpose of example, consider csv data starting with the # character.
Code Block |
---|
#Date,Device,SrcIP,dstIP,Alert,Country
SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States |
In the example source below, XmlIgnoreValues automatically identifies the header using #, and no document is generated.
Code Block |
---|
"processingPipeline": [ {
"file": {
"XmlIgnoreValues": [
"#"
],
"domain": "DOMAIN",
"password": "PASSWORD",
"type": "csv",
"username": "USER",
"url": "smb://FILESHARE:139/cyber_logs/"
}
}, |
If "XmlIgnoreValues": "#", and the first three lines are "#", "#header", and "#field1,field2,field3" then the processing will assume the 3 fields are field1, field2, and field3.
...
eg assuming the quote char is ', then "`#`" in the above example would return 3 fields: "#field1", "field2" and "field3"TODO insert source example
Info |
---|
...