...

In addition, you can use XmlSourceName to build the document URL. If specified, the document URL is built as "XmlSourceName" + xml("XmlPrimaryKey").

You can use the parameter XmlPrimaryKey to help identify whether a record is new or previously harvested.
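
The concatenation described above can be pictured with a small Python sketch. It only illustrates the documented behavior and is not the harvester's code: the function names, the sample record, and the use of the built URL as the "already harvested" key are all assumptions made for this example.

```python
def build_document_url(xml_source_name, record, xml_primary_key):
    """Concatenate the constant prefix with the record's primary-key value."""
    return xml_source_name + str(record[xml_primary_key])


def is_new_record(url, harvested_urls):
    """Assumption: a record counts as new if its URL has not been seen yet."""
    if url in harvested_urls:
        return False
    harvested_urls.add(url)
    return True


# Hypothetical record; "pageNum" plays the role of XmlPrimaryKey.
record = {"pageNum": "3454354", "title": "Example page"}
harvested = set()

url = build_document_url("http://www.website.com?pageNum=", record, "pageNum")
first_time = is_new_record(url, harvested)   # not seen before
second_time = is_new_record(url, harvested)  # same URL seen again
print(url, first_time, second_time)
```

In this sketch the first call reports the record as new and the second reports it as previously harvested, because both calls resolve to the same URL.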

Info
For JSON files, the parameter XmlIgnoreValues is not applicable.

...

You can use XmlRootLevelValues to set the field names.

When you do this, CSV parsing occurs automatically and the records are mapped into a metadata object called "csv" with the field names corresponding to the values of this array.

Example:

In the source below, the field names correspond to the included array: "device", "date", "srcIP", etc.

 

Code Block
"processingPipeline": [
        {
            "file": {
                "XmlRootLevelValues": [
                    "device",
                    "date",
                    "srcIP",
                    "dstIP",
                    "alert",
                    "country"
                ],
                "XmlIgnoreValues": [
                    "device,date,srcIP"
                ],
                "domain": "DOMAIN",
                "password": "PASSWORD",
                "type": "csv",
                "username": "USER",
                "url": "smb://FILESHARE:139/cyber_logs/"
            }
        },
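
As a rough illustration of the mapping described above, the following Python sketch turns one CSV line into a "csv" metadata object keyed by the XmlRootLevelValues array, and skips lines that start with an XmlIgnoreValues entry. The helper names are hypothetical and the harvester's real parsing may differ in its details.

```python
import csv


def is_ignored(line, ignore_values):
    """A line is treated as a header if it starts with any ignore value."""
    return any(line.startswith(value) for value in ignore_values)


def map_csv_line(line, field_names):
    """Parse one CSV line and pair each value with its field name."""
    values = next(csv.reader([line]))
    return {"csv": dict(zip(field_names, values))}


field_names = ["device", "date", "srcIP", "dstIP", "alert", "country"]
ignore_values = ["device,date,srcIP"]

lines = [
    "device,date,srcIP,dstIP,alert,country",
    "SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States",
]

# Header lines produce no document; every other line becomes one document.
docs = [map_csv_line(l, field_names) for l in lines if not is_ignored(l, ignore_values)]
print(docs[0]["csv"]["srcIP"])
```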

 

Deriving Field Names Automatically

The field names can also be derived automatically from the headers.

The field "XmlIgnoreValues" is used to identify the headers: the start of each line is compared to each element in "XmlIgnoreValues", and if it matches, that line is designated as a header and does not generate a document.

Furthermore, if the header line contains the right number of fields, then it is used to generate the field names used in the "csv" object.

For example, consider CSV data whose header line starts with the # character.

Code Block
#Device,Date,SrcIP,dstIP,Alert,Country
SCANNER_1,2012-01-01T13:43:00,10.0.0.1,66.66.66.66,DUMMY_ALERT_TYPE_1,United States

 

In the example source below, XmlIgnoreValues identifies the header line by its leading #, so no document is generated for that line.

Code Block
"processingPipeline": [
        {
            "file": {
                "XmlIgnoreValues": [
                    "#"
                ],
                "domain": "DOMAIN",
                "password": "PASSWORD",
                "type": "csv",
                "username": "USER",
                "url": "smb://FILESHARE:139/cyber_logs/"
            }
        },

 

If "XmlIgnoreValues" is set to ["#"], and the first three lines are "#", "#header", and "#field1,field2,field3", then the processing will assume the 3 fields are field1, field2, and field3.
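
The header selection in this example can be sketched in Python as follows. This is an illustration only: the page does not say how the "right number of fields" is determined, so the sketch takes it as an explicit expected_count parameter, and the function name and the sample data row are hypothetical.

```python
def derive_field_names(lines, ignore_values, expected_count):
    """Return (field_names, data_lines) for a list of raw CSV lines."""
    field_names = None
    data_lines = []
    for line in lines:
        # Find the first ignore value that the line starts with, if any.
        prefix = next((v for v in ignore_values if line.startswith(v)), None)
        if prefix is None:
            data_lines.append(line)  # ordinary data line
            continue
        # Header line: strip the matching prefix and count its fields.
        candidate = line[len(prefix):].split(",")
        if field_names is None and len(candidate) == expected_count:
            field_names = candidate
    return field_names, data_lines


lines = ["#", "#header", "#field1,field2,field3", "a,b,c"]
names, data = derive_field_names(lines, ["#"], expected_count=3)
print(names, data)
```

Here "#" and "#header" are recognized as headers but each yields a single field, so only the third line supplies the field names.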

...

E.g., assuming the quote char is ', then "'#'" in the above example would return 3 fields: "#field1", "field2" and "field3".
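
One way to read the quoting behavior is sketched below in Python. It assumes that an ignore value wrapped in the quote char still matches on the unquoted text but leaves the matched prefix in place, and that the quote char is a single character. The function name is hypothetical and this is not the harvester's implementation.

```python
def split_header(line, ignore_value, quote_char="'"):
    """Split a header line into fields, or return None if it does not match."""
    quoted = (
        len(ignore_value) >= 2
        and ignore_value.startswith(quote_char)
        and ignore_value.endswith(quote_char)
    )
    # Match on the text inside the quotes when the value is quoted.
    prefix = ignore_value[1:-1] if quoted else ignore_value
    if not line.startswith(prefix):
        return None
    # Quoted: keep the prefix in the first field. Unquoted: strip it.
    body = line if quoted else line[len(prefix):]
    return body.split(",")


plain = split_header("#field1,field2,field3", "#")
kept = split_header("#field1,field2,field3", "'#'")
print(plain, kept)
```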

 

Info

For "*csv" files where XmlRootLevelValues is set, if each document within the file references a unique network resource of the format "CONSTANT_URL_PATH + VARIABLE_ID" (e.g. "http://www.website.com?pageNum=3454354"), and the "VARIABLE_ID" component is one of the fields in the XML/JSON object, then "XmlSourceName" and "XmlPrimaryKey" can be used to specify the two components. Note that for JSON, dot notation can be used in "XmlPrimaryKey" for nested fields.

If it is not possible to specify the URL in this manner, but there is a single (not necessarily unique) URI related to the document, e.g. either a network resource or a file in a sub-directory of the fileshare, it is recommended to use the structured analysis handler to set the "displayUrl" parameter.
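
The dot notation for nested JSON fields can be illustrated with a short Python sketch. The nested record and the key "page.id" are hypothetical, and the lookup shown is only one plausible reading of how a dotted "XmlPrimaryKey" would be resolved.

```python
def resolve_dotted(record, dotted_key):
    """Walk a nested dict using a dot-separated key such as "page.id"."""
    value = record
    for part in dotted_key.split("."):
        value = value[part]
    return value


# Hypothetical JSON record in which the variable ID is a nested field.
json_record = {"page": {"id": 3454354, "lang": "en"}}

constant_url_path = "http://www.website.com?pageNum="  # role of XmlSourceName
variable_id = resolve_dotted(json_record, "page.id")   # role of XmlPrimaryKey
document_url = constant_url_path + str(variable_id)
print(document_url)
```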

...