...

Code Block
{
    "description": "wits test",
    "isPublic": true,
    "mediaType": "Report",
    "searchCycle_secs": -1,
    "tags": [
        "incidents",
        "nctc",
        "terrorism",
        "wits",
        "events",
        "worldwide"
    ],
    "title": "wits test",
    "processingPipeline": [
        {
            "file": {
                "XmlIgnoreValues": [
                    "DefiningCharacteristicList",
                    "TargetedCharacteristicList",
                    "WeaponTypeList",
                    "PerpetratorList",
                    "VictimList",
                    "EventTypeList",
                    "CityStateProvinceList",
                    "FacilityList"
                ],
                "XmlPrimaryKey": "icn",
                "XmlRootLevelValues": [
                    "Incident"
                ],
                "XmlSourceName": "https://wits.nctc.gov/FederalDiscoverWITS/index.do?N=0&Ntk=ICN&Ntx=mode%20match&Ntt=",
                "domain": "XXX",
                "password": "XXX",
                "username": "XXX",
                "url": "smb://modus:139/wits/allfiles/"
            }
        },

 

Configuring CSV/SV

There are two options for configuring CSV:

  • Specify the field names manually
  • Derive the field names from the header

These are described below.

Specifying the field names manually

You can use XmlRootLevelValues to set the field names for CSV/SV file parsing.

When you do this, CSV parsing occurs automatically and the records are mapped into a metadata object called "csv" with the field names corresponding to the values of this array.
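The mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the extractor's actual implementation: the function name map_record, the sample field names, and the handling of surplus or missing values are all assumptions.

```python
import csv
import io


def map_record(line, field_names, separator=","):
    """Map one delimited line onto the configured field names.

    Returns a metadata-style dict with a single "csv" key. The function
    name and the handling of extra or missing values are hypothetical.
    """
    values = next(csv.reader(io.StringIO(line), delimiter=separator), [])
    # zip() stops at the shorter sequence, so surplus values are dropped and
    # missing trailing values simply leave those field names unset.
    return {"csv": dict(zip(field_names, values))}


# Hypothetical field names, as they might be listed in XmlRootLevelValues.
field_names = ["timestamp", "src_ip", "dst_ip"]
record = map_record("2012-01-01T00:00:00,10.0.0.1,10.0.0.2", field_names)
print(record)
```

Each value is paired with the field name at the same position, so the order of the names in the array has to match the column order of the file.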

In the following sample code, the file extractor is configured to act on .csv content ("type": "csv"). The field names are supplied through the XmlRootLevelValues array, which is left empty in this sample.

Code Block
{
    "description": "For cyber demo",
    "isPublic": false,
    "mediaType": "Log",
    "searchCycle_secs": 3600,
    "tags": [
        "cyber",
        "structured"
    ],
    "title": "Cyber Logs Test",
    "processingPipeline": [
        {
            "file": {
                "XmlRootLevelValues": [],
                "domain": "DOMAIN",
                "password": "PASSWORD",
                "type": "csv",
                "username": "USER",
                "url": "smb://FILESHARE:139/cyber_logs/"
            }
        },

 

Using XmlIgnoreValues to Derive Field Names Automatically

The field names can also be derived automatically from the headers by setting XmlIgnoreValues. In this case, XmlRootLevelValues need not be set.

For "*sv" files

TODO source example

The field "XmlIgnoreValues" is used to identify the headers: the start of each line is compared to each element in "XmlIgnoreValues", and if it matches, that line is designated as a header and does not generate a document.

Furthermore, the first header line that consists of more than one token-separated field is used to generate the field names used in the "csv" object, provided it contains the right number of fields.

Example:

If "XmlIgnoreValues" is [ "#" ], and the first three lines are "#", "#header", and "#field1,field2,field3", then the processing will assume the 3 fields are field1, field2, and field3.
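The header rules and the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the extractor's real code: the function name parse_sv is hypothetical, the matched ignore prefix is assumed to be stripped before the field names are split, and the check for "the right number of fields" is omitted.

```python
def parse_sv(lines, ignore_values, separator=","):
    """Derive field names from header lines, then map data lines onto them.

    Hypothetical sketch: returns (field_names, records), where each record
    is a dict with a single "csv" key.
    """
    field_names = None
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A line is a header if it starts with any of the ignore values.
        prefix = next((p for p in ignore_values if line.startswith(p)), None)
        if prefix is not None:
            # Header lines never generate a document. The first one that
            # splits into more than one field supplies the field names.
            if field_names is None:
                candidate = line[len(prefix):].split(separator)
                if len(candidate) > 1:
                    field_names = [name.strip() for name in candidate]
            continue
        if not line:
            continue  # assumption: blank lines are skipped
        values = line.split(separator)
        # If no header was found, the mapping is simply empty.
        records.append({"csv": dict(zip(field_names or [], values))})
    return field_names, records


sample = ["#", "#header", "#field1,field2,field3", "a,b,c"]
derived_names, records = parse_sv(sample, ["#"])
print(derived_names)
print(records)
```

The first two lines match "#" but contain only one field each, so they are skipped without supplying names; the third line provides field1, field2, and field3, and the data line is then mapped onto them.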

...