Overview

(IN PROGRESS)

The federated query source differs from the other source types: when a recognized query is performed via the API or UI, it temporarily imports documents into the platform from an external API.

There are 3 ways of bringing in the data:

  • URL requests
  • Specifying a Python script (actually a Jython script): "importScript" should contain the Python code to execute, with the last evaluated expression being the value that is passed back
  • Specifying an external script: "importScript" should contain the command line, e.g. "dir/scriptname.sh args" (do not put "sh" in front of the script; a "#!" construct must be used inside the script instead)
    • (the same access rules apply as for the "External Script" extractor described here, i.e. "dir" is relative to "/opt/infinite-home/lib/extractor-scripts")
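The "last evaluated expression" convention for Python scripts can be illustrated with a small emulation. This is only a sketch under stated assumptions: it runs under standard Python 3 rather than Jython, the helper name `run_import_script` is hypothetical, and it assumes that "$1" is replaced by plain text substitution before the script is executed.

```python
import ast
import json

def run_import_script(script, term):
    """Run `script` and return the value of its last expression (emulation only)."""
    source = script.replace("$1", term)  # assumed: literal text substitution of $1
    tree = ast.parse(source)
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        return None  # the script does not end with an expression
    # Split off the final expression so that it can be evaluated separately
    last_expression = ast.Expression(tree.body.pop().value)
    scope = {}
    exec(compile(tree, "<importScript>", "exec"), scope)
    return eval(compile(last_expression, "<importScript>", "eval"), scope)

# A hypothetical "importScript" value: the final line is the value passed back
example_script = '''
import json
domain = "$1"
json.dumps({"domain": domain, "labels": domain.split(".")})
'''

result = run_import_script(example_script, "garyhart.com")
print(result)
```

In a real source only the script text itself would go into "importScript"; the wrapper function exists purely to show which value is handed back.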

The rules for whether to generate a federated query source are as follows:

  • If there is a single entity query term (apart from "*", which is ignored), and its type is one of the elements of "entityTypes" (apart from the special case described below), then that query term is copied into $1 and the federated query is issued
  • If "entityTypes" contains elements of the form "/regex/TYPE" (i.e. starting with "/"), and the query is a single text term (again apart from "*") that matches the regex, then the matching string is used as the term, together with the corresponding TYPE
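The two rules above can be sketched in Python as follows. This is an illustration, not the platform's implementation: the function name `select_federated_term`, the "etext" key used for text terms, the case-insensitive comparison of types, and the use of a regex search (rather than a full match) are all assumptions.

```python
import re

def select_federated_term(query_terms, entity_types):
    """Return (term, type) if the query should trigger the source, else None.

    query_terms:  list of dicts, e.g. [{"entity": "garyhart.com/externaldomain"}]
    entity_types: the "entityTypes" array from the federatedQuery object
    """
    # "*" terms are ignored (assumed to appear as text terms equal to "*")
    terms = [t for t in query_terms if t.get("etext") != "*"]
    if len(terms) != 1:
        return None
    term = terms[0]

    plain_types = [t.lower() for t in entity_types if not t.startswith("/")]
    regex_types = [t for t in entity_types if t.startswith("/")]

    # Rule 1: a single entity term ("value/type") whose type is listed
    if "entity" in term:
        value, _, entity_type = term["entity"].rpartition("/")
        if value and entity_type.lower() in plain_types:
            return value, entity_type
        return None

    # Rule 2: a single text term matching one of the "/regex/TYPE" elements
    if "etext" in term:
        for spec in regex_types:
            pattern, _, entity_type = spec[1:].rpartition("/")
            match = re.search(pattern, term["etext"])
            if match:
                return match.group(0), entity_type
    return None
```

The first element of the returned pair is the value that would be copied into $1.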

In both cases the query term is copied into $1, which can be placed in any of the "requests.*" or "importScript" fields.
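As a rough illustration of the $1 substitution for URL requests, the sketch below builds the final URL from one element of "requests". It assumes plain text replacement of "$1" and a GET request with URL-encoded query parameters; the helper name `build_request_url` is hypothetical and the API key is a placeholder.

```python
from urllib.parse import urlencode

def build_request_url(request, term):
    """Substitute $1 in one element of "requests" and return the full URL."""
    endpoint = request["endPointUrl"].replace("$1", term)
    params = {
        key: value.replace("$1", term)
        for key, value in request.get("urlParams", {}).items()
    }
    if not params:
        return endpoint
    # Append to an existing query string if the endpoint already has one
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)

request = {
    "endPointUrl": "https://www.virustotal.com/vtapi/v2/domain/report",
    "urlParams": {"apikey": "XXX", "domain": "$1"},
}
print(build_request_url(request, "garyhart.com"))
```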

Once data has been obtained from an external source, it can be processed in one of 2 ways:

  • The "docConversionMap" can be used to generate entities:
    • The docConversionMap keys point to nested JSON field names (with ":" used as the separator instead of "."); if a key starts with ":" then JsonPath syntax is used instead.
    • The corresponding values are the entity types.
    • "typeToDimensionMap" maps the types to dimensions (Who/What/Where/When).
  • A normal source object can be used, with the federated query as the first element of the processing pipeline. A single document is passed into the pipeline, with the following full text:
    • The output of the script
    • A new-line separated list of the outputs of the URL requests (which are also individually copied into a metadata array called "__FEDERATED_REPLIES__" if there is more than one)
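The "docConversionMap" option can be pictured with the following sketch, which walks a parsed JSON reply and emits one entity per value found. It is not the platform's code: the function names, the output keys ("value", "type", "dimension"), and the fan-out over arrays are assumptions, and keys using JsonPath syntax (starting with ":") are skipped.

```python
def extract_values(node, path):
    """Collect every value reached by following `path` (a list of field names).

    Lists are fanned out, so "resolutions:ip_address" yields one value per
    element of a "resolutions" array (assumed behaviour).
    """
    if isinstance(node, list):
        values = []
        for item in node:
            values.extend(extract_values(item, path))
        return values
    if not path:
        # Only scalar leaves become entity values
        return [] if node is None or isinstance(node, dict) else [node]
    if isinstance(node, dict) and path[0] in node:
        return extract_values(node[path[0]], path[1:])
    return []

def build_entities(reply, doc_conversion_map, type_to_dimension_map):
    """Turn a parsed JSON reply into a list of simple entity dicts."""
    entities = []
    for key, entity_type in doc_conversion_map.items():
        if key.startswith(":"):
            continue  # JsonPath keys are out of scope for this sketch
        for value in extract_values(reply, key.split(":")):
            entities.append({
                "value": str(value),
                "type": entity_type,
                # None if the type has no entry in typeToDimensionMap
                "dimension": type_to_dimension_map.get(entity_type),
            })
    return entities
```

For example, a reply containing a "resolutions" array of two objects, each with an "ip_address" field, would produce two "ExternalIp" entities under the map {"resolutions:ip_address": "ExternalIp"}.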

There are 2 levels of caching:

  • The API response is cached for the period specified by "cacheTime_days"
  • The separate documents are cached indefinitely (though they are refreshed whenever the API response is refreshed). Although this provides a second layer of caching, its primary purpose is to enable the documents to be stored in buckets and queues.
    • Note that when the docs are refreshed, "updateId" is used to retain the original "_id", see Document JSON format.
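The first caching level can be thought of as a time-limited lookup keyed by request. The sketch below is an in-memory stand-in only: the class name `ResponseCache` and its interface are hypothetical, and nothing here reflects where or how the platform actually stores cached responses.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60

class ResponseCache:
    """In-memory stand-in for the cached API responses ("cacheTime_days")."""

    def __init__(self, cache_time_days, clock=time.time):
        self.ttl_seconds = cache_time_days * SECONDS_PER_DAY
        self.clock = clock  # injectable so that expiry can be tested
        self.entries = {}   # request key -> (fetched_at, response)

    def get(self, key, fetch):
        """Return the cached response for `key`, calling fetch() if missing or expired."""
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]
        response = fetch()
        self.entries[key] = (now, response)
        return response
```

A call such as `ResponseCache(5).get(url, fetch_function)` would then only invoke `fetch_function` when no response younger than 5 days is held for that URL.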

Finally, note that there is a "testQueryJson" string field, which is only used by the "Test Source" UI/API function: it injects a fake query that is used to generate the API request.

Format

Code Block
//URL endpoint
{
        "federatedQuery": {
            "cacheTime_days": 5,
            "docConversionMap": {"resolutions:ip_address": "ExternalIp"},
            "entityTypes": ["ExternalDomain", "/[a-z0-9_.-][.]com/ExternalDomain"],
            "requests": [
                {
                    "endPointUrl": "",
                    "urlParams": {
                        "apikey": "XXX",
                        "domain": "$1"
                    }
                },
                {
                    "endPointUrl": "",
                    "urlParams": {
                        "apikey": "XXX",
                        "domain": "$1"
                    }
                }
            ],
            "testQueryJson": "{'qt':[{'entity':'garyhart.com/externaldomain'}]}",
            "titlePrefix": "Virus Total Domain Lookup",
            "typeToDimensionMap": {"ExternalIp": "Who"}
        }
}
//OR
{
        "federatedQuery": {
            "importScript": string,
            "scriptlang": string, // "python" or "external"
            // no requests array, otherwise the same as above
            //...
        }
}

Example

Code Block
{
    "description": "Federated Query - Virustotal Domain",
    "extractType": "Federated",
    "federatedQueryCommunityIds": [
        "53ab42a2e4b04bcfe2de4387"
    ],
    "isPublic": true,
    "mediaType": "Record",
    "processingPipeline": [
        {
            "display": "Just contains a string in which to put the logstash configuration (minus the output, which is appended by Infinit.e)",
            "federatedQuery": {
                "bypassSimpleQueryParsing": false,
                "cacheTime_days": 5,
                "docConversionMap": {
                    "Webutation domain info:Safety score": "SafetyScore",
                    "Webutation domain info:Verdict": "SafetyRating",
                    "detected_communicating_samples:date": "Date",
                    "detected_communicating_samples:positives": "CleanURLScan",
                    "detected_communicating_samples:sha256": "Hash",
                    "detected_downloaded_samples:date": "Date",
                    "detected_downloaded_samples:positives": "MaliciousURLScan",
                    "detected_downloaded_samples:sha256": "Hash",
                    "resolutions:ip_address": "ExternalIp",
                    "resolutions:last_resolved": "ResolvedDate"
                },
                "entityTypes": [
                    "externaldomain",
                    "/.*[.][a-z]+/externaldomain"
                ],
                "requests": [
                    {
                        "endPointUrl": "https://www.virustotal.com/vtapi/v2/domain/report",
                        "urlParams": {
                            "apikey": "xxxxxxxxxxxxxxxx...",
                            "domain": "$1"
                        }
                    }
                ],
                "scriptlang": "none",
                "testQueryJson": "{'qt':[{'entity':'garyhart.com/externaldomain'}]}",
                "titlePrefix": "Virus Total Domain Lookup",
                "typeToDimensionMap": {
                    "CleanURLScan": "What",
                    "Date": "What",
                    "ExternalIp": "What",
                    "Hash": "What",
                    "MaliciousURLScan": "What",
                    "ResolvedDate": "What",
                    "SafetyRating": "What",
                    "SafetyScore": "What"
                }
            }
        }
    ],
    "tags": [
        "Federated",
        "Query",
        "Virustotal",
        "Domain"
    ],
    "title": "Federated Query - Virustotal Domain"
}
