Overview
(IN PROGRESS)
The federated query source differs from the other sources: it temporarily imports documents into the platform via an external API whenever a recognized query is performed via the API or UI.
There are 3 ways of bringing in the data:
- URL requests
- Specifying a python script (actually a Jython script) - the "importScript" should be the python code to execute, with the last evaluated expression being the value to pass back
- Specifying an external script - the importScript should be the command line, eg "dir/scriptname.sh args" (no "sh" in front of the script - a "#!" construct must be used inside the script)
- (the same access rules apply as for the "External Script" extractor described here, ie "dir" is offset from "/opt/infinite-home/lib/extractor-scripts")
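As a rough illustration of the external script option, the sketch below shows what a script referenced by "importScript" might look like if it were written in Python. It assumes that the query term is passed as the first command line argument (eg importScript set to "dir/lookup.py $1") and that the platform takes the script's standard output as its reply. Both points are assumptions, and the file name "lookup.py" and the reply fields are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical external script, eg importScript = "dir/lookup.py $1".
# The "#!" line above lets the script be run without an interpreter prefix.
import json
import sys


def build_reply(term):
    """Return a JSON-serializable reply for the query term (placeholder data only)."""
    return {"domain": term, "resolutions": []}


def main(args):
    # Assumption: the substituted query term arrives as the first argument.
    term = args[0] if args else ""
    # Assumption: whatever is written to standard output becomes the script output.
    sys.stdout.write(json.dumps(build_reply(term)) + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A script like this would need to be executable and live under the extractor scripts directory mentioned above.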
The rules for whether to generate a federated query source are as follows:
- If there is a single entity query term (apart from "*", which is ignored), and its type is one of the elements of "entityTypes" (apart from the special case described below), then that query term is copied into $1 and the federated query is issued
- If entityTypes contains elements in the form "/regex/TYPE" (ie starting with /), and the query term is a single text term (again apart from "*") that matches the regex, then the matching string is used as the term, together with the corresponding TYPE
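The two rules above can be sketched roughly as follows. This is not the platform's implementation: the "text" key for text terms, the case-insensitive type comparison, and the use of a search (rather than a full match) for the regex are all assumptions made for illustration. Only the "entity" key in "value/type" form is taken from the testQueryJson example on this page.

```python
import re


def split_entity(entity):
    """Split "value/type" at the last "/" (the form used in testQueryJson)."""
    value, _, etype = entity.rpartition("/")
    return value, etype


def select_term(query_terms, entity_types):
    """Return (term, type) to use as $1, or None if no federated query applies."""
    # "*" terms are ignored by the rules.
    terms = [t for t in query_terms
             if t.get("entity") != "*" and t.get("text") != "*"]
    if len(terms) != 1:
        return None
    term = terms[0]

    plain_types = [t for t in entity_types if not t.startswith("/")]
    regex_types = [t for t in entity_types if t.startswith("/")]

    if "entity" in term:
        value, etype = split_entity(term["entity"])
        # Assumption: type names are compared case-insensitively.
        if etype.lower() in (t.lower() for t in plain_types):
            return value, etype
        return None

    if "text" in term:  # hypothetical key for a plain text term
        for spec in regex_types:
            # "/regex/TYPE": the regex sits between the first and last "/".
            pattern, _, etype = spec[1:].rpartition("/")
            match = re.search(pattern, term["text"])
            if match:
                return match.group(0), etype
    return None
```

For example, with entityTypes of ["ExternalDomain", "/.*[.][a-z]+/externaldomain"], both an entity term "garyhart.com/externaldomain" and a text term "garyhart.com" would select "garyhart.com" under these assumptions.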
The query term is copied into $1, which can be placed in any of the requests.*/importScript fields.
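A minimal sketch of the $1 substitution for a URL request is shown below. It assumes that "urlParams" are sent as a URL-encoded query string appended to "endPointUrl"; the function name is hypothetical and the platform may build its requests differently.

```python
from urllib.parse import urlencode


def build_request_url(request, term):
    """Substitute the query term for "$1" and append urlParams as a query string."""
    endpoint = request["endPointUrl"].replace("$1", term)
    params = {key: value.replace("$1", term)
              for key, value in request.get("urlParams", {}).items()}
    if not params:
        return endpoint
    # Assumption: parameters are passed as a standard URL-encoded query string.
    return endpoint + "?" + urlencode(params)
```

With the request from the example at the end of this page and the term "garyhart.com", this would produce a URL ending in "?apikey=...&domain=garyhart.com".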
Once data has been obtained from an external source, it can be processed in one of 2 ways:
- The "docConversionMap" can be used to generate entities:
- The docConversionMap keys point to nested JSON fieldnames (":" used instead of "."),
- if a key starts with ":" then the JsonPath syntax is used.
- The corresponding values are the entity types
- "typeToDimensionMap" maps the types to dimensions (Who/What/Where/When)
- A normal source object can be used with the federated query as the first element - a single document is passed into the pipeline, with the following full text:
- The output of the script
- A new-line separated list of the outputs of the URL requests (which are also individually copied into a metadata array called "__FEDERATED_REPLIES__", if there is more than one)
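To make the "docConversionMap" option more concrete, the sketch below walks a parsed JSON reply using the ":"-separated keys and builds one entity per value found. It is only an illustration: the entity field names ("value", "type", "dimension") are hypothetical, arrays are assumed to be flattened, and keys starting with ":" (JsonPath) are skipped rather than evaluated.

```python
def extract_values(node, path):
    """Collect all values reached by following path (a list of field names)."""
    if isinstance(node, list):
        # Assumption: arrays are flattened, the path is applied to each element.
        values = []
        for item in node:
            values.extend(extract_values(item, path))
        return values
    if not path:
        return [] if node is None else [node]
    if isinstance(node, dict) and path[0] in node:
        return extract_values(node[path[0]], path[1:])
    return []


def build_entities(reply, doc_conversion_map, type_to_dimension_map):
    """Turn a parsed JSON reply into a list of entity dicts."""
    entities = []
    for key, entity_type in doc_conversion_map.items():
        if key.startswith(":"):
            continue  # JsonPath keys are not handled in this sketch
        for value in extract_values(reply, key.split(":")):
            entities.append({
                "value": str(value),
                "type": entity_type,
                "dimension": type_to_dimension_map.get(entity_type),
            })
    return entities
```

For a reply containing a "resolutions" array of objects with an "ip_address" field, the key "resolutions:ip_address" mapped to "ExternalIp" would yield one "ExternalIp" entity per array element.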
There are 2 levels of caching:
- The API response is cached for the period specified by "cacheTime_days"
- The separate documents are cached indefinitely (though they are refreshed whenever the API response is refreshed) - although this provides a second layer of caching, its primary purpose is to enable the documents to be stored in buckets and queues.
- Note that when the docs are refreshed, "updateId" is used to retain the original "_id", see Document JSON format.
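The two caching levels could be sketched as below. The function names are hypothetical, and the handling of "updateId" reflects one reading of the note above (the refreshed document records the original "_id" in "updateId"), not confirmed platform behaviour.

```python
from datetime import datetime, timedelta


def is_reply_fresh(cached_at, cache_time_days, now):
    """True while the cached API response is still within cacheTime_days."""
    return now - cached_at < timedelta(days=cache_time_days)


def refresh_document(old_doc, new_doc):
    """Replace a cached document, keeping the original "_id" in "updateId"."""
    refreshed = dict(new_doc)
    # Assumption: the first "_id" ever assigned is carried across refreshes.
    refreshed["updateId"] = old_doc.get("updateId", old_doc["_id"])
    return refreshed
```

Under this sketch, a reply cached with "cacheTime_days" of 5 is reused for queries made less than 5 days later, after which the API is called again and the documents are refreshed.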
Finally, note that there is a "testQueryJson" string field, which is just used from the "Test Source" UI/API function - it injects a fake query that is used to generate the API request.
Format
Code Block |
---|
//URL endpoint { "federatedQuery": { "cacheTime_days": 5, "docConversionMap": {"resolutions:ip_address": "ExternalIp"}, "entityTypes": ["ExternalDomain", "/[a-z0-9_.-][.]com/ExternalDomain"], "requests": [ { "endPointUrl": "", "urlParams": { "apikey": "XXX", "domain": "$1" } }, { "endPointUrl": "", "urlParams": { "apikey": "XXX", "domain": "$1" } } ], "testQueryJson": "{'qt':[{'entity':'garyhart.com/externaldomain'}]}", "titlePrefix": "Virus Total Domain Lookup", "typeToDimensionMap": {"ExternalIp": "Who"} } } //OR { "federatedQuery": { "importScript": string, "scriptlang": string ("python" or "external") // no requests array, otherwise the same as above //... } } |
Example
Code Block |
---|
{ "description": "Federated Query - Virustotal Domain", "extractType": "Federated", "federatedQueryCommunityIds": [ "53ab42a2e4b04bcfe2de4387" ], "isPublic": true, "mediaType": "Record", "processingPipeline": [ { "display": "Just contains a string in which to put the logstash configuration (minus the output, which is appended by Infinit.e)", "federatedQuery": { "bypassSimpleQueryParsing": false, "cacheTime_days": 5, "docConversionMap": { "Webutation domain info:Safety score": "SafetyScore", "Webutation domain info:Verdict": "SafetyRating", "detected_communicating_samples:date": "Date", "detected_communicating_samples:positives": "CleanURLScan", "detected_communicating_samples:sha256": "Hash", "detected_downloaded_samples:date": "Date", "detected_downloaded_samples:positives": "MaliciousURLScan", "detected_downloaded_samples:sha256": "Hash", "resolutions:ip_address": "ExternalIp", "resolutions:last_resolved": "ResolvedDate" }, "entityTypes": [ "externaldomain", "/.*[.][a-z]+/externaldomain" ], "requests": [ { "endPointUrl": "https://www.virustotal.com/vtapi/v2/domain/report", "urlParams": { "apikey": "xxxxxxxxxxxxxxxx...", "domain": "$1" } } ], "scriptlang": "none", "testQueryJson": "{'qt':[{'entity':'garyhart.com/externaldomain'}]}", "titlePrefix": "Virus Total Domain Lookup", "typeToDimensionMap": { "CleanAVURLScan": "What", "Date": "What", "ExternalIp": "What", "Hash": "What", "MaliciousAVURLScan": "What", "ResolvedDate": "What", "SafetyRating": "What", "SafetyScore": "What" } } } ], "tags": [ "Federated", "Query", "Virustotal", "Domain" ], "title": "Federated Query - Virustotal Domain" } |