Knowledge - Query - Output options

Overview

The "output" object allows the configuration of four different logical functions:

  • The output format: JSON or XML (functionally equivalent) or RSS (which behaves differently in terms of both functionality and authentication). KML is coming soon.
  • The document format: how many documents to return from a query, and which sub-objects to include in each
  • Aggregations: these are often the most important outputs from Infinit.e - aggregations of various fields over all matching documents. Understanding how to configure them, and the format of the resulting objects, is important for getting the most out of the platform. The aggregations section includes the powerful "moments" aggregation.
  • Filtering: this allows callers to restrict retrieved documents, entities and associations based on either entity type or association verb category.

The sections below show the configuration of each of these different functions. They can also be combined into a single object, eg:

Output Format

Output options
{
	"output": {
		"format": string,
		"docs": { ... },
		"aggregation": {...},
		"filter": {...}
	}
}
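
As an illustration of combining the functions into a single object, the following Python sketch builds one "output" object and serializes it to JSON. The field names come from this page; the specific values and the helper name build_output_options are only illustrative assumptions.

```python
import json


def build_output_options():
    """Return an illustrative "output" object that combines format,
    document, aggregation and filter settings (values are examples only)."""
    return {
        "output": {
            "format": "json",
            "docs": {"enable": True, "numReturn": 10, "skip": 0},
            "aggregation": {
                "geoNumReturn": 100,
                "timesInterval": "1w",
                "entsNumReturn": 10,
            },
            "filter": {"entityTypes": ["Company", "Organization"]},
        }
    }


if __name__ == "__main__":
    # Print the JSON form that would be embedded in a query
    print(json.dumps(build_output_options(), indent=2))
```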

Field Guide

Output Format

Output format
{
	"output": {
		"format": string // "json" (default), "xml", or "rss"
	}
}

RSS

Integrating REST authentication with RSS readers is a known problem, and Community Edition currently provides a few different options:

If the query is made from a browser that is already logged in (eg. the "RSS" button from the GUI) or with a cookie obtained from login then it works as normal.

(The cookie's lifetime is only 30 minutes (server-side configurable), so this is not a viable long-term option, eg for use in RSS readers.)

Clear text or encrypted password:

It is possible to use the "user=" and "pword=" URL parameters. The password can either be clear text (not recommended), or SHA-256 hashed and then Base-64 encoded (this site can be used for testing).

  • This enables users to generate arbitrary queries and store them in RSS readers, at the expense of exposing a password (or password hash) that others could then use in any other REST function.
  • (A future version of the tool will force these queries to be over SSL, to mitigate the risk this presents somewhat.)
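
The encoded form of the password can be sketched in Python as follows. This is a minimal sketch, assuming that the raw SHA-256 digest bytes (not the hex string) of the UTF-8 password are what gets Base-64 encoded; the helper names are hypothetical, and only the "user" and "pword" parameter names come from this page.

```python
import base64
import hashlib
from urllib.parse import urlencode


def encode_password(password):
    """SHA-256 hash the password, then Base-64 encode the digest.
    Assumption: UTF-8 input and raw digest bytes (not hex) are encoded."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def auth_params(user, password):
    """Build the "user=...&pword=..." fragment, URL-escaping the
    '+', '/' and '=' characters that Base-64 output can contain."""
    return urlencode({"user": user, "pword": encode_password(password)})
```

URL-escaping matters here because an unescaped "+" in a query string is normally decoded as a space.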

Query-Specific Key:

A key for a specific query can be generated from the GUI, and then this query can be used without any authentication at all.

  • This final authentication bypass is terrible for a few reasons (eg the key is only protected by "security through obscurity"!), and it is only a temporary solution.
  • (A future version will provide a "license key" that will be usable only for RSS, to replace this.)

When requesting RSS via a key, you will need to supply the user's community ID as the first ID in the "communityIds" array of the JSON call, e.g. {"communityIds":["USER_ID","ALL_OTHER_COMMS"]}.

This is to ensure a user has access to the communities you supply since the API call is being made unauthenticated.

For more information, see API Reference

Examples:

Examples of using RSS with various authentication methods:

Note finally that RSS (unlike XML and JSON) only provides the document URLs, none of the metadata.

Document Formats

Document formats
{
	"output": {
		"docs": {
			// Whether to return documents/how many:
			"enable": boolean, // (defaults to true)
			"numReturn": integer, // (defaults to 100, maximum is 10K - not advised unless all scoring is turned off)
			"skip": integer, // (defaults to 0)
			// Alternative/complement to documents:
			"eventsTimeline": boolean, // (defaults to false)
			"numEventsTimelineReturn": integer, // (defaults to 1000)
			// Which sub-objects to return per document:
			"ents": boolean, // (all of these default to true)
			"geo": boolean,
			"events": boolean,
			"facts": boolean,
			"summaries": boolean,
			"metadata": boolean
		}
	}
}

Controlling the Number of Documents Returned

As described in the section on scoring, documents are sorted according to a scoring algorithm, and are retrieved in order. The "numReturn" field dictates how many are returned to the user, and the "skip" field allows a primitive form of paging (eg "?output.docs.numReturn=10&output.docs.skip=0", "?output.docs.numReturn=10&output.docs.skip=10", "?output.docs.numReturn=10&output.docs.skip=20", etc).
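
The paging scheme can be sketched as a small helper that turns a zero-based page number into URL parameters. This is only an illustration: the function name page_params is hypothetical, and it assumes the full field paths "output.docs.numReturn" and "output.docs.skip" from the document format above.

```python
from urllib.parse import urlencode


def page_params(page, page_size=10):
    """Return the URL parameters for a zero-based page of results."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return urlencode({
        "output.docs.numReturn": page_size,
        "output.docs.skip": page * page_size,
    })


if __name__ == "__main__":
    for page in range(3):
        print("?" + page_params(page))
```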

There are two reasons to limit the number of documents (we think 100 works quite well within general visualization GUIs):

  • For performance reasons
  • The point of the Infinit.e tool is to extract "knowledge" from a corpus of documents, therefore cluttering the display with a large number of documents could be viewed as counterproductive.

If "&output.docs.enable=false" then no documents are returned.

Controlling the Format of Documents Returned

The document format is described here. As can be seen there, documents have a number of sub-objects: entities, events (which are then sub-divided into "Events", "Facts", and "Summaries"), and source-specific metadata. (See the Source Pipeline Documentation and the documentation on Metadata, Entities, and Associations for more information.)

The "ents", "geo", "events", "facts", "summaries", and "metadata" fields are simply booleans that control whether these sub-objects are included. The main reason for not including them is just to avoid cluttering up and slowing down requests where they are not needed.

Note that "geo" controls whether entities with (lat,long)s are included - eg for geospatial apps it may be that most entities are not of interest, but geotagged ones are: in this case the pairing "&output.docs.ents=false&output.docs.geo=true" would be used.

 

Events Timeline

The most significant document parameter is "output.docs.eventsTimeline". This generates a new output array, consisting of the "event" sub-objects (for "Events", "Facts", and "Summaries").

Events are taken from the top "score.numAnalyze" matching documents - the "top" specified documents are returned, where "top" is based on the Pythagorean sum of the documents containing each event. Note that "score.numAnalyze" currently controls two other important output components:

The fields "output.docs.events", "output.docs.facts", "output.docs.summaries" control which of "Events", "Facts" and "Summaries" are included in the timeline.

Example query/output (link to JSON format specification):

Events timeline example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext="*"&input.tags="topic:technology"&output.docs.enable=false&output.docs.eventsTimeline=true'
{
    response: { ... }
    stats: { ... }
    eventsTimeline: [
        {
            entity1: "ev solar carport"
            entity1_index: "ev solar carport/facility"
            verb: "deliver"
            verb_category: "generic relations"
            entity2: "125 mw hours"
            event_type: "Summary"
            time_start: "2011-05-26"
            assoc_sig: 126.6919059017276
            doccount: 1
        },
        {
            entity1: "aol advertising.com group"
            entity1_index: "aol advertising.com group/company"
            verb: "include"
            verb_category: "generic relations"
            entity2: "advertising.com"
            entity2_index: "advertising.com, inc./company"
            event_type: "Fact"
            time_start: "2011-06-01"
            assoc_sig: 126.6919059017276
            doccount: 4
            time_end: "2011-06-27"
        },
        //etc
    ]
}

"Events" and "Summaries" with the same time range (ie "time_start", "time_end" pair) are aggregated, with "doccount" used to store the sum. For "Facts", the "time_start" and "time_end" are set to the newest and oldest dates in which the "Fact" occurs (ie this may give some useful time range over which it is being discussed), with "doccount" counting all instances of the "Fact" within that time range.

Aggregation Formats

Format

Aggregation formats
{
	"output": {
		"aggregation": {
			// Geo-spatial/temporal aggregations (all of these default to 0):
			"geoNumReturn": integer,
			"timesInterval": string,
			// Entities, events, facts:
			"entsNumReturn": integer,
			"eventsNumReturn": integer,
			"factsNumReturn": integer,
			// "Moments", temporal aggregation for entities:
			"moments": { ... },
			// Source information:
			"sources": integer,
			"sourceMetadata": integer, // (includes both tags and types)
			// Raw ElasticSearch "facets":
			"raw": string // (see below)
		}
	}
}

The configuration of aggregation outputs is relatively simple, but this section also covers the different output formats.

Note that a (near-) future release will provide a more generic and powerful aggregation interface, allowing various document, entity, event, and metadata properties to be aggregated over time, space, and frequency.

 

Geo-spatial Aggregation

Geo-spatial aggregation configuration
{
	"output": {
		"aggregation": {
			"geoNumReturn": integer
		}
	}
}

All of the "[type]NumReturn" fields simply configure the number of entries returned (in the case of "geo", (lat,long) pairs), in order of frequency in the query-matching dataset.

The output format and an example are shown below:

Geo-spatial aggregation output format
{
	//...
	"geo": [
		{
			"type": string, // the ontology type - see below
			"lat": number,
			"lon": number,
			"count": integer // the number of occurrences
		}
	],
	"maxGeoCount": number
	//...
}
Geo-spatial aggregation example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext=%22*%22&input.tags=%22topic:technology%22&output.aggregation.geoNumReturn=100&output.docs.enable=false'
{
    response: {
        action: "Query"
        success: true
        message: "((*))"
        time: 296
    },
    stats: {
        found: 53314
        start: 0
        maxScore: 0
        avgScore: 0
    },
    geo: [
        {
            type: "city"
            lat: 37.77499996125698
            lon: -122.4183003231883
            count: 1494
        },
        {
            type: "point"
            lat: 47.60639989748597
            lon: -122.3308002948761
            count: 442
        },
        {
            type: "geographicalregion"
            lat: 37.441899944096804
            lon: -122.14190002530813
            count: 206
        },
        //(etc)
    ],
    maxGeoCount: 9910
}

The "maxGeoCount" field in the top-level response is simply the highest count that occurs in the list (which is ordered by the underlying geohash used to store the lat/long). This can be used to calculate scaling factors without first having to traverse the return array.
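
As a sketch of how "maxGeoCount" might be used for scaling, the following Python helper normalizes each geo bucket's count to the range (0, 1] in a single pass. It assumes the response has already been parsed into a dict; the function name heatmap_weights is hypothetical.

```python
def heatmap_weights(response):
    """Return (lat, lon, weight) tuples, where weight = count / maxGeoCount.
    No preliminary pass over the "geo" array is needed to find the maximum."""
    max_count = response.get("maxGeoCount", 0)
    if max_count <= 0:
        return []
    return [
        (point["lat"], point["lon"], point["count"] / max_count)
        for point in response.get("geo", [])
    ]
```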

The "type" field (ontological type) is discussed under the Geo JSON format.

A typical use of the "geo" aggregation is to show heatmaps: in this case a "geoNumReturn" value of at least 1000 is recommended for large datasets.

Temporal Aggregation

Temporal aggregation configuration
{
	"output": {
		"aggregation": {
			"timesInterval": string
		}
	}
}

Temporal aggregation has a different configuration parameter to the others. Instead of specifying a number of entries to return, a string specifies the interval over which a document count is to be summed. This is in the standard format: "N[hdwmy]" ie an integer followed by h (hour), d (day), w (week), m (month), y (year).

Note that if "m" for month is the interval unit, then the aggregation is always performed over 1 month intervals, regardless of the "N".
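
The interval format can be validated client-side before sending a query. The sketch below is an illustration only (the function names are hypothetical): it parses "N[hdwmy]", applies the month rule described above, and converts the fixed-length units to milliseconds. Months and years are left unconverted because their length varies.

```python
import re

# Fixed-length units only; "m" and "y" have no constant duration
_UNIT_MS = {"h": 3_600_000, "d": 86_400_000, "w": 604_800_000}


def parse_interval(spec):
    """Parse an "N[hdwmy]" string into (count, unit)."""
    match = re.fullmatch(r"(\d+)([hdwmy])", spec)
    if match is None:
        raise ValueError("expected an integer followed by h, d, w, m or y: %r" % spec)
    count, unit = int(match.group(1)), match.group(2)
    if unit == "m":
        count = 1  # months are always aggregated over 1 month intervals
    return count, unit


def fixed_interval_ms(spec):
    """Return the interval length in ms, or None for months and years."""
    count, unit = parse_interval(spec)
    if unit not in _UNIT_MS:
        return None
    return count * _UNIT_MS[unit]
```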

 

The output format is very simple:

Temporal aggregation output format
{
	//...
	"times": [
		{
			"time": long,
			"count": integer // the number of occurrences
		}
	],
	"timeInterval": long,
	//...
}

The "time" field in the "times" array is the start time of the interval in "ms" Unix time (milliseconds after 1970). The top-level "timeInterval" is the duration of that interval in ms, ie each interval can be expressed as ["times.time", "times.time"+"timeInterval"].

Even though the document counts are sorted by time rather than by "count", unlike geo-spatial aggregation no maximum count is provided. This is part oversight, part because it is not so (performance) critical to know the scaling factors in advance, but it will probably be corrected in a future release.

Temporal aggregation example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext=%22*%22&input.tags=%22topic:technology%22&output.aggregation.timesInterval="1w"&output.docs.enable=false'
{
    response: { ... },
    stats: { ... },
    times: [
        {
            time: 977702400000
            count: 2
        },
        {
            time: 993427200000
            count: 1
        },
        {
            time: 1041206400000
            count: 3
        },
        //etc
    ],
    timeInterval: 604800000
}
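
Since no maximum count is returned for temporal aggregation, a client can derive it while converting the buckets. The following sketch (hypothetical helper names, response assumed to be already parsed into a dict) turns each entry into a [start, end) interval in UTC and computes the largest count.

```python
from datetime import datetime, timedelta, timezone


def time_buckets(response):
    """Convert "times" entries into dicts with UTC start/end datetimes."""
    width = timedelta(milliseconds=response["timeInterval"])
    buckets = []
    for entry in response.get("times", []):
        start = datetime.fromtimestamp(entry["time"] / 1000, tz=timezone.utc)
        buckets.append({"start": start, "end": start + width, "count": entry["count"]})
    return buckets


def max_count(buckets):
    """Largest document count across all buckets (0 if there are none)."""
    return max((bucket["count"] for bucket in buckets), default=0)
```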

See also the more granular per-entity temporal aggregation available using "moments".

Entity Aggregation

Entity aggregation preserves the format of the "entities" sub-objects of the document, but across all documents in the query-matching dataset.

  • (In fact due to implementation limitations, currently only the top "score.numAnalyze" eg 1000 documents are used to generate the entity aggregations)
Entity aggregation configuration
{
	"output": {
		"aggregation": {
			"entsNumReturn": integer
		}
	}
}

It is worth noting that entities are returned in descending significance order (the other sorted aggregation types such as "geo" and "events" are ranked by frequency and are actually generated from the entire matching dataset, rather than a subset). A future release will try to standardize use of frequency vs significance, and also remove the use of subsets where possible.

The entity output format is identical to the entity sub-object described here, except that the per-document fields ("significance" and "frequency") are replaced with the maximum per-document values in the matching sub-set, and some other fields ("actual_name", "relevance", "sentiment") are not present.

Entity aggregation example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext=%22*%22&input.tags=%22topic:technology%22&output.aggregation.entsNumReturn=10&output.docs.enable=false'
{
    response: { ... },
    stats: { ... },
    entities: [
        {
            dimension: "Who"
            disambiguated_name: "LulzSec"
            doccount: 35
            frequency: 26
            index: "lulzsec/organization"
            totalfrequency: 348
            type: "Organization"
            significance: 6.874384562114902
            datasetSignificance: 6.001782525850459
            queryCoverage: 0.03835649052289408
            averageFreq: 0.054
        },
        {
            dimension: "Who"
            disambiguated_name: "Oracle Corporation"
            doccount: 827
            frequency: 20
            index: "oracle corporation/company"
            linkdata: [
                http://d.opencalais.com/er/company/ralg-tr1r/eab9bfaa-47f1-368a-a9b7-a87bb345cf30
            ]
            totalfrequency: 2541
            type: "Company"
            significance: 9.882910950748887
            datasetSignificance: 5.5199530144758615
            queryCoverage: 0.8277825954323541
            averageFreq: 0.076
        },
        //etc
    ]
}

Good "entsNumReturn" values vary with application. For a document set that will not be filtered, 100 is a good value. For "recommendation" displays (eg "Other entities you may be interested in"), as few as 5-10 works fine. For larger datasets where the user will filter down from the initial return set then 1000+ is recommended.

See also the more granular per-entity temporal aggregation available using "moments".

Event and Fact Aggregation

As described under their format specification, events are split into 3 categories:

  • "Events": link multiple entities (via "entity1_index", "entity2_index", "geo_index") and represent a transient activity (eg travel)
  • "Facts": link multiple entities like "Events" but represent (transient or permanent) relationships (eg being president)
  • "Summaries": generally link 1 entity to a free text (eg a quotation: "Obama says...").

Summaries cannot currently be aggregated (except manually or via the "output.docs.eventsTimeline" function), because of (surmountable but non-trivial) implementation issues, combined with its perceived low priority. It is unclear whether it will get added in the future.

The configuration format is straightforward. As described under entities, events and facts are ranked by frequency not significance (but this is likely to be an option in the future).

Event/fact aggregation configuration
{
	"output": {
		"aggregation": {
			"eventsNumReturn": integer,
			"factsNumReturn": integer
		}
	}
}

Similar to entities, the event/fact output format is essentially the same as the "Event" document sub-object format (see Documents and their sub-objects (entities, associations, user metadata, aggregations)), although fewer fields are populated:

Event/fact aggregation output format
{
	//...
	"events": [
		{
			"event_type": "Event",
			"entity1_index": string,
			"verb_category": string,
			"entity2_index": string,
			"geo_index": string,
			"assoc_sig": number, // A significance score for the association object (see below)
			"entity1_sig": number, // A significance score for entity1, if present
			"entity2_sig": number, // A significance score for entity2, if present
			"geo_sig": number, // A significance score for the geo, if present
			"doccount": integer // the number of occurrences
		}
	],
	"facts": [
		{
			"event_type": "Fact",
			"entity1_index": string,
			"verb_category": string,
			"entity2_index": string,
			"geo_index": string,
			"assoc_sig": number, // A significance score for the association object (see below)
			"entity1_sig": number, // A significance score for entity1, if present
			"entity2_sig": number, // A significance score for entity2, if present
			"geo_sig": number, // A significance score for the geo, if present
			"doccount": integer // the number of occurrences
		}
	],
	//...
}
Event/fact aggregation example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext=%22*%22&input.tags=%22topic:technology%22&output.aggregation.eventsNumReturn=2&output.aggregation.factsNumReturn=2&output.docs.enable=false'
{
    response: { ... },
    stats: { ... },
    events: [
        {
            event_type: "Event"
            entity1_index: "microsoft corporation/company"
            verb_category: "acquisition"
            entity2_index: "skype technologies s.a./company"
            assoc_sig: 13.454
            entity1_sig: 12.5
            entity2_sig: 14.5
            doccount: 69
        },
        {
            event_type: "Event"
            entity1_index: "google inc./company"
            verb_category: "product release"
            entity2_index: "google+/product"
            assoc_sig: 35.123324
            entity1_sig: 10.5123
            entity2_sig: 94.235435
            doccount: 14
        }
    ],
    facts: [
        {
            event_type: "Fact"
            entity1_index: "google inc./company"
            verb_category: "company product"
            entity2_index: "android/product"
            assoc_sig: 23.34546
            entity1_sig: 10.5123
            entity2_sig: 54.235576
            doccount: 244
        },
        {
            event_type: "Fact"
            entity1_index: "tom XXX/person"
            verb_category: "person email address"
            entity2_index: "tXXX5@bloomberg.net/emailaddress"
            assoc_sig: 43.234324
            entity1_sig: 43.234324
            entity2_sig: 0
            doccount: 72
        }
    ]
}

Note that "time_start" and "time_end" are aggregated out of the object, ie all time information is lost. To aggregate events over time, use "output.docs.eventsTimeline". It is likely at some point that the two formats will be combined somehow. At present, "output.aggregation.eventsNumReturn" and "output.aggregation.factsNumReturn" are best used with "link analysis" style applications, and "output.docs.eventsTimeline" is best used for timeline style applications.
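
For a "link analysis" style application, the aggregated events and facts can be treated as weighted edges between entity indexes. The sketch below is one possible client-side approach, not part of the API: the function name link_edges is hypothetical, and weighting edges by "doccount" is an assumption (one of the "_sig" fields could be used instead).

```python
from collections import defaultdict


def link_edges(response):
    """Return ((entity1_index, verb_category, entity2_index), weight) pairs,
    sorted by descending weight, from the "events" and "facts" arrays."""
    weights = defaultdict(int)
    for assoc in response.get("events", []) + response.get("facts", []):
        source = assoc.get("entity1_index")
        target = assoc.get("entity2_index")
        if source and target:  # skip associations with a missing endpoint
            key = (source, assoc.get("verb_category"), target)
            weights[key] += assoc.get("doccount", 0)
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)
```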

 

Moments: Per Entity Temporal Aggregation

The "moments" function allows the entity aggregation to be combined with the "times" aggregation, generating a list of time periods in which named entities were mentioned, together with counts of the mentions for each time period.

Coming versions of the platform will enhance this capability further, eg providing aggregated sentiment for the named entities.

Moments: per entity temporal aggregation
 {
//...
	"output": {
	//...
		"aggregation": {
		//...
			"moments": {
				"timesInterval": string, // the time period over which the values are aggregated - same format as "output.aggregation.timesInterval"
				"geoNumReturn": integer, // (ALPHA) For each time interval, a list of geo buckets in the same format as "response.geo" (including maxGeoCount)
				"entityList": [ string ] // (BETA) A list of entity indexes, eg "barack obama/person"
			}
		//...
		}
	//...
	}
//...
}

Note that if no "output.aggregation.moments.timesInterval" is set, then the time will be taken from "output.aggregation.timesInterval" if available, and defaulted to 1 month otherwise.

Note that the entityList respects aliases.

Example moments request/reply
REQUEST (fragment)
{
	"moments": {
		"timesInterval": "1m",
		"geoNumReturn": 2,
		"entityList": [ "barack obama/person", "mitt romney/person" ]
	}
}
REPLY (fragment)
{
	"moments": {
		"times": [
			{
				"time": 1346457600000,
				"count": 1,
				"maxGeoCount": 2936,
				"geo": [
					{
						"lat": 40.71416669525206,
						"lon": -74.00638880208135,
						"count": 2936,
						"type": "city"
					}
				]
			}
		],
		"barack obama/person": [
			{
				"time": 1346457600000,
				"count": 1
			},
			{
				"time": 1351728000000,
				"count": 2
			},
			{
				"time": 1354320000000,
				"count": 1
			},
			{
				"time": 1356998400000,
				"count": 9
			}
		],
		"mitt romney/person": [
			{
				"time": 1346457600000,
				"count": 1
			},
			{
				"time": 1351728000000,
				"count": 1
			}
		]
	}
}
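
A reply in this shape can be turned into per-entity time series on the client. The sketch below is illustrative only: the function name moments_to_series is hypothetical, and it assumes that every key of "moments" other than "times" is an entity index, as in the reply fragment above.

```python
from datetime import datetime, timezone


def moments_to_series(moments):
    """Map each entity index to a list of (ISO date, count) pairs.
    Assumption: every key except "times" is an entity index."""
    series = {}
    for key, buckets in moments.items():
        if key == "times":
            continue
        series[key] = [
            (
                datetime.fromtimestamp(b["time"] / 1000, tz=timezone.utc).date().isoformat(),
                b["count"],
            )
            for b in buckets
        ]
    return series
```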

Sources and Source Metadata Aggregation

It can often be useful to understand what sources/source categories documents are being returned from. Community Edition allows the following aggregations:

  • Individual sources (using the "sourceKey" field of the document object, ie the "key" field of the "source" object)
  • Source types (using the "mediaType" field of the document object, ie the "mediaType" field of the "source" object)
  • Source tags (using the "tags" field of the document object, ie the "tags" field of the "source" object)
    • There is currently no concept of "per document" tags (eg auto-generated from the document content), though there may be in the future.
    • Note that the Infinit.e "system collection" contains two "top level" tags, "topic:<tag>" and "industry:<tag>" (its general, or "content", tags tend to be quite low level and difficult to use).

The first of these is configured by "output.aggregation.sources", the other two by "output.aggregation.sourceMetadata":

Source/source metadata aggregation configuration
{
	"output": {
		"aggregation": {
			"sources": integer,
			"sourceMetadata": integer
		}
	}
}

The output format is very simple: arrays "sources", "sourceMetaTags", and "sourceMetaTypes", with fields "term" (string) and "count" (integer).

Examples:

Source/source metadata aggregation example
//curl -XGET 'http://infinite.ikanow.com/api/knowledge/query/4c927585d591d31d7b37097a?qt[0].etext=%22*%22&input.tags=%22topic:technology%22&output.aggregation.sources=5&output.aggregation.sourceMetadata=5&output.docs.enable=false'
{
    "response": {
        "action": "Query",
        "success": true,
        "message": "((*))",
        "time": 140
    },
    "stats": {
        "found": 53560,
        "start": 0,
        "maxScore": 0,
        "avgScore": 0
    },
    "sources": [
        {
            "term": "feed:..origin.feeds.pheedo.com.bw.technology_news-rss",
            "count": 11182
        },
        {
            "term": "http.gizmodo.com.index.xml",
            "count": 3102
        },
        {
            "term": "http.www.reddit.com.r.technology..rss",
            "count": 2391
        },
        {
            "term": "feed:..origin.feeds.pheedo.com.bw.energy_news-rss",
            "count": 2390
        },
        {
            "term": "http.www.engadget.com.rss.xml",
            "count": 1687
        }
    ],
    "sourceMetaTags": [
        {
            "term": "topic:technology",
            "count": 53560
        },
        {
            "term": "news",
            "count": 44708
        },
        {
            "term": "industry:technology",
            "count": 38355
        },
        {
            "term": "technology",
            "count": 30871
        },
        {
            "term": "industry:all",
            "count": 15205
        }
    ],
    "sourceMetaTypes": [
        {
            "term": "News",
            "count": 53281
        },
        {
            "term": "Video",
            "count": 279
        }
    ]
}

Raw Access to ElasticSearch "Facets"

As with queries, there is the option simply to pass an arbitrary "facet" (ie aggregation) through to ElasticSearch, using its raw API. As with queries, this functionality should be considered an absolute last resort.

Unlike for queries, where the raw ElasticSearch query is specified as a JSON object, raw facets are specified as a string conversion of the JSON object (this may change in a future release), for example:

Raw aggregation configuration example
{
	"output": {
		"aggregation": {
			"raw": "{\"sources\":{\"terms\":{\"field\":\"sourceKey\",\"size\":5}}}"
		}
	}
}
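
Escaping the facet object by hand is error-prone, so the string form is easier to generate programmatically. A minimal Python sketch (the function name build_raw_aggregation is hypothetical) that serializes a facet object and embeds it as the "raw" string:

```python
import json


def build_raw_aggregation(facets):
    """Wrap a facet object as the string-valued "raw" aggregation field."""
    raw = json.dumps(facets, separators=(",", ":"))  # compact, no whitespace
    return {"output": {"aggregation": {"raw": raw}}}


if __name__ == "__main__":
    facet = {"sources": {"terms": {"field": "sourceKey", "size": 5}}}
    # Serializing the outer object escapes the inner quotes automatically
    print(json.dumps(build_raw_aggregation(facet), indent=2))
```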

The above example is functionally equivalent to specifying "&output.aggregation.sources=5", except that the output would be in the array "facets.sources" instead of the top-level "sources" array, eg:

Raw aggregation output example
{
    "response": {
        "action": "Query",
        "success": true,
        "message": "((*))",
        "time": 140
    },
    "stats": {
        "found": 53560,
        "start": 0,
        "maxScore": 0,
        "avgScore": 0
    },
    "facets": {
        "sources": [
            {
                "term": "feed:..origin.feeds.pheedo.com.bw.technology_news-rss",
                "count": 11182
            },
            {
                "term": "http.gizmodo.com.index.xml",
                "count": 3102
            },
            {
                "term": "http.www.reddit.com.r.technology..rss",
                "count": 2391
            },
            {
                "term": "feed:..origin.feeds.pheedo.com.bw.energy_news-rss",
                "count": 2390
            },
            {
                "term": "http.www.engadget.com.rss.xml",
                "count": 1687
            }
        ],
        // +Any other "raw" facets specified
    }
}

Note finally that, as with queries, specifying any facets overrides any Infinit.e aggregations (except currently for entities, though this may change).

 

Filtering

Filter format
{
   "output": {
	"filter": {
		"entityTypes": [ string ], // A list of (case sensitive) entity types - if specified non-matching entities and associations and documents will be discarded
		"assocVerbs": [ string ] // A list of (case sensitive) verb categories - if specified non-matching associations and documents will be discarded
	}
   }
}

Either of the above filters can be made "negative" by inserting a "-" in front of the first entry in the array. Negative filtering simply removes all entities or associations that match the filter from the document (and also their score). Note that queries can still match on negatively filtered entities and associations.
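
The "-" convention can be wrapped in a small builder so that callers do not have to edit the first array entry by hand. This is a sketch only; the function and parameter names are hypothetical, and only the "entityTypes" and "assocVerbs" fields come from the filter format above.

```python
def build_filter(entity_types=None, assoc_verbs=None,
                 negate_entities=False, negate_verbs=False):
    """Build an "output.filter" object, prefixing "-" to the first entry
    of a list when that filter should be negative."""
    def terms(values, negate):
        values = list(values)
        if negate and values:
            values[0] = "-" + values[0]
        return values

    filter_obj = {}
    if entity_types:
        filter_obj["entityTypes"] = terms(entity_types, negate_entities)
    if assoc_verbs:
        filter_obj["assocVerbs"] = terms(assoc_verbs, negate_verbs)
    return {"output": {"filter": filter_obj}}
```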

Examples:

Filter format - examples
//
// Twitter example: only pull back hashtags and twitter handles from tweets (and discard documents and associations not containing either)
//
{
   "output": {
	"filter": {
		"entityTypes": [ "HashTag", "TwitterHandle" ]
	}
   }
}
//
// Negative twitter example: remove all keywords and locations extracted for the tweet
//
{
   "output": {
	"filter": {
		"entityTypes": [ "-Keyword", "Location" ]
	}
   }
}



//
// Twitter example: pull back all entities, but only tweets that are retweets (and only retweet associations)
//
{
   "output": {
	"filter": {
		"assocVerbs": [ "retweets" ]
	}
   }
}
//
// Twitter example: this will discard all associations (because "retweets" are always associations between TwitterHandle types)
//
{
   "output": {
	"filter": {
		"entityTypes": [ "Keyword" ],
		"assocVerbs": [ "retweets" ]
	}
   }
}
// 
// Business acquisition example
//
{
   "output": {
	"filter": {
		"entityTypes": [ "Company", "Organization" ],
		"assocVerbs": [ "acquires" ]
	}
   }
}

There is one implementation issue that is noteworthy: the association verb category is stored in such a way that subsets of the phrase will match. For example, "generic" or "relations" will match on "generic relations", and "generic relations" will match on "generic relations (special case)".
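
To predict which verb categories a given "assocVerbs" entry might unintentionally match, the behavior can be approximated on the client as case-sensitive substring containment. This is an approximation for illustration only: the actual storage and matching mechanism is not specified here, and the function names are hypothetical.

```python
def verb_category_matches(filter_term, stored_category):
    """Approximate the partial-match behavior as substring containment
    (case sensitive). The server-side mechanism may differ."""
    return filter_term in stored_category


def matching_categories(filter_term, known_categories):
    """List every known verb category that the filter term would match."""
    return [c for c in known_categories if verb_category_matches(filter_term, c)]
```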