Specifying Associations
What is An Association?
An association (also referred to as an event throughout this page) can be "something that happens or is regarded as happening; an occurrence, especially one of some importance", "the outcome, issue, or result of anything", or "something that occurs in a certain place during a particular interval of time" (definitions from http://dictionary.reference.com/browse/event). Within Infinit.e, events are typically a combination of entities assembled in the form Noun - Verb - Noun, e.g. "a car crashed into a building" or "the plane flew to San Diego". In addition to the Noun - Verb - Noun form, events can include geographic information (i.e. where an event happened) as well as a start and/or end time. The following section describes how to specify the extraction of events from a data source using the Structured Analysis Harvester.
Note that at least one of the entity1/entity2/geo_index fields must point to an entity in the document (either extracted using Natural Language Processing or using the "entities" block of the Structured Analysis object). The different ways in which this can be achieved are described below.
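The rule above can be illustrated with a minimal sketch. The helper name pointsToDocumentEntity and the sample data are hypothetical, and the check is applied to the generated "entity1_index"/"entity2_index"/"geo_index" fields as an assumption; it is not the harvester's own validation code.

```javascript
// Hypothetical helper (not part of Infinit.e): returns true when at least one
// of the association's index fields matches the "index" of a document entity.
function pointsToDocumentEntity(association, entities) {
  const known = new Set(
    entities.map((e) => e.index).filter((index) => typeof index === "string")
  );
  return ["entity1_index", "entity2_index", "geo_index"].some((field) =>
    known.has(association[field])
  );
}

// Illustrative data only.
const sampleEntities = [{ index: "robbery gun/criminalactivity" }];
const linked = { entity1_index: "robbery gun/criminalactivity", verb: "reported" };
const unlinked = { entity1: "free text only", verb: "reported" };

console.log(pointsToDocumentEntity(linked, sampleEntities));   // true
console.log(pointsToDocumentEntity(unlinked, sampleEntities)); // false
```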
Basic Association Specification
The following code demonstrates how to specify a basic association (note: the sample specification and sample output below are taken from a sample MySQL database source):
{
    //...
    "associations" : [
        {
            "entity1" : "$metadata.offense,$metadata.method",
            "verb" : "reported",
            "verb_category" : "crime",
            "time_start" : "$metadata.reportdatetime",
            "geo_index" : "Location",
            "geotag" : { "lat" : "$metadata.latitude", "lon" : "$metadata.longitude" }
        }
    ],
    //...
}
In the basic example above the following fields have been specified:
- entity1
A free form text field containing information about the event "subject", i.e. an entity's disambiguated name.
- entity1_index
If present, this is the "index" field of the entity matching the entity1 disambiguated name above.
- verb
A free form text field describing the event "verb".
- verb_category
Also a free form text field describing the event "verb", but intended to group related verbs together (eg "travel" for verbs: "flew", "drove").
- geo_index
If the event geotag maps into an entity from the parent document then this field is the "index" of that entity. The "geo_index" can also be directly specified as an entity index or an entity type (in "iterateOver" cases), and the geotag is then derived.
- geotag
  - lat
  String containing a floating point representation of latitude.
  - lon
  String containing a floating point representation of longitude.
The result of the association specification above can be seen in the sample output below:
{
    //...
    "associations" : [
        {
            "entity1" : "robbery gun",
            "entity1_index" : "robbery gun/criminalactivity",
            "verb" : "reported",
            "verb_category" : "crime",
            "geotag" : { "lat" : "38.9051666534795", "lon" : "-77.0121735726172" },
            "geo_index" : "1100 b/o 1st st nw washington dc/place",
            "assoc_type" : "Event"
        }
    ],
    //...
}
In the sample output above, note that the Infinit.e harvester automatically generates the following fields as appropriate:
- assoc_type
"Event", "Fact", "Summary"
The "assoc_type" field sub-categorizes the association object into one of three types: "Event", "Fact", or "Summary". The examples provided below should make the distinction clearer, but it can be simply described as follows:
- "Event": links multiple entities (via "entity1_index", "entity2_index", "geo_index") and represents a transient activity (eg travel)
- "Fact": links multiple entities like an "Event" but represents a (transient or permanent) relationship (eg being president)
- "Summary": generally links 1 entity to free text (eg a quotation: "Obama says...").
Specifying Multiplicative Associations
Multiplicative associations are associations that are created by "multiplying" a combination of entities, locations, and times together to determine the number of associations to extract from the source data. For example, in the following sample document describing a terrorist attack, one terrorist (perpetrator) attacked two different types of victims (police officers and military personnel) in Sri Lanka. The association specification uses the multiplicative format to create events, with the total number of associations given by: Entity1 (PersonPerpetrator) * Entity2 (VictimType) * Geo_index (Location) = Total Number of Associations. For this document that is 1 * 2 * 1 = 2 associations.
The Structured Analysis Harvester supports the creation of events using the following specification format for multiplicative events:
{
    //...
    "associations" : [
        {
            "iterateOver" : "entity1/entity2/geo_index",
            "verb" : "attacked",
            "verb_category" : "assault/attack",
            "entity1" : "PersonPerpetrator",
            "entity2" : "VictimType",
            "geo_index" : "Location",
            "time_start" : "$SCRIPT( return _doc.metadata.incidentdate[0]; )"
        }
    ],
    //...
}
Multiplicative associations are specified by naming the entity types used to populate the entity1, entity2, geo_index, time_start, and time_end fields of the events created. The iterateOver field specifies the order in which those entity types are multiplied to determine the total number of associations to create.
- iterateOver
The iterateOver field specifies the order in which the Structured Analysis Harvester multiplies each field in order to create the right number and type of events. The fields to iterate over are separated by the '/' character, and each named field specifies the entity type used to populate the matching field in the event object.
Association fields generated from the entity loop are placed in "_iterator". For example, for "iterateOver": "entity1/entity2/geo_index", an _iterator object with the following fields is available in the JavaScript: "_iterator.entity1_index", "_iterator.entity2_index", "_iterator.geo_index".
These fields can be combined with "creationCriteriaScript" scriptlets to filter out unwanted associations. For example, when looping over entity1 and entity2 with the same entity type, the following script ensures that an association does not link an entity to itself:
"creationCriteriaScript": "$SCRIPT( return _iterator.entity1_index != _iterator.entity2_index; )",
"iterateOver": "entity1/entity2",
"entity1": "EmailAddress",
"entity2": "EmailAddress",
//etc
The creationCriteriaScript runs before the association is generated (so it can safely be used to remove items that would otherwise return errors).
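The effect of such a filter can be sketched outside the harvester. In the sketch below, pairWithCriteria is a hypothetical stand-in for the entity loop, the e-mail indexes are made up, and only the "_iterator" field names come from the description above.

```javascript
// Hypothetical stand-in for the harvester's entity loop: builds every
// entity1/entity2 pairing and keeps those accepted by the criteria function.
function pairWithCriteria(indexes, criteria) {
  const kept = [];
  for (const entity1_index of indexes) {
    for (const entity2_index of indexes) {
      const _iterator = { entity1_index, entity2_index };
      // The criteria run before anything is created for this pairing.
      if (criteria(_iterator)) kept.push(_iterator);
    }
  }
  return kept;
}

// Same test as the creationCriteriaScript shown above.
const notSelf = (_iterator) => _iterator.entity1_index != _iterator.entity2_index;

// Illustrative indexes only.
const emails = ["a@example.com/emailaddress", "b@example.com/emailaddress"];
console.log(pairWithCriteria(emails, notSelf).length); // 2
```

In this sketch both orderings of a pair (a to b, and b to a) pass the filter; whether the harvester emits both is not stated here.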
In the example source below there are four entities that are being shown: one location entity, two victim entities, and one perpetrator entity. These visible entities are the entities referenced in the example specification above.
{
    //...
    "entities" : [
        {
            "actual_name" : "Batticaloa,North Eastern Province,Sri Lanka",
            "dimension" : "Where",
            "disambiguated_name" : "Batticaloa,North Eastern Province,Sri Lanka",
            "doccount" : NumberLong(18),
            "frequency" : 1,
            "index" : "batticaloa,north eastern province,sri lanka/location",
            "geotag" : { "lat" : "7.7166667", "lon" : "81.7" },
            "ontology_type": "countrysubsidiary",
            "totalfrequency" : NumberLong(18),
            "type" : "Location"
        },
        {
            "actual_name" : "Targeted, Police, Adult from Sri Lanka",
            "dimension" : "Who",
            "disambiguated_name" : "Targeted, Police, Adult from Sri Lanka",
            "doccount" : NumberLong(47),
            "frequency" : 3,
            "index" : "targeted, police, adult from sri lanka/victimtype",
            "totalfrequency" : NumberLong(161),
            "type" : "VictimType"
        },
        {
            "actual_name" : "Targeted, Military, Adult, Combatant from Sri Lanka",
            "dimension" : "Who",
            "disambiguated_name" : "Targeted, Military, Adult, Combatant from Sri Lanka",
            "doccount" : NumberLong(20),
            "frequency" : 1,
            "index" : "targeted, military, adult, combatant from sri lanka/victimtype",
            "totalfrequency" : NumberLong(147),
            "type" : "VictimType"
        },
        {
            "actual_name" : "Secular/Political/Anarchist from Sri Lanka",
            "dimension" : "Who",
            "disambiguated_name" : "Secular/Political/Anarchist from Sri Lanka",
            "doccount" : NumberLong(200),
            "frequency" : 1,
            "index" : "secular/political/anarchist from sri lanka/personperpetrator",
            "totalfrequency" : NumberLong(200),
            "type" : "PersonPerpetrator"
        },
        //...
    ],
    // ...
}
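In the sample entities above, each "index" value appears to be the lowercased "disambiguated_name" joined to the lowercased "type" by a "/". The sketch below reproduces that pattern. The helper entityIndex is hypothetical, and the rule is inferred from these samples only; it is not taken from the harvester.

```javascript
// Hypothetical helper: rebuilds an entity "index" from the pattern seen in the
// sample entities (assumption: lowercased name + "/" + lowercased type).
function entityIndex(disambiguatedName, type) {
  return disambiguatedName.toLowerCase() + "/" + type.toLowerCase();
}

console.log(entityIndex("Batticaloa,North Eastern Province,Sri Lanka", "Location"));
// batticaloa,north eastern province,sri lanka/location
```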
The following Multiplicative Event Output example shows how the Structured Analysis Harvester would generate two events from the source data and specification shown above:
{
    //...
    "associations" : [
        {
            "entity1" : "secular/political/anarchist from sri lanka",
            "entity1_index" : "secular/political/anarchist from sri lanka/personperpetrator",
            "verb" : "attacked",
            "verb_category" : "assault/attack",
            "entity2" : "targeted, police, adult from sri lanka",
            "entity2_index" : "targeted, police, adult from sri lanka/victimtype",
            "time_start" : "09/07/2005",
            "geotag" : { "lat" : "7.7166667", "lon" : "81.7" },
            "geo_index" : "batticaloa,north eastern province,sri lanka/location",
            "assoc_type" : "Event"
        },
        {
            "entity1" : "secular/political/anarchist from sri lanka",
            "entity1_index" : "secular/political/anarchist from sri lanka/personperpetrator",
            "verb" : "attacked",
            "verb_category" : "assault/attack",
            "entity2" : "targeted, military, adult, combatant from sri lanka",
            "entity2_index" : "targeted, military, adult, combatant from sri lanka/victimtype",
            "time_start" : "09/07/2005",
            "geotag" : { "lat" : "7.7166667", "lon" : "81.7" },
            "geo_index" : "batticaloa,north eastern province,sri lanka/location",
            "assoc_type" : "Event"
        }
    ],
    //...
}
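The multiplication that produces these two events can be sketched as a Cartesian product over entity types. This is a minimal illustration, not the harvester's implementation: multiplyAssociations is a hypothetical name, the entities are reduced to their "index" and "type" fields, and time and geotag handling is omitted.

```javascript
// Hypothetical sketch of multiplicative iteration; not the harvester's code.
function multiplyAssociations(spec, entities) {
  // "dummy" is treated as a placeholder that contributes no entity type.
  const fields = spec.iterateOver.split("/").filter((f) => f !== "dummy");
  let combos = [{}];
  for (const field of fields) {
    // Indexes of all document entities whose type matches this field.
    const pool = entities
      .filter((e) => e.type === spec[field])
      .map((e) => e.index);
    // Assumption: output keys follow the naming seen in the sample output.
    const key = field === "geo_index" ? "geo_index" : field + "_index";
    combos = combos.flatMap((c) => pool.map((index) => ({ ...c, [key]: index })));
  }
  return combos.map((c) => ({ ...c, verb: spec.verb, verb_category: spec.verb_category }));
}

const attackSpec = {
  iterateOver: "entity1/entity2/geo_index",
  verb: "attacked",
  verb_category: "assault/attack",
  entity1: "PersonPerpetrator",
  entity2: "VictimType",
  geo_index: "Location",
};
const attackEntities = [
  { index: "batticaloa,north eastern province,sri lanka/location", type: "Location" },
  { index: "targeted, police, adult from sri lanka/victimtype", type: "VictimType" },
  { index: "targeted, military, adult, combatant from sri lanka/victimtype", type: "VictimType" },
  { index: "secular/political/anarchist from sri lanka/personperpetrator", type: "PersonPerpetrator" },
];

const attacks = multiplyAssociations(attackSpec, attackEntities);
console.log(attacks.length); // 2 (1 perpetrator * 2 victim types * 1 location)
```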
- If the "iterateOver" field contains neither "," nor "/" ("," is for additive associations, see below) then it is treated as an iterator over a metadata field, just as described under Specifying Entities, section "Create Entities from Arrays of Items".
- To iterate just over a single entity type, use "dummy", eg "entity1/dummy" or "entity2,dummy" (The '/' vs ',' are equivalent in this case).
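The three ways of reading "iterateOver" described on this page can be summarized in a small sketch. The function name iterateOverMode and its return labels are hypothetical. The behavior when a value contains both separators is not described here, so checking "/" first is an arbitrary assumption.

```javascript
// Hypothetical classifier for an "iterateOver" value.
function iterateOverMode(iterateOver) {
  // Assumption: "/" is checked first; mixed separators are not covered by the text.
  if (iterateOver.includes("/")) return "multiplicative";
  if (iterateOver.includes(",")) return "additive";
  // Neither separator: treated as an iterator over a metadata field.
  return "metadata";
}

console.log(iterateOverMode("entity1/entity2/geo_index")); // multiplicative
console.log(iterateOverMode("entity1,entity2"));           // additive
console.log(iterateOverMode("incidents"));                 // metadata ("incidents" is a made-up field name)
```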
Specifying Additive Associations
Additive associations cover the less common case where (eg) 2 entity types have the same number of elements and are ordered "in lock step". For example:
"entities": [
    { "index": "alex/person", ... },
    { "index": "craig/person", ... },
    { "index": "baltimore/city", ... },
    { "index": "washington dc/city", ... },
    ...
]
In this case the additive association specification:
{
    "iterateOver": "entity1,entity2", // note "," instead of "/"
    "entity1": "Person",
    "entity2": "City",
    "verb_category": "lives in",
    ...
}
would generate the 2 associations "alex/person lives in baltimore/city" and "craig/person lives in washington dc/city".
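The lock-step pairing can be sketched as a positional zip over the entities of each type. This is an illustration only: addAssociations is a hypothetical name, the "type" values on the entities are assumed because the sample above elides them, and stopping at the shortest list when the lengths differ is also an assumption.

```javascript
// Hypothetical sketch of additive ("lock step") iteration; not the harvester's code.
function addAssociations(spec, entities) {
  const fields = spec.iterateOver.split(",").filter((f) => f !== "dummy");
  // One ordered list of entity indexes per iterated field.
  const pools = fields.map((field) =>
    entities.filter((e) => e.type === spec[field]).map((e) => e.index)
  );
  // Assumption: stop at the shortest list if the lengths differ.
  const count = Math.min(...pools.map((p) => p.length));
  const result = [];
  for (let i = 0; i < count; i++) {
    const assoc = { verb_category: spec.verb_category };
    fields.forEach((field, f) => {
      const key = field === "geo_index" ? "geo_index" : field + "_index";
      assoc[key] = pools[f][i]; // i-th entity of each type is paired together
    });
    result.push(assoc);
  }
  return result;
}

const livesInSpec = {
  iterateOver: "entity1,entity2",
  entity1: "Person",
  entity2: "City",
  verb_category: "lives in",
};
// "type" values are assumed; the sample entities show only "index".
const residents = [
  { index: "alex/person", type: "Person" },
  { index: "craig/person", type: "Person" },
  { index: "baltimore/city", type: "City" },
  { index: "washington dc/city", type: "City" },
];

for (const a of addAssociations(livesInSpec, residents)) {
  console.log(a.entity1_index + " " + a.verb_category + " " + a.entity2_index);
}
// alex/person lives in baltimore/city
// craig/person lives in washington dc/city
```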