Association JSON format
{
    "assoc_type": string, // "Event", "Fact", "Summary" - see below

    // Subject-verb-object
    "entity1": string, // A free form text field containing information about the association "subject"
    "entity1_index": string, // 0-1, if present is the "index" field from one of the entities in the parent document
    "verb": string, // A free form text field describing the association "verb"
    "verb_category": string, // Also a free form text field describing the association "verb", but intended to group related verbs together (eg "travel" for verbs: "flew", "drove")
    "entity2": string, // A free form text field containing information about the association "object"
    "entity2_index": string, // 0-1, if present is the "index" field from one of the entities in the parent document

    // Temporal
    "time_start": string, // 0-1, the start time, in ISO format (yyyy-MM-dd['T'HH:mm:ss])
    "time_end": string, // 0-1, the end time - if there is no start time, the association is considered to be "instantaneous" (at least to within a day)

    // Geo-spatial
    "geo_index": string, // If the association geotag maps into an entity from the parent document then this field is the "index" of that entity
    "geotag": { // 0-1, only if the association has been geotagged
        "lat": number,
        "lon": number
    },

    // Sentiment
    "sentiment": number, // The sentiment from entity1_index toward entity2 (and optionally entity2_index)

    // Scoring
    "assoc_sig": number, // A significance score for the association object (see below)
    "doccount": number, // The number of documents containing the association, aggregations only
    "entity1_sig": number, // If "entity1_index" is populated, the significance of entity1
    "entity2_sig": number, // If "entity2_index" is populated, the significance of entity2
    "geo_sig": number // If "geo_index" is populated, the significance of that entity
}
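As a rough illustration of the field types listed above, the following Python sketch checks an already-parsed association object. The function name `check_association` and the error messages are hypothetical and not part of any API described here; the sketch only checks the types of fields that are present, since the examples on this page omit several fields.

```python
from numbers import Number

# Field names taken from the format description above.
STRING_FIELDS = {
    "assoc_type", "entity1", "entity1_index", "verb", "verb_category",
    "entity2", "entity2_index", "time_start", "time_end", "geo_index",
}
NUMBER_FIELDS = {
    "sentiment", "assoc_sig", "doccount",
    "entity1_sig", "entity2_sig", "geo_sig",
}
ASSOC_TYPES = ("Event", "Fact", "Summary")


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def check_association(assoc):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, value in assoc.items():
        if name in STRING_FIELDS and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        elif name in NUMBER_FIELDS and not is_number(value):
            problems.append(f"{name} should be a number")

    assoc_type = assoc.get("assoc_type")
    if isinstance(assoc_type, str) and assoc_type not in ASSOC_TYPES:
        problems.append("assoc_type should be one of " + ", ".join(ASSOC_TYPES))

    geotag = assoc.get("geotag")
    if geotag is not None:
        if not isinstance(geotag, dict):
            problems.append("geotag should be an object")
        else:
            for axis in ("lat", "lon"):
                if not is_number(geotag.get(axis)):
                    problems.append(f"geotag.{axis} should be a number")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a hand-built association in one pass.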
Field Guide
assoc_type
The "assoc_type" field sub-categorizes the association object into one of three types: "Event", "Fact", or "Summary". The examples below make the distinction clearer, but in brief:
- "Event": links multiple entities (via "entity1_index", "entity2_index", "geo_index") and represents a transient activity (e.g. travel).
- "Fact": links multiple entities like an "Event", but represents a (transient or permanent) relationship (e.g. being president).
- "Summary": generally links one entity to free text (e.g. a quotation: "Obama says...").
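Client code often needs to split a document's associations by type. A minimal Python sketch, assuming the associations have already been parsed into dictionaries; the function name and the "Unknown" bucket for objects without an "assoc_type" are hypothetical:

```python
from collections import defaultdict


def group_by_assoc_type(associations):
    """Bucket association dicts by their "assoc_type" value."""
    groups = defaultdict(list)
    for assoc in associations:
        # "Unknown" is a hypothetical bucket for objects missing the field.
        groups[assoc.get("assoc_type", "Unknown")].append(assoc)
    return dict(groups)


associations = [
    {"assoc_type": "Event", "verb_category": "acquisition"},
    {"assoc_type": "Fact", "verb_category": "career"},
    {"assoc_type": "Event", "verb_category": "assault/attack"},
]
groups = group_by_assoc_type(associations)
print(sorted(groups))        # ['Event', 'Fact']
print(len(groups["Event"]))  # 2
```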
Times and Locations
Times and locations are represented by "time_start", "time_end", "geo_index", and "geotag" as described in the JSON code block above.
- Note that an event might have neither a "time_start" nor a "time_end". In that case the document's "publishedDate" field can generally be used instead (this is what happens automatically in the event timeline aggregation).
- Conversely, if an event does not have any geo information, it does not follow that document-level fields such as the document geotag can be used in its place.
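The two notes above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and it assumes the document's "publishedDate" is supplied as a string in the same ISO format as "time_start".

```python
from datetime import datetime

# The two layouts allowed by yyyy-MM-dd['T'HH:mm:ss].
TIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_assoc_time(text):
    """Parse an association time string in either allowed layout."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised time value: {text!r}")


def effective_time(assoc, published_date):
    """Use the association's own time if it has one, else the document's."""
    raw = assoc.get("time_start") or assoc.get("time_end") or published_date
    return parse_assoc_time(raw)


def association_geotag(assoc):
    """Return the association's geotag, or None.

    There is deliberately no fallback to the document geotag here.
    """
    return assoc.get("geotag")
```

The asymmetry is intentional: a missing time falls back to the publication date, whereas a missing location stays missing.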
Scoring
Scoring relationships between entities is an interesting problem, more so than scoring the entities themselves (see significance discussion). For example, relationships can be scored based on the entities that comprise them, on the frequency of the relationship itself, and on various graphical criteria. We intend to do more research on this topic in the future. For now, the "association significance" calculated by Community Edition is simply a Pythagorean combination of the entity significances.
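The exact formula is not given here. The values in the "Aggregated Associations" examples on this page are consistent with a root-mean-square of the entity significances, so the Python sketch below assumes that form; the function name is hypothetical and this should not be read as the definitive implementation.

```python
import math


def combine_significances(*entity_sigs):
    """Root-mean-square of the supplied entity significances (assumed form)."""
    if not entity_sigs:
        raise ValueError("at least one entity significance is required")
    return math.sqrt(sum(s * s for s in entity_sigs) / len(entity_sigs))


# Entity significances from the first aggregated "facts" example on this page,
# whose "assoc_sig" is 13.799482037068763:
score = combine_significances(14.467955770723677, 13.096933419701786)
print(round(score, 4))  # 13.7995
```

With a single entity significance, the same function returns that significance unchanged.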
Examples
Document Children
// No time information:
{
    "entity1": "Rowan Companies Inc.",
    "entity1_index": "rowan companies, inc./company",
    "verb": "announced",
    "verb_category": "acquisition",
    "entity2": "LeTourneau Technologies Inc.",
    "entity2_index": "letourneau technologies inc/company",
    "assoc_sig": 23.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568,
    "assoc_type": "Event"
},
// Example event generated from structured data sources:
{
    "entity1": "unknown from unknown",
    "entity1_index": "unknown from unknown/personperpetrator",
    "verb": "attacked",
    "verb_category": "assault/attack",
    "entity2": "targeted,business",
    "entity2_index": "targeted,business/facilitytype",
    "time_start": "2004-05-07",
    "geotag": {
        "latitude": "35.1666667",
        "longitude": "33.3666667"
    },
    "geo_index": "nicosia,nicosia,cyprus/location",
    "assoc_sig": 28.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568,
    "geo_sig": 9.124365,
    "assoc_type": "Event"
}
// Time range:
{
    "entity1": "amazon",
    "entity1_index": "amazon, inc./company",
    "entity2": "kindle",
    "entity2_index": "kindle/product",
    "assoc_type": "Fact",
    "verb_category": "company product",
    "time_start": "2011-05-27",
    "assoc_sig": 23.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568,
    "time_end": "2011-06-21"
},
// No time:
{
    "entity1": "joe faulhaber",
    "entity1_index": "joe faulhaber/person",
    "verb": "current",
    "verb_category": "career",
    "entity2": "software engineer",
    "entity2_index": "software engineer/position",
    "assoc_sig": 23.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568,
    "assoc_type": "Fact"
}
{
    "entity1": "dannel malloy",
    "entity1_index": "dannel malloy/person",
    "verb_category": "quotation",
    "entity2": "this exciting project will be a blueprint for people all around the country who are interested in developing this type of green solar charging technology, linking renewable energy with electric vehicles and making our lives cleaner and greener.; i'm excited to witness the future of this project, and i'm energized about the innovative projects ge is undertaking in our state",
    "assoc_type": "Summary",
    "time_start": "2011-05-26",
    "assoc_sig": 13.3454654,
    "entity1_sig": 13.3454654
},
{
    "entity1": "questetra inc.",
    "entity1_index": "questetra inc./company",
    "verb_category": "company founded",
    "time_start": "2008-00-01 12:00:00",
    "assoc_type": "Summary",
    "assoc_sig": 13.3454654,
    "entity1_sig": 13.3454654
}
Aggregated Associations
{
    //...
    "facts": [
        {
            "assoc_sig": 13.799482037068763,
            "verb_category": "career",
            "doccount": 208,
            "entity1_sig": 14.467955770723677,
            "entity1_index": "goodluck jonathan/person",
            "entity2_sig": 13.096933419701786,
            "entity2_index": "president/position",
            "assoc_type": "Fact"
        },
        //...
    ],
    "events": [
        {
            "assoc_sig": 0.5558116549888003,
            "verb_category": "joint venture",
            "doccount": 5,
            "entity1_sig": 0.12740185072834573,
            "entity1_index": "eni spa/company",
            "entity2_sig": 0.7756429335717384,
            "entity2_index": "nigerian national petroleum co./company",
            "assoc_type": "Event"
        },
        //...
    ],
    //...
}
Events Timeline
{
    //...
    "eventsTimeline": [
        {
            "time_start": "2010-10-15",
            "assoc_sig": 134.8044472232056,
            "verb_category": "career",
            "doccount": 1,
            "entity1": "ibrahim babangida",
            "entity2": "head of state , general",
            "entity1_index": "ibrahim babangida/person",
            "verb": "current",
            "entity2_index": "head of state , general/position",
            "assoc_type": "Fact"
        },
        {
            "time_start": "2010-10-02",
            "assoc_sig": 126.6919059017276,
            "verb_category": "career",
            "doccount": 2,
            "entity1": "ibrahim babangida",
            "entity2": "military president , general",
            "entity1_index": "ibrahim babangida/person",
            "verb": "past",
            "time_end": "2010-10-06",
            "entity2_index": "military president , general/position",
            "assoc_type": "Fact"
        },
        //...
    ],
    //...
}
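A client that wants the timeline oldest-first can re-sort the "eventsTimeline" array itself. The Python sketch below relies on the fact that ISO date strings in the yyyy-MM-dd['T'HH:mm:ss] layout sort chronologically as plain text. The function name is hypothetical, and placing entries without a "time_start" first is an arbitrary choice for the illustration.

```python
def chronological(events_timeline):
    """Return the timeline entries ordered by "time_start", oldest first."""
    # Entries without "time_start" get an empty key and so sort first.
    return sorted(events_timeline, key=lambda assoc: assoc.get("time_start", ""))


timeline = [
    {"time_start": "2010-10-15", "verb": "current"},
    {"time_start": "2010-10-02", "time_end": "2010-10-06", "verb": "past"},
]
ordered = chronological(timeline)
print([assoc["time_start"] for assoc in ordered])  # ['2010-10-02', '2010-10-15']
```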