Association JSON format

Association Format

The association object is intended to provide a minimal but generic description of various activities and relationships. It can be thought of as "subject verb object at location over time", where the subjects and objects can be free text and/or point to entities within the document. In this representation:

  • The subject is "entity1" (free form text), or "entity1_index" (the "index" of an entity in the "entities" array of the document or the query).
  • The object is "entity2"/"entity2_index"
  • The verb is the "verb" field, with the option of providing a higher level "verb_category" field, which allows grouping of related verbs (eg "walk", "drive" would both have category "travel")
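As a rough illustration of the "subject verb object" reading, the following Python sketch renders an association as a short phrase, preferring the free-text fields and falling back to the index fields. The function name `describe_association` and the fallback order are assumptions made for illustration; they are not part of the Infinit.e API.

```python
def describe_association(assoc):
    """Render an association dict as a 'subject verb object' phrase."""
    # Prefer the free-text field; fall back to the entity index if needed.
    subject = assoc.get("entity1") or assoc.get("entity1_index")
    obj = assoc.get("entity2") or assoc.get("entity2_index")
    # Prefer the specific verb; fall back to the broader verb category.
    verb = assoc.get("verb") or assoc.get("verb_category")
    # Skip whichever parts are missing (eg a "Summary" may lack an object index).
    return " ".join(part for part in (subject, verb, obj) if part)


example = {
    "entity1": "Rowan Companies Inc.",
    "entity1_index": "rowan companies, inc./company",
    "verb": "announced",
    "verb_category": "acquisition",
    "entity2": "LeTourneau Technologies Inc.",
    "entity2_index": "letourneau technologies inc/company",
}
print(describe_association(example))
```

The same function also works on aggregated associations, where only the index fields and "verb_category" are available.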

There are a few minor differences between the association object as an aggregation and as a document child:

  • "entity1", "entity2", and "verb" fields are not present for aggregations (since they can have large numbers of values across all the matching documents).
  • Only "Event" and "Fact" association types appear in aggregations (since "entity1", "entity2" and "verb" fields are not present).
  • "geotag" fields are not present for aggregations (because of the way in which the aggregated event is generated, ie another implementation limitation).
  • In aggregations, the entity/geo significances may be set to 0 for some entities (if they don't occur in the most relevant subset of documents, ie another implementation limitation).
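The field differences listed above can be summarised in a short Python sketch that reduces a document-child association to the fields an aggregated association could carry. The helper name `to_aggregation_shape` and the two constants are hypothetical; the sketch only mirrors the list above and does not describe how the platform actually builds aggregations.

```python
# Fields listed above as absent from aggregated associations.
NOT_IN_AGGREGATIONS = {"entity1", "entity2", "verb", "geotag"}
# Only these association types are listed as appearing in aggregations.
AGGREGATED_TYPES = {"Event", "Fact"}


def to_aggregation_shape(assoc):
    """Return a copy limited to aggregation fields, or None for other types."""
    if assoc.get("assoc_type") not in AGGREGATED_TYPES:
        return None
    return {k: v for k, v in assoc.items() if k not in NOT_IN_AGGREGATIONS}


doc_assoc = {
    "entity1": "amazon",
    "entity1_index": "amazon, inc./company",
    "verb_category": "company product",
    "assoc_type": "Fact",
}
print(to_aggregation_shape(doc_assoc))
```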

See examples below and the following diagram that helps to clarify the distinction.

Events Timeline

There is a third representation of associations available from Infinit.e queries: the "events timeline". In this case, associations are the same as those within the documents, except that the scoring is different:

  • The entity significances are not included
  • The association significance ("assoc_sig") is simply the Pythagorean combination of the significances of all documents in which the association is included
  • Where times are not included intrinsically in the association, they are filled in to span the time range of the documents including them.
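A minimal sketch of the last point, assuming each contributing document supplies a date already formatted as an ISO date string of the same precision (so that string comparison matches chronological order). The function name `fill_timeline_times` is hypothetical, and the exact rule the platform applies may differ.

```python
def fill_timeline_times(assoc, document_dates):
    """Fill missing times with the span of the containing documents' dates."""
    filled = dict(assoc)
    # Associations that carry their own times are left untouched.
    if "time_start" in filled or "time_end" in filled:
        return filled
    if document_dates:
        filled["time_start"] = min(document_dates)
        filled["time_end"] = max(document_dates)
    return filled


timeline_entry = fill_timeline_times(
    {"verb_category": "career", "assoc_type": "Fact"},
    ["2010-10-06", "2010-10-02"],
)
print(timeline_entry)
```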

Event object format
{
     "assoc_type": string, // "Event", "Fact", "Summary" - see below

     // Subject-verb-object
     "entity1": string, // A free form text field containing information about the association "subject"
     "entity1_index": string, // 0-1, if present is the "index" field from one of the entities in the parent document
     "verb": string, // A free form text field describing the association "verb"
     "verb_category": string, // Also a free form text field describing the association "verb", but intended to group related verbs together (eg "travel" for verbs: "flew", "drove")
     "entity2": string, // A free form text field containing information about the association "object"
     "entity2_index": string, // 0-1, if present is the "index" field from one of the entities in the parent document

     // Temporal
     "time_start": string, // 0-1, the start time, in ISO format (yyyy-MM-dd['T'HH:mm:ss])
     "time_end": string, // 0-1, the end time - if there is no start time, the association is considered to be "instantaneous" (at least to within a day)

     // Geo-spatial
     "geo_index": string, // If the association geotag maps into an entity from the parent document then this field is the "index" of that entity
     "geotag": { // 0-1, only if the association has been geotagged
          "lat": number,
          "lon": number
     },

     // Sentiment
     "sentiment": number, // The sentiment from entity1_index toward entity2 (and optionally entity2_index)

     // Scoring
     "assoc_sig": number, // A significance score for the association object (see below)
     "doccount": number, // The number of documents containing the association, aggregations only
     "entity1_sig": number, // If "entity1_index" is populated, the significance of entity1
     "entity2_sig": number, // If "entity2_index" is populated, the significance of entity2
     "geo_sig": number // If "geo_index" is populated, the significance of that entity
}
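
For readers consuming this format from Python, the layout above could be mirrored as a typed dictionary. This is only a sketch: the class names `GeoTag` and `Association` and the helper `unknown_fields` are hypothetical, and the Python types (`str`, `float`, `int`) are an assumed mapping of the "string" and "number" annotations above.

```python
from typing import TypedDict


class GeoTag(TypedDict):
    lat: float
    lon: float


class Association(TypedDict, total=False):
    # total=False: any field may be absent, matching the "0-1" notes above.
    assoc_type: str  # "Event", "Fact" or "Summary"
    entity1: str
    entity1_index: str
    verb: str
    verb_category: str
    entity2: str
    entity2_index: str
    time_start: str
    time_end: str
    geo_index: str
    geotag: GeoTag
    sentiment: float
    assoc_sig: float
    doccount: int  # aggregations only
    entity1_sig: float
    entity2_sig: float
    geo_sig: float


def unknown_fields(obj):
    """List keys in obj that are not part of the Association layout."""
    return sorted(k for k in obj if k not in Association.__annotations__)


print(unknown_fields({"entity1": "amazon", "verb_cat": "career"}))
```

A check like `unknown_fields` is a cheap way to catch misspelled keys before an object is stored or compared.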

Field Guide

assoc_type

The "assoc_type" field sub-categorizes the association object into one of three types, "Event", "Fact", or "Summary". Examples provided below should make the distinction clearer, but it can be simply described as follows:

  • "Event": links multiple entities (via "entity1_index", "entity2_index", "geo_index") and represents a transient activity (eg travel).
  • "Fact": links multiple entities like an "Event", but represents a (transient or permanent) relationship (eg being president).
  • "Summary": generally links one entity to free text (eg a quotation: "Obama says...").

Times and Locations

Times and locations are represented by "time_start", "time_end", "geo_index", and "geotag" as described in the JSON code block above.

  • Note that events might have neither a "time_start" nor a "time_end" - in general the document "publishedDate" field can be used (this is what happens automatically in the event timeline aggregation).
  • Conversely, if an event does not have any geo information, it does not follow that fields such as the document geotag can be used.
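
The time handling described above could be sketched in Python as follows. The function names are hypothetical, the two `strptime` patterns are an assumed translation of the yyyy-MM-dd['T'HH:mm:ss] format, and the caller is assumed to have already parsed the document "publishedDate" into a `datetime`.

```python
from datetime import datetime

# yyyy-MM-dd with an optional 'T'HH:mm:ss suffix.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_assoc_time(value):
    """Parse an association time; return None if missing or unparseable."""
    if not value:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def effective_time(assoc, published_date):
    """Use the association's own time if it has one, else the document date."""
    own = parse_assoc_time(assoc.get("time_start")) or parse_assoc_time(assoc.get("time_end"))
    return own or published_date


published = datetime(2011, 6, 1)
print(effective_time({"time_start": "2011-05-27"}, published))
print(effective_time({"verb": "current"}, published))
```

No equivalent fallback is sketched for locations, since the text above notes that the document geotag cannot be assumed to apply to the association.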

Scoring

Scoring relationships between entities is an interesting problem, more so than scoring the entities themselves (see the significance discussion). For example, associations can be scored on the entities that comprise them, on the frequency of the relationship itself, or on various graphical criteria. We intend to do more research on this topic in the future. For now, the "association significance" calculated by Community Edition is simply a Pythagorean combination of the entity significances.
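
The text does not define "Pythagorean combination" precisely. One reading that reproduces the aggregated examples later in this document is the root mean square of the entity significances, ie sqrt((entity1_sig^2 + entity2_sig^2) / 2). The Python sketch below assumes that reading. The function name is hypothetical, whether "geo_sig" takes part is not stated, and not all of the document-child example values follow this formula.

```python
import math


def combine_significances(sigs):
    """Root mean square of the significances present (assumed formula)."""
    present = [s for s in sigs if s is not None]
    if not present:
        return 0.0
    return math.sqrt(sum(s * s for s in present) / len(present))


# Entity significances taken from the aggregated "Fact" example.
fact_sig = combine_significances([14.467955770723677, 13.096933419701786])
print(fact_sig)
```

With a single entity significance, as in the "Summary" examples, this reading returns that significance unchanged.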

Examples

Document Children

 

Example "Events"
// No time information:
{
        "entity1" : "Rowan Companies Inc.",
        "entity1_index" : "rowan companies, inc./company",
        "verb" : "announced",
        "verb_category" : "acquisition",
        "entity2" : "LeTourneau Technologies Inc.",
        "entity2_index" : "letourneau technologies inc/company",
        "assoc_sig" : 23.34546557,
        "entity1_sig" : 13.3454654,
        "entity2_sig" : 20.2134568,
        "assoc_type" : "Event"
},
// Example event generated from structured data sources:
{
       "entity1" : "unknown from unknown",
       "entity1_index" : "unknown from unknown/personperpetrator",
       "verb" : "attacked",
       "verb_category" : "assault/attack",
       "entity2" : "targeted,business",
       "entity2_index" : "targeted,business/facilitytype",
       "time_start" : "2004-05-07",
       "geotag" : {
             "latitude" : "35.1666667",
             "longitude" : "33.3666667"
       },
       "geo_index" : "nicosia,nicosia,cyprus/location",
       "assoc_sig" : 28.34546557,
       "entity1_sig" : 13.3454654,
       "entity2_sig" : 20.2134568,
       "geo_sig" : 9.124365,
       "assoc_type" : "Event"
}
Example "Facts"
// Time range:
{
    "entity1": "amazon",
    "entity1_index": "amazon, inc./company",
    "entity2": "kindle",
    "entity2_index": "kindle/product",
    "assoc_type": "Fact",
    "verb_category": "company product",
    "time_start": "2011-05-27",
    "time_end": "2011-06-21",
    "assoc_sig": 23.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568
},
// No time:
{
    "entity1": "joe faulhaber",
    "entity1_index": "joe faulhaber/person",
    "verb": "current",
    "verb_category": "career",
    "entity2": "software engineer",
    "entity2_index": "software engineer/position",
    "assoc_sig": 23.34546557,
    "entity1_sig": 13.3454654,
    "entity2_sig": 20.2134568,
    "assoc_type": "Fact"
}
Example "Summaries"
{
    "entity1": "dannel malloy",
    "entity1_index": "dannel malloy/person",
    "verb_category": "quotation",
    "entity2": "this exciting project will be a blueprint for people all around the country who are interested in developing this type of green solar charging technology, linking renewable energy with electric vehicles and making our lives cleaner and greener.; i'm excited to witness the future of this project, and i'm energized about the innovative projects ge is undertaking in our state",
    "assoc_type": "Summary",
    "time_start": "2011-05-26",
    "assoc_sig": 13.3454654,
    "entity1_sig": 13.3454654
},
{
    "entity1" : "questetra inc.",
    "entity1_index" : "questetra inc./company",
    "verb_category" : "company founded",
    "time_start" : "2008-00-01 12:00:00",
    "assoc_type" : "Summary",
    "assoc_sig" : 13.3454654,
    "entity1_sig" : 13.3454654
}

Aggregated Associations

 

Example aggregated associations
{
//...
	"facts": [
		{
			"assoc_sig": 13.799482037068763,
			"verb_category": "career",
			"doccount": 208,
			"entity1_sig": 14.467955770723677,
			"entity1_index": "goodluck jonathan/person",
			"entity2_sig": 13.096933419701786,
			"entity2_index": "president/position",
			"assoc_type": "Fact"
		},
		//...
	],

	"events": [
		{
			"assoc_sig": 0.5558116549888003,
			"verb_category": "joint venture",
			"doccount": 5,
			"entity1_sig": 0.12740185072834573,
			"entity1_index": "eni spa/company",
			"entity2_sig": 0.7756429335717384,
			"entity2_index": "nigerian national petroleum co./company",
			"assoc_type": "Event"
		},
		//...
	],

//...
}

Events Timeline

 

Example events timeline
{
//...
	"eventsTimeline": [
		{
			"time_start": "2010-10-15",
			"assoc_sig": 134.8044472232056,
			"verb_category": "career",
			"doccount": 1,
			"entity1": "ibrahim babangida",
			"entity2": "head of state , general",
			"entity1_index": "ibrahim babangida/person",
			"verb": "current",
			"entity2_index": "head of state , general/position",
			"assoc_type": "Fact"
		},
		{
			"time_start": "2010-10-02",
			"assoc_sig": 126.6919059017276,
			"verb_category": "career",
			"doccount": 2,
			"entity1": "ibrahim babangida",
			"entity2": "military president , general",
			"entity1_index": "ibrahim babangida/person",
			"verb": "past",
			"time_end": "2010-10-06",
			"entity2_index": "military president , general/position",
			"assoc_type": "Fact"
		},
		//...
	],
//...
}