...

Entity format
{
	// EITHER
	"entity": string,
	// OR
	"entityValue": string, // (entityValue is mandatory, entityType is optional)
	"entityType": string,
	// AND OPTIONALLY
	"entityOpt": { // (optional, see below for demos)
		"expandAlias": boolean, // (optional, defaults to "entityOpt.expandAlias=false" if not present)
		"rawText": boolean // (optional, defaults to false if not present)
	}

...

,
	"sentiment": { // (optional, specify one or both of min/max, see below)
		"min": number, 
		"max": number
	}
}

In the first form, the "entity" string has the format "entityValue/entityType" (its "index" form, i.e. the "index" field of the entity JSON object).

In the second (decomposed) form, either "entityValue" or "entityType" can be left out: omitting "entityValue" matches all entities of the given type, while omitting "entityType" matches the entity name regardless of its type.
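The relationship between the two forms can be sketched with a pair of helpers. They are hypothetical (not part of the query API) and assume the index form is exactly entityValue + "/" + entityType with no "/" inside the type, so splitting at the last "/" recovers both parts.

```javascript
// Hypothetical helpers (not part of the query API) converting between the
// combined "entity" index form ("entityValue/entityType") and the decomposed form.
// Assumption: the type never contains "/", so we split at the LAST "/".
function toDecomposed(entityIndex) {
  const slash = entityIndex.lastIndexOf("/");
  if (slash < 0) {
    throw new Error(`not in "value/type" index form: ${entityIndex}`);
  }
  return {
    entityValue: entityIndex.slice(0, slash),
    entityType: entityIndex.slice(slash + 1),
  };
}

function toIndexForm({ entityValue, entityType }) {
  // A term with one field left out matches more broadly and has no
  // equivalent single "entity" string, so it is rejected here.
  if (!entityValue || !entityType) {
    throw new Error("both entityValue and entityType are needed for the index form");
  }
  return `${entityValue}/${entityType}`;
}

console.log(toDecomposed("facebook/company"));
// { entityValue: 'facebook', entityType: 'company' }
console.log(toIndexForm({ entityValue: "barack obama", entityType: "person" }));
// barack obama/person
```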

In both cases, the optional "entityOpt.expandAlias" boolean term allows matching not only on the entity itself but also on common, automatically extracted "aliases". This tends to match more documents, some of which will be false positives, and this query type is also slower.

Some examples:

...

Note that this is different from manual entity aliasing, described here.

The optional "entityOpt.rawText" boolean term adds the entity's disambiguated name as an exact text query. This can be useful when some sources have low-quality entity extraction (e.g. they are in foreign languages, or in list format), since any page in which the name appears will then be selected.
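Putting the two options together, a query term can be assembled with a small builder like the one below. The entityTerm helper is hypothetical; it relies on the documented defaults (both flags are false when absent), so it only emits "entityOpt" when at least one flag is set.

```javascript
// Hypothetical builder for a single entity query term in the format above.
// Relies on the documented defaults: expandAlias and rawText are false when absent.
function entityTerm(entity, { expandAlias = false, rawText = false } = {}) {
  // Accept either the "value/type" index string or a decomposed
  // { entityValue, entityType } object (copied as-is).
  const term = typeof entity === "string" ? { entity } : { ...entity };
  const opt = {};
  if (expandAlias) opt.expandAlias = true; // also match automatically extracted aliases
  if (rawText) opt.rawText = true; // also run the disambiguated name as exact text
  if (Object.keys(opt).length > 0) term.entityOpt = opt;
  return term;
}

console.log(JSON.stringify(entityTerm("barack obama/president", { expandAlias: true })));
// {"entity":"barack obama/president","entityOpt":{"expandAlias":true}}
console.log(JSON.stringify(entityTerm({ entityValue: "facebook", entityType: "company" })));
// {"entityValue":"facebook","entityType":"company"}
```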

Some examples:

...

  • qt[0].entity="facebook/company": will match on documents containing references to the company Facebook, but not the technology.
  • qt[0].entityValue="facebook"&qt[0].entityType="company": equivalent to the above
  • qt[0].entityValue="facebook": will match on both uses of the term Facebook (the company and the technology)
  • { "qt": [ { "entity": "barack obama/president", "entityOpt": { "expandAlias": true } } ] }: will match on documents containing references to Barack Obama, but also other common text strings such as "Barry Obama", "President Obama" etc.
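The qt[0] examples are written as flattened request parameters, while the last one is JSON. A minimal sketch of that flattening, assuming (as the examples suggest) that each top-level field of the i-th term becomes a qt[i].field parameter, and reading the quotes around values in the examples as documentation markup rather than part of the value. The toQtParams helper is hypothetical; because this page does not show how nested objects such as "entityOpt" are flattened, the sketch refuses them rather than guessing.

```javascript
// Hypothetical helper: flatten an array of query terms into the
// qt[i].field=value parameter style used in the examples above.
// Only top-level scalar fields are handled; nested objects (e.g. "entityOpt")
// are rejected because their flattened form is not shown on this page.
function toQtParams(terms) {
  const params = new URLSearchParams();
  terms.forEach((term, i) => {
    for (const [field, value] of Object.entries(term)) {
      if (value !== null && typeof value === "object") {
        throw new Error(`nested field "${field}" not supported in this sketch`);
      }
      params.append(`qt[${i}].${field}`, String(value));
    }
  });
  return params;
}

const params = toQtParams([{ entityValue: "facebook", entityType: "company" }]);
console.log(params.get("qt[0].entityValue")); // facebook
console.log(params.get("qt[0].entityType")); // company
console.log(params.toString()); // percent-encoded, ready to append to a request URL
```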

...


Manual entity aliasing is supported and is described here.

Sentiment

Entity queries can be combined with sentiment, using the "sentiment" JSON described above. If the "entity" or "entityValue" field is specified, then only documents containing that entity with a sentiment field that exists and is in the specified range are selected. If neither of these fields is specified, then only documents containing one or more entities with sentiment are selected.
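The sentiment rule can be mimicked on the client, for instance to sanity-check returned documents. This is an illustration only, not the server-side implementation. It assumes each document has an "entities" array whose items carry the "index" string described above and an optional numeric "sentiment"; it also assumes inclusive bounds, treats a missing min or max as unbounded, and applies the range even when no entity field is given (the text above does not say either way).

```javascript
// Illustration only (not the server-side implementation) of how an entity term
// plus "sentiment" selects documents.
// Assumptions: doc.entities[] items have an "index" ("value/type") and an optional
// numeric "sentiment"; bounds are inclusive; a missing min or max is unbounded.
function inRange(value, { min = -Infinity, max = Infinity } = {}) {
  return typeof value === "number" && value >= min && value <= max;
}

function entityMatches(entity, term) {
  if (term.entity !== undefined) return entity.index === term.entity;
  const slash = entity.index.lastIndexOf("/");
  const value = entity.index.slice(0, slash);
  const type = entity.index.slice(slash + 1);
  if (term.entityValue !== undefined && value !== term.entityValue) return false;
  if (term.entityType !== undefined && type !== term.entityType) return false;
  return true; // no entity fields given: every entity is a candidate
}

function selectsWithSentiment(doc, term) {
  // Selected if at least one matching entity has an in-range sentiment.
  return doc.entities.some(
    (entity) => entityMatches(entity, term) && inRange(entity.sentiment, term.sentiment)
  );
}

const exampleDoc = {
  entities: [
    { index: "facebook/company", sentiment: 0.5 },
    { index: "facebook/technology" }, // no sentiment field
  ],
};
console.log(selectsWithSentiment(exampleDoc, { entity: "facebook/company", sentiment: { min: 0 } })); // true
console.log(selectsWithSentiment(exampleDoc, { entity: "facebook/technology", sentiment: { min: 0 } })); // false
```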

Geospatial

...

Event format
"assoc": {
	"entity1": { ... }, // the "subject"; can be ftext, etext, or entity/entityValue/entityType query terms
	"entity2": { ... }, // the "object"; can be ftext, etext, or entity/entityValue/entityType query terms

	"verb": string,

	"geo": { ... }, // geo query term
	"time": { ... }, // time query term

	"type": string // "Event", "Fact", or "Summary"
},
"sentiment": { // (optional, specify one or both of min/max, see below)
	"min": number,
	"max": number
}

As the format above shows, the association query term is a composite of other query term types: free text, exact text and entity terms for "entity1" and "entity2", plus temporal and geospatial terms.
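Because the association term accepts a fixed set of fields, a quick client-side check can catch typos before a query is sent. The checkAssocTerm helper below is a hypothetical sketch: it only checks what the format above shows (the allowed top-level fields, the three "type" values compared case-sensitively, and that "verb" is a string) and does not validate the nested entity, geo or time terms.

```javascript
// Hypothetical client-side check of an "assoc" term against the format above.
// It does not validate the nested entity1/entity2, geo or time terms.
const ASSOC_FIELDS = new Set(["entity1", "entity2", "verb", "geo", "time", "type"]);
const ASSOC_TYPES = new Set(["Event", "Fact", "Summary"]); // compared case-sensitively (an assumption)

function checkAssocTerm(assoc) {
  const problems = [];
  for (const field of Object.keys(assoc)) {
    if (!ASSOC_FIELDS.has(field)) problems.push(`unknown field "${field}"`);
  }
  if (assoc.verb !== undefined && typeof assoc.verb !== "string") {
    problems.push('"verb" must be a string');
  }
  if (assoc.type !== undefined && !ASSOC_TYPES.has(assoc.type)) {
    problems.push(`"type" must be Event, Fact or Summary, got "${assoc.type}"`);
  }
  return problems; // an empty array means no problems were found
}

console.log(checkAssocTerm({ entity1: { entity: "barack obama/person" }, type: "Fact" })); // []
console.log(checkAssocTerm({ entity1: { entity: "barack obama/person" }, type: "fact" })); // one problem: wrong case
```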

...

  • The "entity1" field is processed as follows:
    • "ftext" and "etext" terms are applied across both the "entity1" and "entity1_index" fields within the association object.
    • entity/entityValue/entityType terms are only applied to the "entity1_index" field.
  • The "entity2" field is processed analogously.
  • The "verb" string is applied as an exact text query to the "verb_category" field and as a free text query to the "verb" field within the association object.
  • For events with a time range ("time_start" and "time_end" fields), any part of the event time range can match the "time" term.
  • The difference between "Events", "Facts" or "Summaries" is described here.
  • If multiple terms are specified, they are ANDed together. There is currently no way of building more complex boolean expressions within an individual event term (although multiple event query terms can be specified, each matching across all the events within a document).
  • If sentiment is specified, only documents containing associations that have a sentiment field (this is somewhat rare) with a value in the selected range are selected.
  • Event queries with multiple terms can be somewhat slower than other queries, due to the way they are implemented in Elasticsearch.

    Example event queries
    // Any fact in which Barack Obama is the subject:
    {
    	"assoc": {
    		"entity1": {
    			"entity": "barack obama/person"
    		},
    		"type":"Fact"
    	}
    }
    // Travel associations involving Sarah Palin:
    {
    	"assoc": {
    		"entity1": {
    			"entityValue":"sarah palin",
    			"entityType":"person"
    		},
    		"verb": "travel"
    	}
    }
    // Events in the future:
    {
    	"assoc": {
    		"time": {
    			"min": "now"
    		},
    		"type":"Event"
    	}
    }
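The time-range and AND rules from the list above can be illustrated locally. This is not how the query is evaluated in Elasticsearch; it assumes an association object with the "time_start"/"time_end" fields named above holding millisecond timestamps (the encoding is an assumption, and symbolic values such as "now" are not handled), and a time term with optional min/max in the same units.

```javascript
// Illustration only, not the Elasticsearch implementation.
// Assumptions: time_start/time_end are millisecond timestamps; a missing
// min or max in the time term leaves that side of the window open.
function timeTermMatches(assoc, { min = -Infinity, max = Infinity } = {}) {
  const start = assoc.time_start;
  const end = assoc.time_end ?? assoc.time_start; // no end: treat as a single instant
  // "Any part of the event time range can match": test for overlap with [min, max].
  return start <= max && end >= min;
}

// Terms inside one assoc query are ANDed: every check must pass for the same association.
function assocMatches(assoc, checks) {
  return checks.every((check) => check(assoc));
}

const sampleEvent = { verb_category: "travel", time_start: 1000, time_end: 5000 };
console.log(timeTermMatches(sampleEvent, { min: 4000 })); // true: overlaps 4000..5000
console.log(timeTermMatches(sampleEvent, { min: 6000 })); // false: ends at 5000
console.log(
  assocMatches(sampleEvent, [
    (a) => a.verb_category === "travel", // stands in for the exact-text "verb" check
    (a) => timeTermMatches(a, { max: 2000 }),
  ])
); // true
```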

...