h3. Overview of querying

The top-level query JSON object contains a field, "qt", which is an array of query term objects. Query term objects are described below and support the following query types:

  • Exact text
  • Free text
  • Entities
  • Geospatial
  • Temporal
  • Events
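Each query type corresponds to one or more fields of a query term object; the field names below are taken from the examples later on this page. As a minimal sketch (QUERY_TYPE_FIELDS and queryTypesOf are hypothetical names, not part of the API), the mapping can be written down and used to report which query types a given term uses:

```javascript
// Hypothetical mapping from query type to the query term fields used in this page's examples.
const QUERY_TYPE_FIELDS = {
  "Exact text": ["etext"],
  "Free text": ["ftext"],
  "Entities": ["entity", "entityValue", "entityType"],
  "Geospatial": ["geo"],
  "Temporal": ["time"],
  "Events": ["event"],
};

// Returns the query types used by a single query term object, in the order listed above.
function queryTypesOf(term) {
  return Object.keys(QUERY_TYPE_FIELDS).filter((type) =>
    QUERY_TYPE_FIELDS[type].some((field) =>
      Object.prototype.hasOwnProperty.call(term, field)
    )
  );
}

// A term that merges two types (see "Combining query terms"):
const types = queryTypesOf({
  entity: "barack obama/person",
  time: { min: "1284666757164", max: "now" },
}); // ["Entities", "Temporal"]
```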

...

Finally, note that the combination of "qt" and "logic" can be replaced by the "raw" object described at the bottom of this page, which gives the user access to the raw ElasticSearch query API. If both "qt"/"logic" and "raw" are present, the "qt"/"logic" fields are ignored.
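Queries can be written either as a JSON object or in the dot notation used in the examples on this page, eg qt[0].etext="apple"&qt[1].ftext="pair"&logic="1 and 2". The hypothetical helper below is a minimal sketch of how the JSON form flattens into dot notation; it assumes values are written in double quotes as on this page, and it applies no URL encoding (see the note under "Free text").

```javascript
// Hypothetical helper: flattens a JSON query object into dot notation,
// eg { qt: [ { etext: "apple" } ] } -> qt[0].etext="apple"
// Values are wrapped in double quotes, as in this page's examples; no URL encoding is applied.
function toDotNotation(value, prefix = "") {
  if (Array.isArray(value)) {
    // Array elements are addressed by index: qt[0], qt[1], ...
    return value.flatMap((item, i) => toDotNotation(item, `${prefix}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    // Object fields are addressed with dots: qt[0].geo.centerll, ...
    return Object.entries(value).flatMap(([key, child]) =>
      toDotNotation(child, prefix ? `${prefix}.${key}` : key)
    );
  }
  return [`${prefix}="${value}"`];
}

// Two separate query terms combined with the "logic" field:
const query = {
  qt: [{ etext: "apple" }, { ftext: "pair" }],
  logic: "1 and 2",
};

const dotted = toDotNotation(query).join("&");
// qt[0].etext="apple"&qt[1].ftext="pair"&logic="1 and 2"
```

Note that query terms are numbered from 1 in "logic" but indexed from 0 in "qt", as in the example under "Combining query terms".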

h3. Exact text

The exact text query term object has the following format:

...

For example (using dot notation), qt[0].etext="barack obama" matches documents containing "barack obama", but not documents containing only variants such as "president obama" or "barack 'barry' obama".
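The difference between an exact text match and a near miss can be illustrated with a toy phrase matcher. This is only a model of the behaviour described above, assuming case-insensitive matching of consecutive words; the real matching is performed by ElasticSearch, whose tokenization may differ, and matchesExactText is a hypothetical name.

```javascript
// Toy model only: an exact-text term matches when the query's words appear
// consecutively and in order in the document (case-insensitive, punctuation ignored).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function matchesExactText(documentText, etext) {
  const doc = tokenize(documentText);
  const phrase = tokenize(etext);
  if (phrase.length === 0) return false;
  // Slide the phrase along the document, comparing word by word.
  for (let start = 0; start + phrase.length <= doc.length; start++) {
    if (phrase.every((word, i) => doc[start + i] === word)) return true;
  }
  return false;
}

matchesExactText("Barack Obama spoke today", "barack obama");    // true
matchesExactText("President Obama spoke today", "barack obama"); // false
matchesExactText("Barack 'Barry' Obama spoke", "barack obama");  // false
```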

h3. Free text

The free text query term object has the following format:

...

Note that when using dot notation and typing queries directly into the URL bar, characters such as '+' must be double-URL-encoded: '+' is first encoded as %2B, and encoding that again gives %252B (or %25%32%42 if every character is encoded).
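A minimal sketch of the double encoding, using the standard encodeURIComponent function. The helper name doubleEncode is hypothetical, and the sketch assumes the value is decoded twice on its way to the query parser, which is what the double-encoding requirement implies.

```javascript
// Hypothetical helper: URL-encodes a dot-notation value twice, so that '+'
// survives as a literal plus sign rather than being read as a space.
function doubleEncode(value) {
  return encodeURIComponent(encodeURIComponent(value));
}

doubleEncode("+");   // "%252B"
doubleEncode("c++"); // "c%252B%252B"

// Decoding the fully encoded form once gives the same intermediate "%2B":
decodeURIComponent("%25%32%42"); // "%2B"
```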

h3. Entities

The entity query term object has the following 2 possible formats:

...

  • qt[0].entity="facebook/company": matches documents containing references to Facebook the company, but not to Facebook the technology.
  • qt[0].entityValue="facebook"&qt[0].entityType="company": equivalent to the above.
  • qt[0].entityValue="facebook": matches both uses of the term Facebook.
  • { "qt": [ { "entity": "barack obama/president", "entityOpt": { "expandAlias": true } } ] }: matches documents containing references to Barack Obama, and also documents containing common aliases such as "Barry Obama" or "President Obama".
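The "value/type" form and the entityValue/entityType form carry the same information. The hypothetical helper below sketches the conversion, assuming the type is whatever follows the last '/'; this page does not say how values that themselves contain '/' are handled.

```javascript
// Hypothetical helper: converts an "entity" string of the form "value/type"
// into the equivalent entityValue/entityType fields.
function splitEntity(entity) {
  const slash = entity.lastIndexOf("/");
  if (slash < 0) {
    // This page only shows the "value/type" form; for a value alone, use entityValue.
    throw new Error(`Expected "value/type", got: ${entity}`);
  }
  return {
    entityValue: entity.slice(0, slash),
    entityType: entity.slice(slash + 1),
  };
}

splitEntity("facebook/company");
// { entityValue: "facebook", entityType: "company" }
```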

h3. Geospatial

The geospatial query term has the following possible formats:

...

  • qt[0].geo.centerll="40.12,-71.34"&qt[0].geo.dist="100km": matches documents within 100km of the specified lat/long.
  • { "qt": [ { "geo": { "centerll": "40.12,-71.34", "dist": "100" } } ] }: omits the unit, so the default (km) is used; this is the same query as above.
  • qt[0].geo.minll="(4.1,-171.34)"&qt[0].geo.maxll="40.12,-71.34": a bounding box; this example also shows that lat/long pairs can be written with or without parentheses.
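To make the geospatial formats concrete, the sketch below parses them and tests a single point against a centerll/dist or minll/maxll term. All helper names are hypothetical; the sketch assumes a spherical Earth (haversine formula) and handles only km distances, whereas ElasticSearch performs the real computation and may give slightly different results near a boundary.

```javascript
// Parses "40.12,-71.34" or "(4.1,-171.34)" into [lat, lon].
function parseLatLong(text) {
  const parts = text.trim().replace(/^\(|\)$/g, "").split(",");
  if (parts.length !== 2) throw new Error(`Bad lat/long: ${text}`);
  return parts.map((p) => Number(p.trim()));
}

// Parses "100km" or "100" (km being the default unit) into kilometres.
// Only km is handled here; other units are not covered on this page.
function parseDistanceKm(text) {
  const match = /^\s*([0-9.]+)\s*(km)?\s*$/i.exec(text);
  if (!match) throw new Error(`Unsupported distance: ${text}`);
  return Number(match[1]);
}

// Great-circle (haversine) distance in km on a sphere of radius 6371 km.
function haversineKm([lat1, lon1], [lat2, lon2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

// Approximates a centerll/dist term: is the point within dist of the centre?
function withinDistance(point, centerll, dist) {
  return haversineKm(point, parseLatLong(centerll)) <= parseDistanceKm(dist);
}

// Approximates a minll/maxll term (boxes crossing the antimeridian are not handled).
function inBoundingBox([lat, lon], minll, maxll) {
  const [minLat, minLon] = parseLatLong(minll);
  const [maxLat, maxLon] = parseLatLong(maxll);
  return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
}

withinDistance([40.5, -71.0], "40.12,-71.34", "100km");     // true (about 51km away)
inBoundingBox([30, -100], "(4.1,-171.34)", "40.12,-71.34"); // true
```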

h3. Temporal 

The temporal query term has the following format:

...

  • { "qt": [ { "time": { "min": "1284666757164", "max": "now" } } ] }: from 16 Sep 2010 until now.
  • qt[0].time.min="now": any time in the future.
  • qt[0].time.max="20100201": any time before 1 Feb 2010.
  • { "qt": [ { "time": { "min": "02/10/2000", "max": "10 Feb 2001 13:00:00" } } ] }: from 10 Feb 2000 until 10 Feb 2001 at 1pm.
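The sketch below parses each time format used in the examples above into epoch milliseconds and checks a timestamp against a { min, max } term. It is hypothetical and rests on assumptions not stated on this page: times are UTC, an 8-digit number is yyyyMMdd while any other all-digit string is epoch milliseconds, slash dates are MM/dd/yyyy (as the 10 Feb 2000 example implies), and both bounds are inclusive.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"];

// Hypothetical parser for the time formats shown above; returns epoch milliseconds (UTC).
function parseTime(text, nowMs = Date.now()) {
  const s = text.trim();
  if (s.toLowerCase() === "now") return nowMs;
  let m;
  if (/^\d{8}$/.test(s)) {
    // yyyyMMdd, eg "20100201"
    return Date.UTC(Number(s.slice(0, 4)), Number(s.slice(4, 6)) - 1, Number(s.slice(6, 8)));
  }
  if (/^\d+$/.test(s)) return Number(s); // epoch milliseconds, eg "1284666757164"
  if ((m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(s))) {
    // MM/dd/yyyy, eg "02/10/2000"
    return Date.UTC(Number(m[3]), Number(m[1]) - 1, Number(m[2]));
  }
  if ((m = /^(\d{1,2}) ([A-Za-z]{3}) (\d{4})(?: (\d{2}):(\d{2}):(\d{2}))?$/.exec(s))) {
    // d MMM yyyy [HH:mm:ss], eg "10 Feb 2001 13:00:00"
    const month = MONTHS.indexOf(m[2].toLowerCase());
    if (month < 0) throw new Error(`Unknown month: ${m[2]}`);
    return Date.UTC(Number(m[3]), month, Number(m[1]),
      Number(m[4] || 0), Number(m[5] || 0), Number(m[6] || 0));
  }
  throw new Error(`Unsupported time format: ${text}`);
}

// Checks a timestamp against a { min, max } time term; either bound may be omitted.
function inTimeRange(ms, { min, max } = {}, nowMs = Date.now()) {
  return (min === undefined || ms >= parseTime(min, nowMs)) &&
         (max === undefined || ms <= parseTime(max, nowMs));
}
```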

h3. Events

The event query format is slightly more complex than the others, and also slightly more limited.

...

  • The "entity1" field is processed as follows:
    • "ftext" and "etext" terms are applied across both the "entity1" and "entity1_index" fields within the entity object (TBD link).
    • entity/entityValue/entityType terms are applied only to the "entity1_index" field.
  • The "entity2" field is processed in the same way.
  • The "verb" string is applied as an exact text query to the "verb_category" field and as a free text query to the "verb" field within the entity object (TBD link).
  • For events with a time range ("time_start" and "time_end" fields), the "time" term matches if any part of the event's time range falls within it.
  • The difference between "Events", "Facts" and "Summaries" is described here (TBD link).
  • If multiple terms are specified, they are ANDed together. There is currently no way to apply more complex boolean expressions to individual events (multiple event query terms can still be specified, and each can match any of the events within a document).
  • Event queries with multiple terms can be somewhat slower than other queries, because of how they are implemented in ElasticSearch.

Example event queries (each object is a single query term, ie one element of the "qt" array):

```javascript
const exampleEventQueryTerms = [
	// Any fact in which Barack Obama is the subject:
	{
		"event": {
			"entity1": {
				"entity": "barack obama/person"
			},
			"type": "Fact"
		}
	},
	// Travel events involving Sarah Palin:
	{
		"event": {
			"entity1": {
				"entityValue": "sarah palin",
				"entityType": "person"
			},
			"verb": "travel"
		}
	},
	// Events in the future:
	{
		"event": {
			"time": {
				"min": "now"
			},
			"type": "Event"
		}
	}
];
```
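The rule that any part of an event's time range can match the "time" term is an interval-overlap test. A minimal sketch, with a hypothetical function name, assuming times are already epoch milliseconds, bounds are inclusive, a missing bound is open-ended, and an event without "time_end" is treated as an instant at "time_start":

```javascript
// Toy model: does the event's [time_start, time_end] range overlap the term's [min, max]?
function eventMatchesTime(event, { min = -Infinity, max = Infinity } = {}) {
  const start = event.time_start;
  const end = event.time_end ?? event.time_start;
  // Two intervals overlap when each one starts no later than the other ends.
  return start <= max && end >= min;
}

const trip = { time_start: Date.UTC(2010, 0, 1), time_end: Date.UTC(2010, 2, 1) };
eventMatchesTime(trip, { min: Date.UTC(2010, 1, 1) }); // true: the trip is still ongoing on 1 Feb 2010
eventMatchesTime(trip, { min: Date.UTC(2010, 5, 1) }); // false: the trip ended on 1 Mar 2010
```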

h3. Combining query terms

Multiple query terms can be combined in 2 ways:

  • Using the "logic" field as described above under "Overview of querying". This is the standard way of combining separate queries.
  • In addition, multiple elements of different types can be merged into a single query term object, which has the effect of ANDing them together. For example:
    • { "qt": [ { "entity": "barack obama/person", "time": { "min": "1284666757164", "max": "now" } } ] }: documents containing the entity Barack Obama, from 16 Sep 2010 until now.
    • qt[0].etext="apple"&qt[0].ftext="pair": this is equivalent to qt[0].etext="apple"&qt[1].ftext="pair"&logic="1 and 2"
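The equivalence in the last example can be sketched as a rewrite from one merged term into separate terms ANDed together by "logic". The helper is hypothetical; it assumes the merged term is the whole query, and it keeps entity/entityValue/entityType/entityOpt in one term, since together they describe a single entity.

```javascript
// Fields that describe one entity and so stay in the same term (from the Entities section).
const ENTITY_FIELDS = new Set(["entity", "entityValue", "entityType", "entityOpt"]);

// Hypothetical helper: splits a merged query term into single-type terms plus a "logic" string.
function splitMergedTerm(term) {
  const qt = [];
  let entityTerm = null;
  for (const [key, value] of Object.entries(term)) {
    if (ENTITY_FIELDS.has(key)) {
      if (!entityTerm) {
        entityTerm = {};
        qt.push(entityTerm);
      }
      entityTerm[key] = value;
    } else {
      qt.push({ [key]: value });
    }
  }
  // "logic" numbers query terms from 1, eg "1 and 2" refers to qt[0] and qt[1].
  const logic = qt.map((_, i) => String(i + 1)).join(" and ");
  return { qt, logic };
}

splitMergedTerm({ etext: "apple", ftext: "pair" });
// { qt: [ { etext: "apple" }, { ftext: "pair" } ], logic: "1 and 2" }
```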

h3. Raw ElasticSearch queries

At present ElasticSearch is used as the front end of the search engine. 

...