Overview of querying

Within the top level query JSON object there is a query field "qt" that is an array of query term objects. Query term objects are described below and allow the following query types:

  • Exact text
  • Free text
  • Entities
  • Geospatial
  • Temporal
  • Events

These terms can then be combined in an arbitrary boolean expression (with the operators AND, OR, NOT and parentheses) using the (case insensitive) "logic" field of the top level object, where the different terms are denoted by their index in the array (counting from 1). For example:

Example top level query
{
	"qt": [ { term1 }, { term2 }, { term3 }, { term4 } ],
	"logic": "1 AND (2 OR 3) AND NOT 4"
}

In the above example each of term1-term4 is one of the objects described below.

If the logic term is set to null or not present, it defaults to ANDing all the terms together.
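As a rough illustration of the 1-based indexing and the default behaviour, the following Python sketch builds the logic string that ANDs all terms and checks that a "logic" string only refers to terms that exist. The helper names (default_logic, logic_indices_valid) and the placeholder terms are hypothetical; this is not how the server itself evaluates the field.

```python
import re


def default_logic(num_terms: int) -> str:
    """Logic string equivalent to the documented default: AND all terms (1-based)."""
    return " AND ".join(str(i) for i in range(1, num_terms + 1))


def logic_indices_valid(query: dict) -> bool:
    """Check that every index used in "logic" refers to an existing "qt" term."""
    logic = query.get("logic")
    if logic is None:
        return True  # null or absent: the default (AND of all terms) applies
    num_terms = len(query.get("qt", []))
    return all(1 <= int(tok) <= num_terms for tok in re.findall(r"\d+", logic))


# Placeholder terms, standing in for term1-term4 of the example above
query = {
    "qt": [{"etext": "a"}, {"etext": "b"}, {"etext": "c"}, {"etext": "d"}],
    "logic": "1 AND (2 OR 3) AND NOT 4",
}
```

Here default_logic(4) yields "1 AND 2 AND 3 AND 4", ie the expression the query would use if "logic" were left out.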

In the "dot notation" used to represent query objects as URL parameters in "GET" requests, the different "qt" terms are represented as "qt[0]", "qt[1]", etc (ie indexed from 0, unlike the "logic" string).
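The mapping from a JSON query object to dot notation can be sketched in Python as below. This assumes nested objects are joined with "." and array elements are written as "[i]" counting from 0, as in the "qt[0]" examples on this page; the function name to_dot_notation is hypothetical, and how the server expects non-string values (eg booleans) to be written in the URL is not covered here.

```python
def to_dot_notation(obj, prefix=""):
    """Flatten nested dicts/lists into {"qt[0].etext": value, ...} pairs."""
    params = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            params.update(to_dot_notation(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):  # arrays are indexed from 0
            params.update(to_dot_notation(value, f"{prefix}[{i}]"))
    else:
        params[prefix] = obj  # leaf value, kept as-is
    return params


example = {
    "qt": [
        {"etext": "barack obama"},
        {"entity": "facebook/company", "entityOpt": {"expandAlias": True}},
    ],
    "logic": "1 AND 2",
}
flat = to_dot_notation(example)
```

Note that the flattened keys use "qt[0]" and "qt[1]" while the "logic" string still refers to the same terms as 1 and 2.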

Finally, note that the combination of "qt" and "logic" can be replaced by the "raw" object described at the bottom of this page, which gives the user access to the raw ElasticSearch query API. If both "qt"/"logic" and "raw" are present, the "qt"/"logic" fields are ignored.

Exact text

The exact text query term object has the following format:

Exact text format
{
	"etext": string
}

The "etext" string is a phrase that must match exactly somewhere in the document (in any of the text fields). There is one exception: if "*" is the "etext" field then it matches all documents.

For example (using dot notation), qt[0].etext="barack obama" will match on documents containing "barack obama" but not documents containing only (eg) "president obama", "barack 'barry' obama" etc.

Free text

The free text query term object has the following format:

Free text format
{
	"ftext": string
}

The "ftext" field represents an arbitrary Lucene query (Lucene syntax). By default, all text fields in the document (including its entities and events; link to the document format TBD) are included in the query, though the standard "field:text" syntax can be used to restrict a term to a single field.

For example, { "qt": [ { "ftext": "barack obama" } ] } will match on any documents containing either "barack" or "obama", with documents containing both scored more highly (TBD link to scoring params). { "qt": [ { "ftext": "+barack +obama" } ] } requires both be present (but not necessarily in the same phrase), and { "qt": [ { "ftext": "\"barack obama\"" } ] } (a quoted Lucene phrase) is equivalent to the "etext" query described above.
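The three variants can be written out as complete query objects, sketched here in Python. Lucene delimits a phrase with double quotes, so the quotes inside the "ftext" value must be escaped when the query is serialized to JSON; json.dumps from the standard library does this automatically. The variable names are illustrative only.

```python
import json

# Either word may match; documents with both score higher
any_word = {"qt": [{"ftext": "barack obama"}]}

# Both words are required, but not necessarily adjacent
both_words = {"qt": [{"ftext": "+barack +obama"}]}

# Quoted Lucene phrase: the words must appear together, in order
phrase = {"qt": [{"ftext": '"barack obama"'}]}

# Serializing escapes the inner double quotes as \"
phrase_json = json.dumps(phrase)
```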

Note that when using dot notation and typing queries directly into the URL bar, characters like '+' must be double-URL-encoded, eg '+' becomes %252B (ie '+' is first encoded to %2B, and the '%' of %2B is then encoded again as %25).
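Double encoding can be reproduced with Python's standard urllib.parse.quote applied twice, as in the sketch below. The helper name double_encode is hypothetical, and encoding every reserved character (safe="") is an assumption; the page only states the requirement for characters like '+'.

```python
from urllib.parse import quote, unquote


def double_encode(text: str) -> str:
    """Percent-encode twice, so that '+' becomes '%2B' and then '%252B'."""
    once = quote(text, safe="")   # '+' -> '%2B'
    return quote(once, safe="")   # '%2B' -> '%252B'


encoded_plus = double_encode("+")
encoded_query = double_encode("+barack +obama")

# Decoding twice recovers the original Lucene query string
round_trip = unquote(unquote(encoded_query))
```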

Entities

The entity query term object has the following 2 possible formats:

Entity format
{
	"entity": string,
	"entityOpt": { // (optional, defaults to "entityOpt.expandAlias=false" if not present)
		"expandAlias": boolean
	}
}
//or
{
	"entityValue": string,
	"entityType": string,
	"entityOpt": { // (optional, defaults to "entityOpt.expandAlias=false" if not present)
		"expandAlias": boolean
	}
}

In the first instance the "entity" string is in the format "entityValue/entityType" (this is its "index" form, eg "gazateer_index" in the Entity JSON object (TBD link)).

In the second, decomposed, instance either "entityValue" or "entityType" can be left out: if "entityValue" is omitted the term matches all entities of the given type; if "entityType" is omitted it matches the entity name regardless of type.

In both cases, the optional "entityOpt.expandAlias" boolean allows matching not just on the entity but also on its common aliases. This tends to match more documents, some of which will be false positives, and the query is also slower.

Some examples:

  • qt[0].entity="facebook/company": will match on documents containing references to the company Facebook, but not the technology.
  • qt[0].entityValue="facebook"&qt[0].entityType="company": equivalent to the above
  • qt[0].entityValue="facebook": will match on both uses of the term Facebook
  • { "qt": [ { "entity": "barack obama/president", "entityOpt": { "expandAlias": true } } ] }: will match on documents containing references to Barack Obama, but also other common text strings such as "Barry Obama", "President Obama" etc.
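The relationship between the two formats can be sketched in Python as follows. The helper names are hypothetical, and splitting on the last "/" is an assumption made here so that a value containing a slash stays intact; the page does not say how such values are handled.

```python
def split_entity(entity: str) -> dict:
    """Convert the "entityValue/entityType" index form to the decomposed form."""
    value, sep, entity_type = entity.rpartition("/")  # split on the LAST "/"
    if not sep:
        raise ValueError(f"expected 'entityValue/entityType', got {entity!r}")
    return {"entityValue": value, "entityType": entity_type}


def join_entity(entity_value: str, entity_type: str) -> str:
    """Build the index form from its two parts."""
    return f"{entity_value}/{entity_type}"


decomposed = split_entity("facebook/company")
index_form = join_entity("barack obama", "president")
```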

Geospatial

The geospatial query term has the following possible formats:

Geospatial format
{
	"geo": {
		"centerll": string,
		"dist": string
	}
}
//or
{
	"geo": {
		"minll": string,
		"maxll": string
	}
}

In the first case, the user is specifying a circle via its center, a latitude/longitude pair ("centerll"), and its radius ("dist").

In the second case, the user is specifying a bounding box via the "minll" (lowest lat and long values ie the "bottom left") and "maxll" (highest lat and long values, ie the "top right").

In all cases the lat/long values are represented as strings either as "(<lat>,<long>)" or "<lat>,<long>" (ie the same but without parentheses).

The "dist" string is a distance in the format "<distance><unit>" where <distance> is an integer or floating point number, and <unit> is one of "m" (miles), "km" (kilometers), "nm" (nautical miles).

Examples:
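As an illustration, the Python sketch below builds one term of each format and parses the strings according to the rules above. The coordinates, radius and helper names (parse_latlong, parse_dist) are made up for illustration and are not part of the API.

```python
import re


def parse_latlong(text: str) -> tuple:
    """Parse "(<lat>,<long>)" or "<lat>,<long>" into a (lat, long) pair of floats."""
    stripped = text.strip()
    if stripped.startswith("(") and stripped.endswith(")"):
        stripped = stripped[1:-1]  # the parentheses are optional
    lat, lng = (float(part) for part in stripped.split(","))
    return lat, lng


def parse_dist(text: str) -> tuple:
    """Parse "<distance><unit>", where unit is "m" (miles), "km" or "nm"."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(km|nm|m)", text)
    if match is None:
        raise ValueError(f"invalid distance: {text!r}")
    return float(match.group(1)), match.group(2)


# Circle: center point plus radius (made-up values)
circle_term = {"geo": {"centerll": "(40.7,-74.0)", "dist": "100km"}}

# Bounding box: bottom-left and top-right corners (made-up values)
box_term = {"geo": {"minll": "40.0,-75.0", "maxll": "41.0,-73.0"}}
```

In dot notation the circle would be written as qt[0].geo.centerll="(40.7,-74.0)"&qt[0].geo.dist="100km".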
