...

The "ftext" field represents an arbitrary Lucene query (Lucene syntax). By default, all text fields in the document (including its entities and events; link to the document format) are included in the query, though the standard "field:text" syntax can be used to restrict a term to a specific field.

For example, { "qt": [ { "ftext": "barack obama" } ] } will match any document containing either "barack" or "obama", with documents containing both scored more highly. { "qt": [ { "ftext": "+barack +obama" } ] } requires both to be present (but not necessarily in the same phrase), and { "qt": [ { "ftext": "'barack obama'" } ] } is equivalent to the "etext" query described above.
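As a minimal sketch, the first two variants could be assembled programmatically as follows. The helper names (ftextQuery, anyOf, allOf) are hypothetical and not part of the API; only the "qt" and "ftext" field names come from the text above.

```javascript
// Hypothetical helpers: only "qt" and "ftext" come from the documentation.

// Wrap a raw Lucene query string in a single-term query object.
function ftextQuery(luceneQuery) {
  return { qt: [{ ftext: luceneQuery }] };
}

// Any word may match; documents matching more words score higher.
function anyOf(words) {
  return ftextQuery(words.join(" "));
}

// Prefix each word with "+" so that Lucene requires all of them.
function allOf(words) {
  return ftextQuery(words.map((w) => "+" + w).join(" "));
}

console.log(JSON.stringify(anyOf(["barack", "obama"])));
// {"qt":[{"ftext":"barack obama"}]}
console.log(JSON.stringify(allOf(["barack", "obama"])));
// {"qt":[{"ftext":"+barack +obama"}]}
```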

...

In the first instance the "entity" string is in the format "entityValue/entityType" (this is its "index" form, eg "gazateer_index" in the Entity JSON object).

In the second, decomposed, instance either of "entityValue" or "entityType" can be left out (in the first case this would match on all entities of a given type; in the second case, it would match on all entity names regardless of the type).
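The relationship between the two forms can be sketched as below. The function names are hypothetical, and splitting on the last "/" is an assumption made here so that entity values containing a slash survive; the documentation does not say how the server splits the string.

```javascript
// Hypothetical helpers for converting between the two entity term forms.

// Build the "index" form from its two parts.
function toIndexForm(entityValue, entityType) {
  return { entity: entityValue + "/" + entityType };
}

// Split the "index" form into the decomposed form.
// Assumption: the type is whatever follows the LAST "/".
function toDecomposedForm(entityTerm) {
  const index = entityTerm.entity;
  const slash = index.lastIndexOf("/");
  if (slash < 0) {
    throw new Error('expected "entityValue/entityType", got: ' + index);
  }
  return {
    entityValue: index.slice(0, slash),
    entityType: index.slice(slash + 1),
  };
}

// In the decomposed form either part may be omitted:
const allPeople = { entityType: "person" };         // every entity of type "person"
const anyTypeObama = { entityValue: "barack obama" }; // this name, regardless of type

console.log(toDecomposedForm({ entity: "barack obama/person" }));
```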

...

Code Block (javascript): Geospatial format
{
	"geo": {
		"centerll": string,
		"dist": string,
		"ontology_type": string // optional, see below
	}
}
//or
{
	"geo": {
		"minll": string,
		"maxll": string,
		"ontology_type": string // optional, see below
	}
}

In the first case, the user specifies the center latitude/longitude pair ("centerll") and the radius ("dist") of a circle.

...

The "dist" string is a distance in the format "<distance><unit>", where <distance> is an integer or floating point number and <unit> is one of "m" (miles), "km" (kilometers), or "nm" (nautical miles).
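A client might validate a "dist" string before sending it, along the lines of the sketch below. The function name parseDist is hypothetical. Treating a missing unit as "km" is an assumption based on the default mentioned in the examples of this section, and rejecting whitespace between number and unit is a choice made here, not documented behavior.

```javascript
// Conversion factors to kilometers (standard definitions of the mile and nautical mile).
const KM_PER_UNIT = { km: 1, m: 1.609344, nm: 1.852 };

// Hypothetical validator for "<distance><unit>" strings.
// Note that "m" means miles here, not meters.
function parseDist(dist) {
  const match = /^(\d+(?:\.\d+)?)(km|nm|m)?$/.exec(dist.trim());
  if (!match) {
    throw new Error("unrecognized distance: " + dist);
  }
  const value = Number(match[1]);
  const unit = match[2] || "km"; // assumption: the default unit is kilometers
  return { value, unit, km: value * KM_PER_UNIT[unit] };
}

console.log(parseDist("100km")); // { value: 100, unit: 'km', km: 100 }
console.log(parseDist("100"));   // { value: 100, unit: 'km', km: 100 }
```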

In both cases, an optional "ontology_type" can be specified. If it is specified, then entities with a higher "ontology_type" are ignored: see geo discussion for more details.

Examples:

  • qt[0].geo.centerll="40.12,-71.34"&qt[0].geo.dist="100km": within 100km of the specified lat/long.
  • { "qt": [ { "geo": { "centerll": "40.12,-71.34", "dist": "100" } } ] }: uses the default unit (km), ie is the same query as above.
  • qt[0].geo.minll="(4.1,-171.34)"&qt[0].geo.maxll="40.12,-71.34": bounding box showing lat/long format with and without parentheses.
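The two lat/long spellings shown in the examples above can be handled by a small parser such as the following sketch. The name parseLatLng is hypothetical, and the range check uses the standard latitude/longitude bounds rather than anything the documentation specifies.

```javascript
// Hypothetical parser for "lat,long" strings, with or without surrounding parentheses.
function parseLatLng(ll) {
  const s = ll.trim();
  // Strip parentheses only when both are present, so "(4.1,-171.34" is rejected below.
  const inner = s.startsWith("(") && s.endsWith(")") ? s.slice(1, -1) : s;
  const match = /^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$/.exec(inner);
  if (!match) {
    throw new Error("unrecognized lat/long: " + ll);
  }
  const lat = Number(match[1]);
  const lng = Number(match[2]);
  if (Math.abs(lat) > 90 || Math.abs(lng) > 180) {
    throw new Error("lat/long out of range: " + ll);
  }
  return { lat, lng };
}

console.log(parseLatLng("(4.1,-171.34)")); // { lat: 4.1, lng: -171.34 }
console.log(parseLatLng("40.12,-71.34"));  // { lat: 40.12, lng: -71.34 }
```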

...

  • "now" which always resolves to the current time, 
  • any Unix time (ie milliseconds after "Jan 1 00:00:00 1970"), 
  • and the following date/date-time formats: "yyyy'-'DDD", "yyyy'-'M'-'dd", "yyyyMMdd", "dd MMM yyyy", "dd MMM yy", "MM/dd/yy", "MM/dd/yyyy", "MM.dd.yy", "MM.dd.yyyy", "dd MMM yyyy hh:mm:ss", "yyyy-MM-dd" (ISO Date), "yyyy-MM-ddZZ" (ISO Date-Timezone), "yyyy-MM-dd'T'HH:mm:ssZZ" (ISO DateTime-Timezone), "EEE, dd MMM yyyy HH:mm:ss Z" (SMTP DateTime).

Examples:

  • { "qt": [ { "time": { "min": "1284666757164", "max": "now" } } ] }: from 16 Sep 2010 until now.
  • qt[0].time.min="now": any time in the future.
  • qt[0].time.max="20100201": any time before 1 Feb 2010.
  • { "qt": [ { "time": { "min": "02/10/2000", "max": "10 Feb 2001 13:00:00" } } ] }: from 10 Feb 2000 until 10 Feb 2001 at 1pm.
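To illustrate how such strings might be resolved to a timestamp, here is a rough client-side sketch covering "now", Unix time, and three of the listed formats. The function name resolveTime is hypothetical. Two assumptions are made that the documentation does not confirm: dates without a timezone are read as UTC, and an all-digit string of exactly 8 characters is taken to be "yyyyMMdd" rather than a Unix time.

```javascript
// Hypothetical resolver: returns milliseconds since the Unix epoch.
function resolveTime(value, nowMs = Date.now()) {
  if (value === "now") return nowMs;
  let m;
  // "yyyyMMdd" (assumption: exactly 8 digits means a date, not a Unix time)
  if ((m = /^(\d{4})(\d{2})(\d{2})$/.exec(value))) {
    return Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
  }
  // Any other all-digit string: Unix time in milliseconds
  if (/^\d+$/.test(value)) return Number(value);
  // "yyyy-MM-dd" (ISO Date)
  if ((m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value))) {
    return Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
  }
  // "MM/dd/yyyy"
  if ((m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(value))) {
    return Date.UTC(Number(m[3]), Number(m[1]) - 1, Number(m[2]));
  }
  throw new Error("format not handled by this sketch: " + value);
}

console.log(new Date(resolveTime("1284666757164")).toISOString());
console.log(new Date(resolveTime("02/10/2000")).toISOString());
```

The remaining formats (day-of-year, named months, timezone offsets) are left out to keep the sketch short; a date library would be a more sensible choice for full coverage.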


...

Associations

The event association query format is slightly more complex than the others. It is also slightly more limited.

The event association format is as follows:

Code Block (javascript): Event format
{
	"entity1": { ... }, // the "subject"; can be ftext, etext, or entity/entityValue/entityType query terms
	"entity2": { ... }, // the "object"; can be ftext, etext, or entity/entityValue/entityType query terms

	"verb": string,

	"geo": { ... }, // geo query term
	"time": { ... }, // time query term

	"type": string // "Event", "Fact", or "Summary"
}

As can be seen from the above code block, the event association query term is a composite of other query term types (free text, exact text and entity terms for "entity1" and "entity2"; also temporal and geospatial).

...

  • The "entity1" field is processed as follows:
    • "ftext" and "etext" terms are applied across both the "entity1" and "entity1_index" fields within the entity object.
    • entity/entityValue/entityType terms are only applied to the "entity1_index" field
  • The "entity2" field is processed analogously 
  • The "verb" string is applied as an exact text query to the "verb_category" field and a free text query to the "verb" field within the entity object
  • For events with a time range ("time_start" and "time_end" fields), any part of the event time range can match the "time" term.
  • The difference between "Events", "Facts" or "Summaries" is described here.
  • If multiple terms are specified then these are ANDed together. There is currently no way of performing more complex boolean expressions on individual events (although multiple event query terms can be specified, and these match across all events within a document).
  • Event queries with multiple terms can be a bit slower than other queries (due to their implementation in ElasticSearch).

    Code Block (javascript): Example event queries
    // Any fact in which Barack Obama is the subject:
    {
    	"eventassociation": {
    		"entity1": {
    			"entity": "barack obama/person"
    		},
    		"type":"Fact"
    	}
    }
    // Travel event associations involving Sarah Palin:
    {
    	"eventassociation": {
    		"entity1": {
    			"entityValue":"sarah palin",
    			"entityType":"person"
    		},
    		"verb": "travel"
    	}
    }
    // Events in the future:
    {
    	"eventassociation": {
    		"time": {
    			"min": "now"
    		},
    		"type":"Event"
    	}
    }
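The time range rule listed above ("any part of the event time range can match") amounts to an interval overlap test. The sketch below illustrates that rule only; it is not the server's implementation. The function name is hypothetical, all times are assumed to be already resolved to milliseconds, an omitted "min" or "max" is treated as unbounded, and an event without "time_end" is treated as a single point in time.

```javascript
// Hypothetical illustration of the overlap rule for event time ranges.
function timeTermMatches(event, timeTerm) {
  const min = timeTerm.min ?? -Infinity; // omitted "min": no lower bound
  const max = timeTerm.max ?? Infinity;  // omitted "max": no upper bound
  const start = event.time_start;
  const end = event.time_end ?? event.time_start; // assumption: no end means a point in time
  // Two ranges overlap when each one starts before the other ends.
  return start <= max && end >= min;
}

const event = { time_start: 10, time_end: 20 };
console.log(timeTermMatches(event, { min: 15, max: 30 })); // true: ranges overlap on 15..20
console.log(timeTermMatches(event, { min: 21 }));          // false: event ends before "min"
```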

...

  • Using the "logic" field as described above under "Overview of querying". This is the standard way of combining separate queries.
  • In addition, within a single query term multiple elements of different types can be merged into a single object - this has the effect of ANDing them together. For example:
    • { "qt": [ { "entity": "barack obama/person", "time": { "min": "1284666757164", "max": "now" } } ] }: documents containing the entity Barack Obama, from 16 Sep 2010 until now.
    • qt[0].etext="apple"&qt[0].ftext="pair": this is equivalent to qt[0].etext="apple"&qt[1].ftext="pair"&logic="1 and 2"
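The equivalence in the last example can be made concrete with a small sketch that expands a merged term into separate terms plus an explicit "logic" string. The function name expandMergedTerm is hypothetical, and keeping "entityValue" and "entityType" together as one term is an assumption based on the decomposed entity form.

```javascript
// Hypothetical helper: split one merged query term into separate terms ANDed via "logic".
function expandMergedTerm(term) {
  const parts = [];
  const entityPart = {};
  for (const [key, value] of Object.entries(term)) {
    if (key === "entityValue" || key === "entityType") {
      // Assumption: these two keys form a single decomposed entity term.
      entityPart[key] = value;
    } else {
      parts.push({ [key]: value });
    }
  }
  if (Object.keys(entityPart).length > 0) parts.push(entityPart);
  // Terms are numbered from 1 in the "logic" string, as in the example above.
  const logic = parts.map((_, i) => String(i + 1)).join(" and ");
  return { qt: parts, logic };
}

console.log(JSON.stringify(expandMergedTerm({ etext: "apple", ftext: "pair" })));
// {"qt":[{"etext":"apple"},{"ftext":"pair"}],"logic":"1 and 2"}
```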

...