What is an Event?
Events are
Basic Event Specification
The following code demonstrates how to specify a basic event. (Note: the sample event specification and sample event output below are extracted from a MySQL database source, the full content of which can be viewed here.)
{
  ...
  "events" : [
    {
      "entity1" : "$metadata.offense,$metadata.method",
      "verb" : "reported",
      "verb_category" : "crime",
      "time_start" : "$metadata.reportdatetime",
      "geo_index" : "Location",
      "geotag" : {
        "latitude" : "$metadata.latitude",
        "longitude" : "$metadata.longitude"
      }
    }
  ],
  ...
}
In the basic example above, the following fields have been specified:
- entity1
A free-form text field containing information about the event "subject"
- verb
A free-form text field describing the event "verb"
- geo_index
If the event geotag maps to an entity from the parent document, this field is the "gazateer_index" of that entity
- geotag
  - latitude
  A string containing a floating-point representation of the latitude
  - longitude
  A string containing a floating-point representation of the longitude
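Because the geotag coordinates are stored as strings, a consumer of the event usually has to convert them before doing any geographic work. The following is a minimal Python sketch of that conversion; the function name parse_geotag and the range checks are illustrative assumptions, not part of the Infinit.e API.

```python
def parse_geotag(geotag):
    """Convert a geotag with string coordinates into a (lat, lon) float pair.

    Hypothetical helper: the range checks are an assumption about what a
    consumer might want, not documented harvester behaviour.
    """
    lat = float(geotag["latitude"])
    lon = float(geotag["longitude"])
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


# Coordinates taken from the sample output on this page.
geotag = {"latitude": "38.9051666534795", "longitude": "-77.0121735726172"}
lat, lon = parse_geotag(geotag)
print(lat, lon)
```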
Data is extracted from the source using the $ operator. For example, in the case of the geotag.latitude field the data is extracted from the metadata.latitude field using the following definition:
"latitude" : "$metadata.latitude"
The $ operator can also be used to combine multiple source data fields into more complex literal strings, such as the one used to specify the document's description field:
"description" : "$metadata.reportdatetime: $metadata.offense,$metadata.method was reported at: $metadata.blocksiteaddress"
This is converted into the following literal string:
"description" : "Mar 10, 2011 12:00:00 AM: ROBBERY,GUN was reported at: 1100 B/O 1ST ST NW"
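The substitution performed by the $ operator can be approximated outside the harvester for testing a specification. The sketch below is an assumption-laden illustration in Python, not the harvester's actual implementation: the names lookup, substitute and apply_spec are hypothetical, and the source record is assumed to be a plain nested dictionary with scalar values.

```python
import re

# A "$" followed by a dotted path of identifiers, e.g. $metadata.latitude.
# This pattern is an assumption about the reference syntax.
_REF = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def lookup(doc, path):
    """Walk a dotted path such as 'metadata.offense' through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def substitute(template, doc):
    """Replace every $path reference in template with its value from doc."""
    return _REF.sub(lambda m: str(lookup(doc, m.group(1))), template)


def apply_spec(spec, doc):
    """Apply substitute() to every string inside a nested specification."""
    if isinstance(spec, str):
        return substitute(spec, doc)
    if isinstance(spec, dict):
        return {key: apply_spec(value, doc) for key, value in spec.items()}
    if isinstance(spec, list):
        return [apply_spec(item, doc) for item in spec]
    return spec


# Source record built from the sample values on this page.
doc = {
    "metadata": {
        "offense": "ROBBERY",
        "method": "GUN",
        "reportdatetime": "Mar 10, 2011 12:00:00 AM",
        "latitude": "38.9051666534795",
        "longitude": "-77.0121735726172",
    }
}

event_spec = {
    "entity1": "$metadata.offense,$metadata.method",
    "verb": "reported",
    "time_start": "$metadata.reportdatetime",
    "geotag": {
        "latitude": "$metadata.latitude",
        "longitude": "$metadata.longitude",
    },
}

print(apply_spec(event_spec, doc))
```

Strings without a $ reference, such as the verb, pass through unchanged, and a reference to a missing field raises a KeyError in this sketch. How the harvester itself treats missing fields is not covered here.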
Note: More advanced data transformations can be performed within the Structured Analysis Harvester using JavaScript as documented here: Transforming data with JavaScript.
The result of the entity specification above can be seen in the sample output below:
{
  ...
  "entities" : [
    {
      "actual_name" : "1100 B/O 1ST ST NW WASHINGTON DC",
      "dimension" : "Where",
      "disambiguous_name" : "1100 B/O 1ST ST NW WASHINGTON DC",
      "doccount" : 3,
      "frequency" : 1,
      "gazateer_index" : "1100 b/o 1st st nw washington dc/place",
      "geotag" : {
        "latitude" : "38.9051666534795",
        "longitude" : "-77.0121735726172"
      },
      "relevance" : "0",
      "totalfrequency" : 3,
      "type" : "Place"
    }
  ],
  ...
}
Note that in the sample output above, the Infinit.e harvester automatically generates the following fields as appropriate:
- doccount
The number of documents in the Infinit.e database in which the entity occurs
- frequency
The number of times the entity occurs in the document (Note: the system defaults the frequency count to 1; however, it is possible to specify a frequency count within a source document)
- totalfrequency
The number of times the entity occurs across all documents in the Infinit.e database
- relevance
A value between 0 and 1 (in the form of a string containing a floating-point number), indicating the entity extraction engine's "opinion" of the entity's relevance within the document
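The relationship between frequency, doccount and totalfrequency can be illustrated with a small Python sketch. It is only a reading of the definitions above, not the way Infinit.e computes these values internally; the function names are hypothetical, and each document is assumed to list a given entity at most once.

```python
def aggregate_entity_counts(documents, index):
    """Return (doccount, totalfrequency) for the entity with the given index.

    Assumes each document holds an "entities" list and that a missing
    "frequency" means 1, mirroring the default described above.
    """
    doccount = 0
    totalfrequency = 0
    for doc in documents:
        for entity in doc.get("entities", []):
            if entity.get("gazateer_index") == index:
                doccount += 1
                totalfrequency += entity.get("frequency", 1)
                break  # count each document once
    return doccount, totalfrequency


def parse_relevance(entity):
    """Convert the string-valued relevance into a float between 0 and 1."""
    relevance = float(entity["relevance"])
    if not 0.0 <= relevance <= 1.0:
        raise ValueError(f"relevance out of range: {relevance}")
    return relevance


index = "1100 b/o 1st st nw washington dc/place"
documents = [
    {"entities": [{"gazateer_index": index, "frequency": 1, "relevance": "0"}]},
    {"entities": [{"gazateer_index": index, "relevance": "0"}]},  # default frequency
    {"entities": [{"gazateer_index": index, "frequency": 1, "relevance": "0"}]},
]

print(aggregate_entity_counts(documents, index))
print(parse_relevance(documents[0]["entities"][0]))
```

With three documents that each mention the place once, the sketch yields a doccount of 3 and a totalfrequency of 3, which matches the values in the sample output.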