...
Code Block | ||
---|---|---|
| ||
{ ... "events" : [ { "entity1" : "$metadata.offense,$metadata.method", "verb" : "reported", "verb_category" : "crime", "time_start" : "$metadata.reportdatetime", "geo_index" : "Location", "geotag" : { "latitude" : "$metadata.latitude", "longitude" : "$metadata.longitude"} }, ], ... } |
...
- entity1
A free form text field containing information about the event "subject"
- entity1_index
- verb
A free form text field describing the event "verb"
- verb_category
Also a free form text field describing the event "verb", but intended to group related verbs together (e.g. "travel" for the verbs "flew" and "drove")
- geo_index
If the event geotag maps to an entity from the parent document, then this field is the "gazateer_index" of that entity
- geotag
  - latitude
  String containing a floating point representation of latitude
  - longitude
  String containing a floating point representation of longitude
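Because the geotag stores latitude and longitude as strings, a consumer usually has to convert them before doing any numeric work. The following is a minimal sketch in JavaScript, assuming a hypothetical helper named `parseGeotag` (it is not part of Infinit.e) and the conventional ranges of -90 to 90 degrees for latitude and -180 to 180 degrees for longitude:

```javascript
// Hypothetical helper, not part of the Infinit.e API: converts the string
// latitude/longitude of an event geotag into numbers and validates them.
function parseGeotag(geotag) {
  const latitude = Number.parseFloat(geotag.latitude);
  const longitude = Number.parseFloat(geotag.longitude);
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError("invalid latitude: " + geotag.latitude);
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError("invalid longitude: " + geotag.longitude);
  }
  return { latitude, longitude };
}

// Coordinates taken from the sample geotag used in this page.
const point = parseGeotag({
  latitude: "38.9051666534795",
  longitude: "-77.0121735726172",
});
console.log(point.latitude, point.longitude);
```

Rejecting unparseable or out-of-range values early is a design choice of this sketch; how the harvester itself treats malformed geotags is not described here.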
Data is extracted from the source using the $ operator. For example, in the case of the geotag.latitude field the data is extracted from the metadata.latitude field using the following definition:
Code Block |
---|
"latitude" : "$metadata.latitude"
|
The $ operator can also be used to combine multiple source data fields into a more complex literal string, as in the following definition of the document's description field:
Code Block |
---|
"description" : "$metadata.reportdatetime: $metadata.offense,$metadata.method was reported at: $metadata.blocksiteaddress"
|
This is converted into the following literal string:
Code Block |
---|
"description" : "Mar 10, 2011 12:00:00 AM: ROBBERY GUN was reported at the 1100 B/O 1ST ST NW"
|
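To make the substitution concrete, here is a simplified stand-in for the $ operator written in JavaScript. It is only a sketch: the function `resolveTemplate` and the record `doc` are hypothetical, and the real harvester may handle separators, missing fields, and formatting differently (for instance, the literal string shown above has a space where the definition has a comma, which this sketch does not reproduce).

```javascript
// Illustrative only: a simplified stand-in for the harvester's $ substitution.
// Replaces each "$a.b.c" reference with the value found at that path in `doc`.
function resolveTemplate(template, doc) {
  const reference = /\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)/g;
  return template.replace(reference, (match, path) => {
    let value = doc;
    for (const key of path.split(".")) {
      if (value === null || typeof value !== "object" || !(key in value)) {
        return match; // leave unresolved references untouched
      }
      value = value[key];
    }
    return String(value);
  });
}

// Hypothetical source record using the field names from the examples above.
const doc = {
  metadata: {
    reportdatetime: "Mar 10, 2011 12:00:00 AM",
    offense: "ROBBERY",
    method: "GUN",
    blocksiteaddress: "1100 B/O 1ST ST NW",
    latitude: "38.9051666534795",
  },
};

console.log(resolveTemplate("$metadata.latitude", doc));
console.log(resolveTemplate("$metadata.offense,$metadata.method", doc));
```

In this sketch any text that is not part of a `$` reference, including punctuation between references, is copied through unchanged.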
Note: More advanced data transformations can be performed within the Structured Analysis Harvester using JavaScript as documented here: Transforming data with JavaScript.
The result of the event specification above can be seen in the sample output below:
Code Block | ||
---|---|---|
| ||
{ ... "events" : [ { "entity1" : "robbery gun", "entity1_index" : "robbery gun/criminalactivity", "verb" : "reported", "verb_category" : "crime", "geotag" : { "latitude" : "38.9051666534795", "longitude" : "-77.0121735726172" }, "geo_index" : "1100 b/o 1st st nw washington dc/place", "event_type" : "Event" }, ], ... } |
In the sample output above, note that the Infinit.e harvester automatically generates the following fields as appropriate:
...
- event_type
- "Event", "Fact", "Summary"
The "event_type" field sub-categorizes the "event" object into one of three types: "Event", "Fact", or "Summary". The examples provided below should make the distinction clearer, but it can be described simply as follows:
- "Event": links multiple entities (via "entity1_index", "entity2_index", "geo_index") and represents a transient activity (e.g. travel)
- "Fact": links multiple entities like an "Event", but represents a (transient or permanent) relationship (e.g. being president)
- "Summary": generally links one entity to free text (e.g. a quotation: "Obama says...").
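As a rough illustration of the three types, the JavaScript sketch below builds one hypothetical object per type and reports which index fields each one populates. Every value is invented for illustration, the "name/type" form of the index values is inferred from the sample output, and the `entity2` free text field is assumed by analogy with `entity1`; only the field names "entity1_index", "entity2_index", "geo_index" and "event_type" come from the text above.

```javascript
// Hypothetical event objects, one per "event_type". All values are invented
// for illustration; "entity2" is assumed by analogy with "entity1".
const sampleEvents = [
  {
    event_type: "Event", // transient activity linking entities
    entity1_index: "barack obama/person",
    verb: "flew",
    verb_category: "travel",
    geo_index: "chicago/place",
  },
  {
    event_type: "Fact", // relationship between entities
    entity1_index: "barack obama/person",
    verb: "president",
    entity2_index: "united states/place",
  },
  {
    event_type: "Summary", // one entity linked to free text
    entity1_index: "barack obama/person",
    verb: "says",
    entity2: "Obama says...",
  },
];

// Returns the names of the index fields that an event actually populates.
function linkedIndexFields(event) {
  return ["entity1_index", "entity2_index", "geo_index"].filter(
    (field) => typeof event[field] === "string" && event[field].length > 0
  );
}

for (const event of sampleEvents) {
  console.log(event.event_type, linkedIndexFields(event));
}
```

In this sketch the "Event" and "Fact" objects each link two entities, while the "Summary" object links only one and carries its remaining content as free text.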