...
What is an Event?
Events are
Basic Event Specification
The following code demonstrates how to specify a basic Where (Place) entity (Note: the sample entity specification and sample entity output below are extracted from a MySQL database source, the full content of which can be viewed here):
Code Block | ||
---|---|---|
| ||
{
...
"events" : [
{
"entity1" : "$metadata.offense,$metadata.method",
"verb" : "reported",
"verb_category" : "crime",
"time_start" : "$metadata.reportdatetime",
"geo_index" : "Location",
"geotag" : {
"latitude" : "$metadata.latitude",
"longitude" : "$metadata.longitude"
}
}
],
...
}
|
In the basic example above the following fields have been specified:
- disambiguous_name
  For a given "type", this is (aside from case) a unique identifier for the entity
- dimension
  One of "Who" (people, organizations), "Where" (places), or "What" (everything else)
- type
  The entity type, e.g. if dimension is equal to What, type might be equal to Automobile, Airplane, Ship, etc.
- geotag
  - latitude
    String containing a floating point representation of latitude
  - longitude
    String containing a floating point representation of longitude
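Because the geotag holds latitude and longitude as strings, any code that consumes an entity has to parse them before doing arithmetic on them. The following is a minimal Python sketch of that step, not part of Infinit.e itself: the helper name `parse_geotag` is hypothetical, and the range checks assume conventional decimal degrees.

```python
def parse_geotag(geotag):
    """Convert a geotag's string coordinates to floats (hypothetical helper).

    Raises ValueError if a value is not a number or is out of range.
    """
    lat = float(geotag["latitude"])
    lon = float(geotag["longitude"])
    # Assumption: coordinates are conventional decimal degrees.
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    return lat, lon


# Coordinate strings taken from the sample data used on this page.
lat, lon = parse_geotag({"latitude": "38.9051666534795",
                         "longitude": "-77.0121735726172"})
```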
Data is extracted from the source using the $ operator. For example, in the case of the geotag.latitude field the data is extracted from the metadata.latitude field using the following definition:
Code Block |
---|
"latitude" : "$metadata.latitude"
|
The $ operator can also be used to combine multiple source data fields into a more complex literal string, as in the following specification of the document's description field:
Code Block |
---|
"description" : "$metadata.reportdatetime: $metadata.offense,$metadata.method was reported at: $metadata.blocksiteaddress"
|
This is converted into the following literal string:
Code Block |
---|
"description" : "Mar 10, 2011 12:00:00 AM: ROBBERY GUN was reported at the 1100 B/O 1ST ST NW"
|
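To make the substitution concrete, here is a minimal Python sketch of how `$field.path` references could be resolved against a source record. It is an illustration only, not the harvester's implementation: the names `FIELD_REF`, `lookup`, and `substitute` are hypothetical, the record is made up from field names used on this page, and the real harvester may treat missing fields, arrays, or escaping differently.

```python
import re

# Matches "$" followed by a dotted field path such as "$metadata.offense".
FIELD_REF = re.compile(r"\$(\w+(?:\.\w+)*)")


def lookup(doc, path):
    """Follow a dotted path such as "metadata.latitude" through nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def substitute(template, doc):
    """Replace every $field.path in template with the matching value from doc."""
    return FIELD_REF.sub(lambda m: str(lookup(doc, m.group(1))), template)


# Hypothetical source record, shaped like the fields referenced on this page.
doc = {"metadata": {"offense": "ROBBERY",
                    "method": "GUN",
                    "latitude": "38.9051666534795"}}

print(substitute("$metadata.latitude", doc))
# prints: 38.9051666534795
print(substitute("Offense: $metadata.offense ($metadata.method)", doc))
# prints: Offense: ROBBERY (GUN)
```

In this sketch a single reference and a string mixing several references go through the same code path; text that is not part of a `$` reference is copied through unchanged.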
Note: More advanced data transformations can be performed within the Structured Analysis Harvester using JavaScript as documented here: Transforming data with JavaScript.
The result of the entity specification above can be seen in the sample output below:
Code Block | ||
---|---|---|
| ||
{
...
"entities" : [
{
"actual_name" : "1100 B/O 1ST ST NW WASHINGTON DC",
"dimension" : "Where",
"disambiguous_name" : "1100 B/O 1ST ST NW WASHINGTON DC",
"doccount" : 3,
"frequency" : 1,
"gazateer_index" : "1100 b/o 1st st nw washington dc/place",
"geotag" : {
"latitude" : "38.9051666534795",
"longitude" : "-77.0121735726172"
},
"relevance" : "0",
"totalfrequency" : 3,
"type" : "Place"
}
],
...
}
|
Note that in the sample output above, the Infinit.e harvester automatically generates the following fields as appropriate:
- doccount
  The number of documents in which the entity occurs in the Infinit.e database
- frequency
  The number of times the entity occurs in the document (Note: the system defaults the frequency count to 1; however, it is possible to specify a frequency count within a source document)
- totalfrequency
  The number of times the entity occurs in all documents in the Infinit.e database
- relevance
  A value between 0 and 1 (in the form of a string containing a floating point number), indicating the entity extraction engine's "opinion" on the entity's relevance within the document
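The relationship between frequency, doccount, and totalfrequency can be illustrated with a short Python sketch. This only restates the definitions above; it is not how Infinit.e computes the values internally. The function name `aggregate_counts` and the list-of-lists document layout are assumptions made for the example.

```python
from collections import Counter


def aggregate_counts(documents):
    """Return (doccount, totalfrequency) Counters keyed by (type, name).

    documents is a list of documents, each given as a list of entity dicts
    with "type", "disambiguous_name" and an optional "frequency".
    """
    doccount = Counter()
    totalfrequency = Counter()
    for entities in documents:
        seen = set()
        for entity in entities:
            # For a given type, names are unique aside from case.
            key = (entity["type"], entity["disambiguous_name"].lower())
            # frequency defaults to 1 when the document does not specify it.
            totalfrequency[key] += entity.get("frequency", 1)
            seen.add(key)
        # Each document counts at most once per entity.
        doccount.update(seen)
    return doccount, totalfrequency


place = {"disambiguous_name": "1100 B/O 1ST ST NW WASHINGTON DC",
         "type": "Place"}
doccount, totalfrequency = aggregate_counts([[place], [place], [place]])
key = ("Place", "1100 b/o 1st st nw washington dc")
print(doccount[key], totalfrequency[key])
# prints: 3 3
```

Three documents that each mention the place once, with the default frequency of 1, give a doccount of 3 and a totalfrequency of 3, which matches the sample output.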