Manual entities
Format
    {
        "display": string,
        "entities": [
            {
                "iterateOver": string, // OPTIONAL: If specified, a metadata field (nesting supported using dot notation) which is looped over to generate calls with _value/_iterator/_index
                "disambiguated_name": string, // MANDATORY: String/script, the disambiguated name of the entity
                "actual_name": string, // OPTIONAL: String/script, the actual name of the entity if different to the disambiguated name
                "dimension": string, // MANDATORY: String/script: Must be/return one of "Who", "What", "Where"
                "type": string, // MANDATORY: String/script: It is recommended to use a type from the
                                // OpenCyc, AlchemyAPI, or OpenCalais ontologies, for compatibility with future Infinit.e features
                "linkdata": string, // OPTIONAL: if present should return a comma-separated list of URLs (commas should be URL-encoded)
                "relevance": string, // OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double
                "sentiment": string, // OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double, by convention this is between -1.0 and 1.0.
                "frequency": string, // OPTIONAL: Must specify/return a long/string-parsable-into-a-long
                "geotag": { // OPTIONAL: Format is identical to the docGeo format specified above
                    "lat": string,
                    "lon": string,
                    "city": string,
                    "stateProvince": string,
                    "country": string,
                    "countryCode": string
                },
                "ontology_type": string, // OPTIONAL: String/script: Only used if geotag is specified:
                                         // allows specification of the scale of the geographic entity (see below for useful link), defaults to "point"
                "useDocGeo": boolean, // OPTIONAL: If true, uses any lat/long generated from the top level "docGeo" specification, defaults to false
                "creationCriteriaScript": string // OPTIONAL: script: If populated, runs a user script function and if return value is false doesn't create the object
            }
        ]
    }
Description
Entities are the who, what, and where contained within a record (i.e. people, things, and places).
The following table describes the parameters of the manual entities configuration.
Parameter | Description |
---|---|
iterateOver | OPTIONAL: If specified, a metadata field (nesting supported using dot notation) which is looped over to generate calls with _value/_iterator/_index |
disambiguated_name | MANDATORY: String/script, the disambiguated name of the entity |
actual_name | OPTIONAL: String/script, the actual name of the entity if different to the disambiguated name |
dimension | MANDATORY: String/script: Must be/return one of "Who", "What", "Where" |
type | MANDATORY: String/script: It is recommended to use a type from the OpenCyc, AlchemyAPI, or OpenCalais ontologies, for compatibility with future Infinit.e features |
linkdata | OPTIONAL: if present should return a comma-separated list of URLs (commas should be URL-encoded) |
relevance | OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double |
sentiment | OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double, by convention this is between -1.0 and 1.0. |
frequency | OPTIONAL: Must specify/return a long/string-parsable-into-a-long |
geotag | OPTIONAL: Format is identical to the docGeo format specified above: "lat": string, "lon": string, "city": string, "stateProvince": string, "country": string, "countryCode": string |
ontology_type | OPTIONAL: String/script: Only used if geotag is specified: allows specification of the scale of the geographic entity (see below for useful link), defaults to "point" |
useDocGeo | OPTIONAL: If true, uses any lat/long generated from the top level "docGeo" specification, defaults to false |
creationCriteriaScript | OPTIONAL: script: If populated, runs a user script function and if return value is false doesn't create the object |
Basic Use
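In the basic use case, each field in the entities block is set either to a fixed string or to a value taken from the document metadata (for example "$metadata.json.actor.displayName"). One entity is then created per entity specification.
See the detailed example under Examples below.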
iterateOver
For Manual Entities, you can use iterateOver in the more advanced case where you want to extract entities from metadata arrays.
For example, "iterateOver": "name" will loop over the metadata.name[] objects and create an entity for each object in the array, where "name" represents the name of a metadata object.
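The loop described above can be pictured with a small standalone sketch. This is not Infinit.e code: `iterateEntities`, the sample document, and the "first" field are hypothetical, and the real harvester is only assumed to behave roughly like this.

```javascript
// Hypothetical stand-in for the harvester's iterateOver loop (illustration only).
function iterateEntities(doc, fieldName, buildEntity) {
  const raw = doc.metadata[fieldName];
  if (raw === undefined || raw === null) return [];
  // A single object is treated like an array of size 1.
  const items = Array.isArray(raw) ? raw : [raw];
  // Each call receives the current element and its position,
  // mirroring the _iterator and _index values mentioned in the format table.
  return items.map((item, index) => buildEntity(item, index));
}

// Made-up document with a metadata array called "name".
const sampleDoc = { metadata: { name: [{ first: "Ann" }, { first: "Bob" }] } };

const sampleEntities = iterateEntities(sampleDoc, "name", (_iterator, _index) => ({
  disambiguated_name: _iterator.first,
  dimension: "Who",
  type: "Person",
}));
```

One entity specification therefore yields one entity per array element, two in this made-up document.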
Functional Specification
For non-nested entity specification objects, the first field in the iterateOver field refers to the metadata object, e.g. "iterateOver": "location" refers to "_doc.metadata.location".
For nested objects, the first field refers to the "parent" object (but you shouldn't be using nesting now that dot notation is available!).
Nesting:
Nesting is supported using "dot notation". For example, if the "victim" array used in the iterateOver example below were inside an object (or array of objects) called "more_information", then the "iterateOver" field would be set to "more_information.victim".
This would be equivalent to the less tidy technique of nesting Entity Specification JSON objects: the outer one having "iterateOver": "more_information" and containing a second Entity Specification JSON object whose "iterateOver" names the inner field.
About Arrays and Objects:
Arrays and objects are treated equally in the dot notation (i.e. an object is just treated like an array of size 1).
E.g. for both "{ A: { B: { C: "value" } } }" and "{ A: [ { B: [ { C: [ "value" ] } ] } ] }", you would use "iterateOver": "A.B.C" to get to "value".
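To make the "object is an array of size 1" rule concrete, here is a minimal sketch of a dot-notation resolver. The function name `resolvePath` is hypothetical and the logic is an assumption about the behaviour described above, not the platform's implementation.

```javascript
// Hypothetical resolver: walks a dot-notation path, flattening arrays
// and treating a plain object as if it were an array of size 1.
function resolvePath(root, path) {
  let current = [root];
  for (const field of path.split(".")) {
    const next = [];
    for (const node of current) {
      if (node === null || typeof node !== "object") continue; // primitives have no fields
      const child = node[field];
      if (child === undefined || child === null) continue;
      if (Array.isArray(child)) {
        next.push(...child); // array: visit every element
      } else {
        next.push(child); // object or primitive: a single element
      }
    }
    current = next;
  }
  return current;
}

const nestedObjects = { A: { B: { C: "value" } } };
const nestedArrays = { A: [{ B: [{ C: ["value"] }] }] };

const fromObjects = resolvePath(nestedObjects, "A.B.C");
const fromArrays = resolvePath(nestedArrays, "A.B.C");
```

Both calls yield a one-element list containing "value", which is why the same "A.B.C" path works for either shape.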
If you are iterating over an object, then use "_iterator.FIELD" in scripts, "$FIELD" for normal strings.
Note that "$metadata.X" won't work inside "iterateOver" clauses, you have to use constructs like "$SCRIPT( return _doc.metadata.X[0]; )" to get at the top-level fields.
If you are iterating over a value then use "_value" in scripts, "$" for normal strings.
Primitives:
If any of the fields point to primitives (e.g. B: [ "val1", "val2" ] in the example above) then an error is thrown unless the "creation criteria" script for the nested object is specified.
(You can still throw errors from the script by checking "_iterator == null" if you want to.) This enables writing objects that will handle fields being either primitives or objects (e.g. by checking _iterator and _value).
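A creation criteria function that tolerates both shapes might look like the sketch below. It is written as a plain function taking `_iterator` and `_value` as parameters so that it runs standalone; in the script engine these are assumed to be provided as variables. The "name" field and the function name `acceptEntry` are hypothetical.

```javascript
// Hypothetical creation-criteria check handling objects and primitives alike.
function acceptEntry(_iterator, _value) {
  if (_iterator != null) {
    // Object element: require a non-empty (hypothetical) "name" field.
    return typeof _iterator.name === "string" && _iterator.name.length > 0;
  }
  // Primitive element: accept only non-blank strings.
  return typeof _value === "string" && _value.trim().length > 0;
}
```

Returning false simply skips the element, whereas a script that wants the stricter behaviour could throw when `_iterator` is null.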
creationCriteriaScript
You can use "creationCriteriaScript" scriptlets to filter out unwanted entities, e.g. when looping over metadata arrays.
Creation criteria scripts are also useful when using primitives with nested objects, as described under Primitives above.
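As an illustration of filtering, the sketch below only allows an entity to be created when the current array element reports at least one hostage. The function name `hasHostages` is hypothetical, the "hostagecount" field is borrowed from the victim metadata shown in the examples, and `_iterator` is passed as a parameter here only so the snippet runs on its own. How the function is referenced from "creationCriteriaScript" is not shown in this document, so that wiring is left out.

```javascript
// Hypothetical filter: create the entity only if hostagecount is a positive number.
function hasHostages(_iterator) {
  // Metadata values arrive as strings (e.g. "0"), so parse before comparing.
  const count = parseInt(_iterator.hostagecount, 10);
  return count > 0; // NaN > 0 is false, so missing or malformed values are skipped
}
```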
Examples
Basic
In the example source, the entities block sets actual_name from the document metadata, in this case the JSON field that holds the Twitter user's display name.
Data is extracted from the source using the $ operator.
    },
    {
        "entities": [
            {
                "actual_name": "$metadata.json.actor.displayName",
                "dimension": "Who",
                "disambiguated_name": "$metadata.json.actor.preferredUsername",
                "linkdata": "$metadata.json.actor.link",
                "type": "TwitterHandle"
            },
Sample Output
In the output we see that an entity has been created based on the Twitter user's Twitter handle.
{ "actual_name": "CRM Buddy", "dimension": "Who", "disambiguated_name": "FocalCRM", "doccount": 0, "frequency": 1, "index": "focalcrm/twitterhandle", "linkdata": ["http://www.twitter.com/FocalCRM"], "relevance": 0, "totalfrequency": -1, "type": "TwitterHandle" },
Metadata
The metadata reveals how the value for actual_name was derived from the displayName field of the original document.
], "mediaType": ["Social"], "metadata": {"json": [{ "actor": { "displayName": "CRM Buddy", "followersCount": "245", "friendsCount": "0", "id": "id:twitter.com:835627776", "image": "http://a0.twimg.com/profile_images/2630355549/8cad59efaddd57283dbb159332336744_normal.jpeg", "languages": ["en"], "link": "http://www.twitter.com/FocalCRM", "links": [{"rel": "me"}], "listedCount": "6", "objectType": "person", "postedTime": "2012-09-20T13:59:56.000Z", "preferredUsername": "FocalCRM", "statusesCount": "3688", "summary": "", "verified": "false" },
Extracting Entities From Arrays with iterateOver
You can use iterateOver to create an entity for each object in an array.
Basic Example
In the example below, the entity specifications are configured to iterateOver the document metadata field "victim". Manual Entities will therefore iterate (or loop) over the metadata.victim[] objects and create an entity for each object in the array.
In the example, frequency uses $FUNC to call the function getVictimCount, which was previously imported into the script engine. For more information about calling imported JavaScript functions, see Using Javascript.
    },
    {
        "dimension": "Who",
        "disambiguated_name": "$FUNC( getVictim(); )",
        "frequency": "$FUNC( getVictimCount(); )",
        "type": "VictimType",
        "useDocGeo": false,
        "iterateOver": "victim"
    },
    {
        "dimension": "Who",
        "disambiguated_name": "$FUNC( getVictim(); )",
        "frequency": "$hostagecount",
        "type": "HostageType",
        "useDocGeo": false,
        "iterateOver": "victim"
    }
    ]
    },
Sample Output:
The sample output displays the entities created from the array held in the metadata field "victim".
}, { "actual_name": "Targeted, Civilian, Adult from Afghanistan", "dimension": "Who", "disambiguated_name": "Targeted, Civilian, Adult from Afghanistan", "doccount": 0, "frequency": 5, "index": "targeted, civilian, adult from afghanistan/victimtype", "relevance": 0, "totalfrequency": -1, "type": "VictimType" }, { "actual_name": "Targeted, Civilian, Child from Afghanistan", "dimension": "Who", "disambiguated_name": "Targeted, Civilian, Child from Afghanistan", "doccount": 0, "frequency": 2, "index": "targeted, civilian, child from afghanistan/victimtype", "relevance": 0, "totalfrequency": -1, "type": "VictimType" },
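The bodies of getVictim and getVictimCount are not shown in this document. The sketch below is a guess that happens to reproduce the sample output above from the victim metadata listed in the next section: the name is assembled from indicator, victimtype, child, and nationality, and the frequency is assumed to be deadcount plus woundedcount. In the source configuration the functions are called without arguments and presumably read the engine's `_iterator` variable; here it is passed as a parameter so the snippet runs standalone.

```javascript
// Hypothetical reconstruction, inferred only from the sample output and metadata.
function getVictim(_iterator) {
  const age = _iterator.child === "Yes" ? "Child" : "Adult";
  return `${_iterator.indicator}, ${_iterator.victimtype}, ${age} from ${_iterator.nationality}`;
}

// Assumption: frequency counts dead plus wounded for the current victim record.
function getVictimCount(_iterator) {
  return parseInt(_iterator.deadcount, 10) + parseInt(_iterator.woundedcount, 10);
}

// The two victim records from the document metadata (fields not used here are omitted).
const victims = [
  { child: "No", deadcount: "1", indicator: "Targeted", nationality: "Afghanistan", victimtype: "Civilian", woundedcount: "4" },
  { child: "Yes", deadcount: "2", indicator: "Targeted", nationality: "Afghanistan", victimtype: "Civilian", woundedcount: "0" },
];

const victimSummaries = victims.map(v => ({ name: getVictim(v), frequency: getVictimCount(v) }));
```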
Metadata:
The original document metadata used to create the "victim" entities.
"victim": [ { "child": "No", "combatant": "No", "deadcount": "1", "definingcharacteristic": "Unknown", "hostagecount": "0", "indicator": "Targeted", "nationality": "Afghanistan", "targetedcharacteristic": "Unknown", "victimtype": "Civilian", "woundedcount": "4" }, { "child": "Yes", "combatant": "No", "deadcount": "2", "definingcharacteristic": "Unknown", "hostagecount": "0", "indicator": "Targeted", "nationality": "Afghanistan", "targetedcharacteristic": "Unknown", "victimtype": "Civilian", "woundedcount": "0" } ],
Specifying Entity Geographic Location
It is also possible to set values for entity parameters using JavaScript functions declared in Globals.
In the example source, the entity block is configured to output an entity with dimension "Where". $SCRIPT is used to call functions already declared in Globals.
In the example, inline JavaScript is used by enclosing it in the $SCRIPT() block. For more information about inline JavaScript, see Using Javascript.
    },
    {
        "dimension": "Where",
        "disambiguated_name": "$metadata.json.actor.location.displayName",
        "geotag": {
            "city": "$SCRIPT( return getAddressVal( _doc.metadata.json[0].actor.location.displayName, 0 ) )",
            "stateProvince": "$SCRIPT( return getRegion(getAddressVal( _doc.metadata.json[0].actor.location.displayName, 1 )) )",
            "countryCode": "US",
            "alternatives": [
                {
                    "stateProvince": "$SCRIPT( return getRegion(getAddressVal( _doc.metadata.json[0].actor.location.displayName, 1 )) )",
                    "countryCode": "US"
                }
            ]
        },
The example source will return output with the dimension of "Where" if the location can be determined by the metadata.
In some cases, it will not be clear what geographical type a field is (eg. a freeform field that might be city, state, or country). The geographical specification allows you to specify alternatives.
The alternatives are tried in order until one of them works or there are no more to try.
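The "try in order" behaviour can be sketched as follows. Both `resolveGeotag` and the `lookup` callback are hypothetical; the real geographic lookup is internal to the platform and is only assumed to succeed or fail per candidate.

```javascript
// Hypothetical fallback loop: try the primary geotag, then each alternative in order.
function resolveGeotag(geotag, lookup) {
  const { alternatives = [], ...primary } = geotag;
  for (const candidate of [primary, ...alternatives]) {
    const hit = lookup(candidate);
    if (hit) return hit; // stop at the first candidate that resolves
  }
  return null; // nothing matched: no location is attached
}

// Toy lookup that only recognises state-level queries for "New York".
const stateOnlyLookup = candidate =>
  !candidate.city && candidate.stateProvince === "New York" ? { matched: "stateProvince" } : null;

const resolved = resolveGeotag(
  {
    city: "Unknownville",
    stateProvince: "New York",
    countryCode: "US",
    alternatives: [{ stateProvince: "New York", countryCode: "US" }],
  },
  stateOnlyLookup
);
```

In this toy run the primary city-level candidate fails and the state-level alternative is the one that resolves.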
Globals:
{ "globals": { "scripts": [ "function getAddressVal( addressStr, number) { try { var addressArray = addressStr.split(/ *, */); if (addressArray != null && addressArray.length > 0) { if (addressArray[number].toLowerCase()=='ny') { return 'new york'; } else if (addressArray[number].toLowerCase()=='long island' || addressArray[number].toLowerCase()=='li') { return 'medford'; } else { return addressArray[number]; } } else { return ''; } } catch (err) { return ''; } } function getRegion( code ) { if (code.toLowerCase()=='ny') {return 'New York';} else if (code.toLowerCase()=='nj') {return 'New Jersey';} else if (code.toLowerCase()=='ct') {return 'Connecticut';} else if (code.toLowerCase()=='md') {return 'Maryland';} else if (code.toLowerCase()=='va') {return 'Virginia';} else if (code.toLowerCase()=='pa') {return 'Pennsylvania';} else if (code.toLowerCase()=='nj') {return 'New Jersey';} else {return 'New York';} }" ] }
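Because the script is stored as a single JSON string, it is hard to read in place. The same two functions are unpacked below for readability, with comments added and the second, unreachable 'nj' branch of getRegion omitted; behaviour is otherwise intended to be unchanged.

```javascript
// Returns the comma-separated part of addressStr at position `number`,
// normalising a few abbreviations; returns '' on any error (e.g. missing part).
function getAddressVal(addressStr, number) {
  try {
    var addressArray = addressStr.split(/ *, */);
    if (addressArray != null && addressArray.length > 0) {
      var part = addressArray[number].toLowerCase(); // throws if the part is missing
      if (part == 'ny') {
        return 'new york';
      } else if (part == 'long island' || part == 'li') {
        return 'medford';
      } else {
        return addressArray[number];
      }
    } else {
      return '';
    }
  } catch (err) {
    return '';
  }
}

// Maps a state code to its full name; anything unrecognised defaults to 'New York'.
function getRegion(code) {
  var lower = code.toLowerCase();
  if (lower == 'ny') { return 'New York'; }
  else if (lower == 'nj') { return 'New Jersey'; }
  else if (lower == 'ct') { return 'Connecticut'; }
  else if (lower == 'md') { return 'Maryland'; }
  else if (lower == 'va') { return 'Virginia'; }
  else if (lower == 'pa') { return 'Pennsylvania'; }
  else { return 'New York'; }
}
```

Note that getAddressVal already turns "NY" into 'new york', so getRegion receives 'new york' in that case and reaches its default branch, which still returns 'New York'.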