Overview

This toolkit element generates one or more entities from the document or content metadata. By default each field value is treated as a replacement string (for example, "$metadata.json.actor.displayName" is replaced by that metadata value); alternatively, $SCRIPT(...) can be used to compute the string in JavaScript.

Format

{
    "display": string,
    "entities": [
        {
            "iterateOver": "string", // OPTIONAL: If specified, a metadata field (nesting supported using dot notation) which is looped over to generate calls with _value/_iterator/_index
            "disambiguated_name": "string", // MANDATORY: String/script: the disambiguated name of the entity
            "actual_name": "string", // OPTIONAL: String/script: the actual name of the entity, if different from the disambiguated name
            "dimension": "string", // MANDATORY: String/script: must be/return one of "Who", "What", "Where"
            "type": "string", // MANDATORY: String/script: it is recommended to use a type from the
                              // OpenCyc, AlchemyAPI, or OpenCalais ontologies, for compatibility with future Infinit.e features
            "linkdata": "string", // OPTIONAL: if present, should return a comma-separated list of URLs (commas within a URL should be URL-encoded)
            "relevance": "string", // OPTIONAL: String/script: must specify/return a double (or a string parsable into a double)
            "sentiment": "string", // OPTIONAL: String/script: must specify/return a double (or a string parsable into a double); by convention between -1.0 and 1.0
            "frequency": "string", // OPTIONAL: must specify/return a long (or a string parsable into a long)
            "geotag": { // OPTIONAL: format is identical to the docGeo format specified above
                "lat": "string", "lon": "string",
                "city": "string", "stateProvince": "string", "country": "string", "countryCode": "string"
            },
            "ontology_type": "string", // OPTIONAL: String/script: only used if geotag is specified;
                                       // allows specification of the scale of the geographic entity (see below for useful link); defaults to "point"
            "useDocGeo": "boolean", // OPTIONAL: if true, uses any lat/long generated from the top-level "docGeo" specification; defaults to false
            "creationCriteriaScript": "string" // OPTIONAL: script: if populated, runs a user script function; if it returns false, the entity is not created
        }
    ]
}
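The rules in the format above can also be checked mechanically before a source is deployed. The following is a minimal sketch, not part of Infinit.e: validateEntitySpec and its messages are hypothetical, and values starting with $ are assumed to be expressions (replacement strings, $SCRIPT or $FUNC) that are only resolved at run time, so their resulting types are not checked.

```javascript
// Hypothetical checker for one element of the "entities" array, based only on
// the rules listed in the format. Expressions ("$...") are not evaluated here.
const VALID_DIMENSIONS = ["Who", "What", "Where"];

function isExpression(value) {
  return typeof value === "string" && value.startsWith("$");
}

function validateEntitySpec(spec) {
  const errors = [];

  // MANDATORY fields must be non-empty strings
  for (const field of ["disambiguated_name", "dimension", "type"]) {
    if (typeof spec[field] !== "string" || spec[field].trim() === "") {
      errors.push(`missing mandatory field "${field}"`);
    }
  }

  // A literal dimension must be one of Who/What/Where
  if (typeof spec.dimension === "string" && !isExpression(spec.dimension) &&
      !VALID_DIMENSIONS.includes(spec.dimension)) {
    errors.push(`dimension must be one of ${VALID_DIMENSIONS.join(", ")}`);
  }

  // Literal relevance and sentiment must parse as doubles
  for (const field of ["relevance", "sentiment"]) {
    const v = spec[field];
    if (v !== undefined && !isExpression(v) && !Number.isFinite(Number(v))) {
      errors.push(`"${field}" must be parsable as a double`);
    }
  }

  // A literal frequency must parse as a long (an integer)
  const f = spec.frequency;
  if (f !== undefined && !isExpression(f) && !/^-?\d+$/.test(String(f).trim())) {
    errors.push(`"frequency" must be parsable as a long`);
  }

  return errors;
}
```

For example, validateEntitySpec({ dimension: "Person", type: "X", frequency: "1.5" }) reports three problems: the missing disambiguated_name, the invalid dimension, and the non-integer frequency.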

 

Description

Entities are the who, what, and where of a record: the people, things, and places it contains.

The following table describes the parameters of the manual entities configuration.

Parameter | Description
iterateOver | OPTIONAL: If specified, a metadata field (nesting supported using dot notation) which is looped over to generate calls with _value/_iterator/_index
disambiguated_name | MANDATORY: String/script: the disambiguated name of the entity
actual_name | OPTIONAL: String/script: the actual name of the entity, if different from the disambiguated name
dimension | MANDATORY: String/script: must be/return one of "Who", "What", "Where"
type | MANDATORY: String/script: it is recommended to use a type from the OpenCyc, AlchemyAPI, or OpenCalais ontologies, for compatibility with future Infinit.e features
linkdata | OPTIONAL: if present, should return a comma-separated list of URLs (commas within a URL should be URL-encoded)
relevance | OPTIONAL: String/script: must specify/return a double (or a string parsable into a double)
sentiment | OPTIONAL: String/script: must specify/return a double (or a string parsable into a double); by convention between -1.0 and 1.0
frequency | OPTIONAL: must specify/return a long (or a string parsable into a long)
geotag | OPTIONAL: format is identical to the docGeo format specified above: "lat", "lon", "city", "stateProvince", "country", "countryCode" (all strings)
ontology_type | OPTIONAL: String/script: only used if geotag is specified; allows specification of the scale of the geographic entity (see below for useful link); defaults to "point"
useDocGeo | OPTIONAL: if true, uses any lat/long generated from the top-level "docGeo" specification; defaults to false
creationCriteriaScript | OPTIONAL: script: if populated, runs a user script function; if it returns false, the entity is not created
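Because linkdata is a single comma-separated string, any comma inside an individual URL has to be encoded as %2C. The sample output later on this page stores linkdata as an array, which is consistent with the string being split on those separator commas. A minimal sketch, assuming you assemble the string yourself (for example inside a $SCRIPT expression); buildLinkdata and splitLinkdata are hypothetical helpers, not Infinit.e functions.

```javascript
// Hypothetical helpers illustrating the linkdata rule: separator commas between
// URLs, literal commas inside a URL encoded as %2C.
function buildLinkdata(urls) {
  // Encode commas inside each URL so that "," only ever separates entries.
  return urls.map((url) => url.replace(/,/g, "%2C")).join(",");
}

function splitLinkdata(linkdata) {
  // Split on separator commas; encoded commas (%2C) stay inside each URL.
  return linkdata.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

const linkdata = buildLinkdata([
  "http://www.twitter.com/FocalCRM",
  "http://example.com/search?q=a,b", // hypothetical URL containing a comma
]);
console.log(linkdata);
// http://www.twitter.com/FocalCRM,http://example.com/search?q=a%2Cb
console.log(splitLinkdata(linkdata).length); // 2
```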

 

Examples

Entities

In the example source, the entities block sets actual_name to a value taken from the document metadata: the Twitter user's display name (metadata.json.actor.displayName). disambiguated_name is taken from the user's preferred username (the Twitter handle), and linkdata from the user's profile link.

Values are extracted from the document metadata using the $ prefix followed by a dot-notation path.

{
    "entities": [
        {
            "actual_name": "$metadata.json.actor.displayName",
            "dimension": "Who",
            "disambiguated_name": "$metadata.json.actor.preferredUsername",
            "linkdata": "$metadata.json.actor.link",
            "type": "TwitterHandle"
        }
    ]
}

Sample Output

The output shows an entity created from the Twitter user's handle.

{
    "actual_name": "CRM Buddy",
    "dimension": "Who",
    "disambiguated_name": "FocalCRM",
    "doccount": 0,
    "frequency": 1,
    "index": "focalcrm/twitterhandle",
    "linkdata": ["http://www.twitter.com/FocalCRM"],
    "relevance": 0,
    "totalfrequency": -1,
    "type": "TwitterHandle"
}
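The index field in the output appears to be derived from the disambiguated name and the type; both sample outputs on this page fit the pattern below. This is an inference from those samples, not documented behaviour, and entityIndex is a hypothetical helper (for example, for predicting the index of an entity you expect a source to create).

```javascript
// Inferred from the sample outputs on this page, not from a specification:
// index = lower-cased disambiguated_name + "/" + lower-cased type.
function entityIndex(disambiguatedName, type) {
  return `${disambiguatedName.toLowerCase()}/${type.toLowerCase()}`;
}

console.log(entityIndex("FocalCRM", "TwitterHandle")); // focalcrm/twitterhandle
```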

 

Metadata

The document metadata shows where each value comes from: actual_name from actor.displayName, disambiguated_name from actor.preferredUsername, and linkdata from actor.link.

"mediaType": ["Social"],
"metadata": {
    "json": [
        {
            "actor": {
                "displayName": "CRM Buddy",
                "followersCount": "245",
                "friendsCount": "0",
                "id": "id:twitter.com:835627776",
                "image": "http://a0.twimg.com/profile_images/2630355549/8cad59efaddd57283dbb159332336744_normal.jpeg",
                "languages": ["en"],
                "link": "http://www.twitter.com/FocalCRM",
                "links": [{"rel": "me"}],
                "listedCount": "6",
                "objectType": "person",
                "postedTime": "2012-09-20T13:59:56.000Z",
                "preferredUsername": "FocalCRM",
                "statusesCount": "3688",
                "summary": "",
                "verified": "false"
            },
            ...
        }
    ]
}
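To make the $ path lookup concrete, here is a minimal sketch that resolves the example's replacement strings against a trimmed copy of this metadata. resolvePath and buildEntity are hypothetical; in particular, taking the first element whenever the path crosses an array is an assumption, made so that $metadata.json.actor.displayName resolves even though metadata.json is an array. Platform-computed fields such as frequency, relevance, and index are not produced here.

```javascript
// Hypothetical resolver for "$a.b.c" replacement strings.
// ASSUMPTION: when the path passes through an array, the first element is used;
// the real Infinit.e behaviour may differ.
function resolvePath(doc, expression) {
  if (typeof expression !== "string" || !expression.startsWith("$")) {
    return expression; // literal value, used as-is
  }
  let current = doc;
  for (const key of expression.slice(1).split(".")) {
    if (Array.isArray(current)) current = current[0];
    if (current === null || current === undefined) return undefined;
    current = current[key];
  }
  return current;
}

function buildEntity(doc, spec) {
  const entity = {};
  for (const [field, expression] of Object.entries(spec)) {
    entity[field] = resolvePath(doc, expression);
  }
  return entity;
}

// Trimmed copy of the example document metadata.
const doc = {
  metadata: {
    json: [{
      actor: {
        displayName: "CRM Buddy",
        preferredUsername: "FocalCRM",
        link: "http://www.twitter.com/FocalCRM",
      },
    }],
  },
};

// The entity specification from the example source.
const spec = {
  actual_name: "$metadata.json.actor.displayName",
  dimension: "Who",
  disambiguated_name: "$metadata.json.actor.preferredUsername",
  linkdata: "$metadata.json.actor.link",
  type: "TwitterHandle",
};

console.log(buildEntity(doc, spec));
```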

 

Extracting Entities From Arrays

You can use iterateOver to create an entity for each object in an array.
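The mechanics of iterateOver can be sketched as a loop over the array found at a dot-notation path. This is a hypothetical illustration, not Infinit.e code: the (value, index) arguments passed to the generator stand in for the _value/_index variables named in the format, whose exact semantics are not described here, and the field names in the usage example are made up.

```javascript
// Hypothetical sketch of iterateOver: find the array at a dot-notation
// metadata path and generate one entity per element.
function getField(obj, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

function iterateOver(metadata, path, generate) {
  const values = getField(metadata, path);
  if (!Array.isArray(values)) {
    return []; // in this sketch, a missing or non-array field yields no entities
  }
  return values.map((value, index) => generate(value, index));
}

// Illustrative metadata with a nested array (hypothetical field names).
const metadata = { event: { people: [{ name: "Alice" }, { name: "Bob" }] } };

const entities = iterateOver(metadata, "event.people", (value) => ({
  disambiguated_name: value.name,
  dimension: "Who",
  type: "Person",
}));
console.log(entities.map((e) => e.disambiguated_name)); // [ 'Alice', 'Bob' ]
```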

Source

In the example below, both entity specifications use iterateOver on the document metadata field "victim", so each one generates an entity per element of that array.

In the example, disambiguated_name and frequency use $FUNC to call the functions getVictim and getVictimCount, which were previously imported into the script engine. For more information about calling imported JavaScript functions, see Using Javascript.

 

{
    "dimension": "Who",
    "disambiguated_name": "$FUNC( getVictim(); )",
    "frequency": "$FUNC( getVictimCount(); )",
    "type": "VictimType",
    "useDocGeo": false,
    "iterateOver": "victim"
},
{
    "dimension": "Who",
    "disambiguated_name": "$FUNC( getVictim(); )",
    "frequency": "$hostagecount",
    "type": "HostageType",
    "useDocGeo": false,
    "iterateOver": "victim"
}

 

Sample Output

The sample output shows the VictimType entities created from the elements of the "victim" metadata array.

{
    "actual_name": "Targeted, Civilian, Adult from Afghanistan",
    "dimension": "Who",
    "disambiguated_name": "Targeted, Civilian, Adult from Afghanistan",
    "doccount": 0,
    "frequency": 5,
    "index": "targeted, civilian, adult from afghanistan/victimtype",
    "relevance": 0,
    "totalfrequency": -1,
    "type": "VictimType"
},
{
    "actual_name": "Targeted, Civilian, Child from Afghanistan",
    "dimension": "Who",
    "disambiguated_name": "Targeted, Civilian, Child from Afghanistan",
    "doccount": 0,
    "frequency": 2,
    "index": "targeted, civilian, child from afghanistan/victimtype",
    "relevance": 0,
    "totalfrequency": -1,
    "type": "VictimType"
}

 

Metadata

The original document metadata used to create the "victim" entities.

"victim": [
    {
        "child": "No",
        "combatant": "No",
        "deadcount": "1",
        "definingcharacteristic": "Unknown",
        "hostagecount": "0",
        "indicator": "Targeted",
        "nationality": "Afghanistan",
        "targetedcharacteristic": "Unknown",
        "victimtype": "Civilian",
        "woundedcount": "4"
    },
    {
        "child": "Yes",
        "combatant": "No",
        "deadcount": "2",
        "definingcharacteristic": "Unknown",
        "hostagecount": "0",
        "indicator": "Targeted",
        "nationality": "Afghanistan",
        "targetedcharacteristic": "Unknown",
        "victimtype": "Civilian",
        "woundedcount": "0"
    }
]
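The source of getVictim and getVictimCount is not shown on this page. The sketch below is a hypothetical reconstruction that reproduces the sample output from this metadata: the name is assumed to combine indicator, victimtype, child (mapped to Adult/Child), and nationality, and the count is assumed to be deadcount + woundedcount (1 + 4 = 5 and 2 + 0 = 2). To keep it testable outside the script engine, each function takes the current array element as a parameter instead of reading it from the iteration context.

```javascript
// Hypothetical reconstruction of the imported $FUNC helpers; inferred by
// matching the sample output, not taken from the original source.
function getVictim(victim) {
  const age = victim.child === "Yes" ? "Child" : "Adult";
  return `${victim.indicator}, ${victim.victimtype}, ${age} from ${victim.nationality}`;
}

function getVictimCount(victim) {
  return parseInt(victim.deadcount, 10) + parseInt(victim.woundedcount, 10);
}

// The two "victim" elements from the metadata (fields trimmed).
const victims = [
  { child: "No", deadcount: "1", indicator: "Targeted", nationality: "Afghanistan",
    victimtype: "Civilian", woundedcount: "4" },
  { child: "Yes", deadcount: "2", indicator: "Targeted", nationality: "Afghanistan",
    victimtype: "Civilian", woundedcount: "0" },
];

for (const v of victims) {
  console.log(getVictim(v), getVictimCount(v));
}
// Targeted, Civilian, Adult from Afghanistan 5
// Targeted, Civilian, Child from Afghanistan 2
```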

 

Specifying Entity Location

In the example source, the entity block outputs an entity with dimension "Where". $SCRIPT is used to call the functions getAddressVal and getRegion, which are declared in globals.

Inline JavaScript is written by enclosing it in a $SCRIPT( ... ) block; within the script, the current document is available as _doc. For more information about inline JavaScript, see Using Javascript.

{
    "dimension": "Where",
    "disambiguated_name": "$metadata.json.actor.location.displayName",
    "geotag": {
        "city": "$SCRIPT( return getAddressVal( _doc.metadata.json[0].actor.location.displayName, 0 ) )",
        "stateProvince": "$SCRIPT( return getRegion(getAddressVal( _doc.metadata.json[0].actor.location.displayName, 1 )) )",
        "countryCode": "US",
        "alternatives": [
            {
                "stateProvince": "$SCRIPT( return getRegion(getAddressVal( _doc.metadata.json[0].actor.location.displayName, 1 )) )",
                "countryCode": "US"
            }
        ]
    },
    ...
}

 

The example source produces an entity with dimension "Where" when the location can be determined from the metadata.

In some cases it is not clear what geographical type a field holds (e.g. a free-form field that might contain a city, state, or country). For this reason the geotag specification allows you to specify alternatives.

The alternatives are tried in order until one of them succeeds or there are none left to try.
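The fallback order can be sketched as follows. This is a hypothetical illustration, not Infinit.e code: it assumes the main geotag is tried before its alternatives, and lookupLocation with its tiny in-memory gazetteer stands in for whatever geographic resolution the platform actually performs.

```javascript
// Hypothetical sketch of the alternatives fallback.
// Toy gazetteer for illustration only; keys are "city|stateProvince|countryCode".
const TOY_GAZETTEER = new Set(["|New York|US", "Medford|New York|US"]);

function lookupLocation(geotag) {
  const key = [geotag.city, geotag.stateProvince, geotag.countryCode]
    .map((part) => part || "")
    .join("|");
  return TOY_GAZETTEER.has(key) ? key : null;
}

function resolveGeotag(geotag) {
  // ASSUMPTION: the main specification is tried first, then each alternative in order.
  const candidates = [geotag, ...(geotag.alternatives || [])];
  for (const candidate of candidates) {
    const match = lookupLocation(candidate);
    if (match !== null) {
      return match; // stop at the first candidate that resolves
    }
  }
  return null; // no candidate could be resolved
}

// "Albany" is not in the toy gazetteer, so the state-level alternative is used.
const geotag = {
  city: "Albany",
  stateProvince: "New York",
  countryCode: "US",
  alternatives: [{ stateProvince: "New York", countryCode: "US" }],
};
console.log(resolveGeotag(geotag)); // |New York|US
```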

Globals:

{
    "globals": {
        "scripts": [
            "function getAddressVal( addressStr, number) { try { var addressArray = addressStr.split(/ *, */); if (addressArray != null && addressArray.length > 0) { if (addressArray[number].toLowerCase()=='ny') { return 'new york'; } else if (addressArray[number].toLowerCase()=='long island' || addressArray[number].toLowerCase()=='li') { return 'medford'; } else { return addressArray[number]; } } else { return ''; } } catch (err) { return ''; } } function getRegion( code ) { if (code.toLowerCase()=='ny') {return 'New York';} else if (code.toLowerCase()=='nj') {return 'New Jersey';} else if (code.toLowerCase()=='ct') {return 'Connecticut';} else if (code.toLowerCase()=='md') {return 'Maryland';} else if (code.toLowerCase()=='va') {return 'Virginia';} else if (code.toLowerCase()=='pa') {return 'Pennsylvania';} else if (code.toLowerCase()=='nj') {return 'New Jersey';} else {return 'New York';} }"
        ]
    }
}
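The globals script is hard to read as a single JSON string. Below is the same pair of functions laid out over several lines with comments, as a sketch for reading or testing them outside Infinit.e. It is intended to behave like the original; the getRegion lookup is written as an object, and the second "nj" branch of the original (unreachable, because the first one already matches) is dropped. It sticks to ES5 syntax like the original, but it would still need to be collapsed back into one JSON string to be used in globals.

```javascript
// getAddressVal: returns the comma-separated part at position `number`,
// mapping a few local abbreviations; returns "" on any error.
function getAddressVal(addressStr, number) {
  try {
    // Split on commas, ignoring surrounding spaces: "Long Island, NY" -> ["Long Island", "NY"]
    var addressArray = addressStr.split(/ *, */);
    if (addressArray != null && addressArray.length > 0) {
      var part = addressArray[number].toLowerCase();
      if (part === "ny") {
        return "new york";
      } else if (part === "long island" || part === "li") {
        return "medford";
      }
      return addressArray[number];
    }
    return "";
  } catch (err) {
    // e.g. addressStr is undefined, or there is no part at position `number`
    return "";
  }
}

// getRegion: maps a state code to its name; anything unrecognised
// (including "new york" and "") falls back to "New York", as in the original.
function getRegion(code) {
  var regions = {
    ny: "New York", nj: "New Jersey", ct: "Connecticut",
    md: "Maryland", va: "Virginia", pa: "Pennsylvania"
  };
  var key = code.toLowerCase();
  return Object.prototype.hasOwnProperty.call(regions, key) ? regions[key] : "New York";
}

var displayName = "Long Island, NY";
console.log(getAddressVal(displayName, 0));              // medford
console.log(getRegion(getAddressVal(displayName, 1)));   // New York
console.log(getRegion(getAddressVal("Trenton, NJ", 1))); // New Jersey
```

Note that getAddressVal already turns "NY" into "new york", so for New York locations getRegion receives "new york" and returns "New York" through its fallback branch rather than through the "ny" entry.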

Footnotes:

Legacy documentation:
