StructuredAnalysis object

JSON format

Note that there is a separate overview of using the Structured Analysis Harvester. This page is reference information.

The StructuredAnalysis object of the Source document (the string fields generally contain either fixed values or JavaScript; see the overview referenced above):

Source.structuredAnalysis object
{
	"scriptEngine" : "string", // OPTIONAL: String, Infinit.e currently only supports "javascript" (or "JavaScript"), which is the default 
	"script" : "string", // OPTIONAL: String, can contain one or more JavaScript functions,
				// e.g. "function func() { var foo = 'test'; return foo; }"
	"scriptFiles" : [ "string" ], // OPTIONAL: Array of Strings, URLs of JavaScript
					// files to import at runtime
	"caches": { "string": "string", ... }, // An object of caches in the format <CACHE_NAME>:<ID>, where <ID> is the "_id" of a JSON share (see overview)
 
	"title" : "string", // OPTIONAL: String, else document title is whatever is generated by the harvester (eg from RSS/filename)
	"fullText" : "string", // OPTIONAL: String, else full text is taken from the document contents as per usual. 
	"description" : "string", // OPTIONAL: String, else document description is whatever is generated by the harvester (or an entity extractor if supported)
	"displayUrl": "string", // OPTIONAL: String, this field is just used for display
	"publishedDate" : "string", // OPTIONAL: String, must return a date string in a standard format (eg Java, Javascript, ISO, SMTP, MM/dd/yy, MM/dd/yyyy etc)
					// If not present, the published date either comes from the harvester (eg created date for files) or is the current time
 
	"entities" : [ { ... } ], // OPTIONAL: to create entities from the metadata (see below)
	"associations" : [ { ... } ], // OPTIONAL: to create associations (events/facts/summaries) from the metadata (see below)
	"docGeo" : { // (OPTIONAL, to specify the document geo tag)
		// ONE OF THE FOLLOWING 2 SETS OF FIELDS:
		// Specify directly:
		"lat" : "string", // latitude
		"lon" : "string", // longitude
 
		// Or fill in as many search fields as possible; if a match can be found, it populates the lat/lon
		"city" : "string", // String
		"stateProvince" : "string", // String
		"country" : "string", // String
		"countryCode" : "string" // String
	},
 
	"rejectDocCriteria": "string", // OPTIONAL: String, a script that returns null to keep the document or any string to reject it (the string is logged)
	"metadataFields": "string", // OPTIONAL: String, if present a comma-separated list of top-level metadata fields to either exclude (if "metadataFields" starts with '-'),
					// or only include (starts with '+', default) - the fields are deleted after all processing but before indexing and storage.
					// In addition for "-" only, nested objects can be deleted using dot notation, eg "json.object.nested.field"
	"onUpdateScript": "string" // OPTIONAL: Used to preserve existing metadata when documents are updated, and also to generate new metadata based on the differences 
					// between old and new documents. This script is discussed further in the Structured Analysis Overview linked at the top of this page
}
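The "+"/"-" rules for "metadataFields" can be illustrated with a small local emulation. This is a hypothetical sketch, not the harvester's actual code: the function name applyMetadataFields is an assumption, the metadata is treated as a plain JSON object, and only the rules stated above are modelled ('+' or no prefix keeps the listed top-level fields; '-' deletes the listed fields, with dot notation reaching nested objects).

```javascript
// Hypothetical local emulation of the "metadataFields" rules described above.
// NOT the Infinit.e implementation; applyMetadataFields is an illustrative name.
function applyMetadataFields(metadata, spec) {
  // Work on a deep copy so the caller's object is left untouched (metadata is JSON).
  const copy = JSON.parse(JSON.stringify(metadata));
  if (!spec) return copy;

  const exclude = spec.startsWith('-');
  // A leading '+' or '-' selects the mode; no prefix means '+' (include), the default.
  const body = (exclude || spec.startsWith('+')) ? spec.slice(1) : spec;
  const fields = body.split(',').map((f) => f.trim()).filter((f) => f.length > 0);

  if (exclude) {
    // '-' mode: delete each listed field; dot notation reaches nested objects.
    for (const field of fields) {
      const parts = field.split('.');
      let parent = copy;
      for (let i = 0; i < parts.length - 1 && parent != null; i++) {
        parent = parent[parts[i]];
      }
      if (parent != null && typeof parent === 'object') {
        delete parent[parts[parts.length - 1]];
      }
    }
    return copy;
  }

  // '+' mode: keep only the listed top-level fields.
  const kept = {};
  for (const field of fields) {
    if (Object.prototype.hasOwnProperty.call(copy, field)) {
      kept[field] = copy[field];
    }
  }
  return kept;
}

// Usage example
const doc = { title: 't', json: { object: { nested: { field: 1, other: 2 } } }, extra: 3 };
console.log(JSON.stringify(applyMetadataFields(doc, '-extra,json.object.nested.field')));
// {"title":"t","json":{"object":{"nested":{"other":2}}}}
console.log(JSON.stringify(applyMetadataFields(doc, '+title')));
// {"title":"t"}
```

Copying first keeps the sketch side-effect free; the real harvester applies these deletions to the document itself after all other processing.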

Entities

The entity format this specification object generates is documented here.

Source.structuredAnalysis.entities object
{
	"iterateOver" : "string", // OPTIONAL: If specified, a metadata field (nesting supported using dot notation) which is looped over to generate calls with _value/_iterator/_index
	"disambiguated_name" : "string", // MANDATORY: String/script, the disambiguated name of the entity
	"actual_name" : "string", // OPTIONAL: String/script, the actual name of the entity if different from the disambiguated name
	"dimension" : "string", // MANDATORY: String/script: Must be/return one of "Who", "What", "Where"
	"type" : "string", // MANDATORY: String/script: It is recommended to use a type from the 
				// OpenCyc, AlchemyAPI, or OpenCalais ontologies, for compatibility with future Infinit.e features
 
	"linkdata": "string", // OPTIONAL: if present should return a comma-separated list of URLs (commas should be URL-encoded)
	"relevance" : "string", // OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double
	"sentiment" : "string", // OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double, by convention this is between -1.0 and 1.0.
	"frequency" : "string", // OPTIONAL: Must specify/return a long/string-parsable-into-a-long
	"geotag" : { // OPTIONAL: Format is identical to the docGeo format specified above
		"lat": "string", "lon": "string",
		"city": "string", "stateProvince": "string", "country": "string", "countryCode": "string"
	},
	"ontology_type": "string", // OPTIONAL: String/script: Only used if geotag is specified: 
					// allows specification of the scale of the geographic entity (see below for useful link), defaults to "point"
	"useDocGeo": "boolean", // OPTIONAL: If true, uses any lat/long generated from the top level "docGeo" specification, defaults to false
	"creationCriteriaScript" : "string", // OPTIONAL: script: If populated, runs a user script function and if return value is false doesn't create the object
	"entities" : [ { ... } ] // If iterateOver is specified, use this array to create multiple entity types per iteration (NOW DEPRECATED, use dot notation in "iterateOver")
}
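The "iterateOver" behaviour can be pictured as a loop over a metadata array, evaluating the other fields once per element. The sketch below is a hypothetical local emulation, not Infinit.e code: it assumes the metadata is a plain object, passes only _value and _index (not _iterator) to each field, and represents scripted fields as JavaScript functions rather than the strings used in a real source. The names resolvePath, evalField and expandEntities are illustrative.

```javascript
// Hypothetical emulation of an "iterateOver" entity specification.
// Real sources hold strings/scripts; here each scripted field may be a JS function.

// Resolve a dot-notation path such as "json.people" against a metadata object.
function resolvePath(metadata, path) {
  return path.split('.').reduce((obj, key) => (obj == null ? undefined : obj[key]), metadata);
}

// Evaluate a field that may be a fixed string or a function of the iteration context.
function evalField(field, ctx) {
  return typeof field === 'function' ? field(ctx) : field;
}

function expandEntities(metadata, spec) {
  const found = resolvePath(metadata, spec.iterateOver);
  // Treat a single value as a one-element list; a missing field yields no entities.
  const values = Array.isArray(found) ? found : found == null ? [] : [found];
  const entities = [];
  values.forEach((_value, _index) => {
    const ctx = { _value, _index };
    // Mirrors creationCriteriaScript: a false result skips this entity.
    if (spec.creationCriteriaScript && !spec.creationCriteriaScript(ctx)) return;
    entities.push({
      disambiguated_name: evalField(spec.disambiguated_name, ctx),
      dimension: evalField(spec.dimension, ctx),
      type: evalField(spec.type, ctx),
    });
  });
  return entities;
}

// Usage example
const metadata = { json: { people: [{ name: 'Alice', age: 30 }, { name: 'Bob', age: 17 }] } };
const people = expandEntities(metadata, {
  iterateOver: 'json.people',
  disambiguated_name: ({ _value }) => _value.name,
  dimension: 'Who',
  type: 'Person',
  creationCriteriaScript: ({ _value }) => _value.age >= 18,
});
console.log(JSON.stringify(people));
// [{"disambiguated_name":"Alice","dimension":"Who","type":"Person"}]
```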

More information on the ontology type is provided here.

Associations

The association format this specification object generates is documented here.

Source.structuredAnalysis.associations object
{
	"iterateOver" : "string", // OPTIONAL: If specified as a list of entity types, steps over entities with matching types (either in lock step or combinatorially)
					// Can also specify a metadata field (nesting supported using dot notation), in which case they are looped over to generate calls with _value/_iterator/_index
	"entity1" : "string", //  OPTIONAL: String/script: In "iterateOver"/type cases, the disambiguated name of the entity type; otherwise using entity1_index is preferred.
	"entity1_index" : "string", // OPTIONAL: String/script: should return the 'disambiguated_name/type' string, must resolve to an entity or is discarded
	"entity2" : "string", // OPTIONAL: String/script: In "iterateOver"/type cases, the disambiguated name of the entity type; otherwise using entity2_index is preferred.
	"entity2_index" : "string", // OPTIONAL: String/script: should return the 'disambiguated_name/type' string, must resolve to an entity or is discarded
	"verb" : "string", // MANDATORY: String/script
	"verb_category" : "string", // MANDATORY: String/script
	"assoc_type" : "string", // MANDATORY: Must specify/return one of "Fact", "Event", "Summary"; the value may be overridden (eg converted to "Summary" if there is only 1 entity index)
					// (if left blank, the Structured Analysis Handler auto-generates this field reasonably accurately based on the contents)
	"time_start" : "string", // OPTIONAL: String/script: Must specify/return a time in ISO date format ("yyyy-MM-dd'T'HH:mm:ss") or Javascript time format
	"time_end" : "string", // OPTIONAL: String/script: Must specify/return a time in ISO date format ("yyyy-MM-dd'T'HH:mm:ss") or Javascript time format
	"geo_index" : "string", // OPTIONAL: String/script: The entity index corresponding to the "geotag" below (or the Type in "iterateOver" cases)
	"geotag" : { // OPTIONAL: Format is identical to the docGeo format specified above
		"lat": "string", "lon": "string",
		"city": "string", "stateProvince": "string", "country": "string", "countryCode": "string"
	},
	//(note the ontology_type for associations is always "point" - use geo_index to specify larger areas)
	"creationCriteriaScript" : "string", // OPTIONAL: script: If populated, runs a user script function and if return value is false doesn't create the object
	"sentiment" : "string", // OPTIONAL: String/script: Must specify/return a double/string-parsable-into-a-double, by convention this is between -1.0 and 1.0.
	"associations" : [ { ... } ] // If iterateOver is specified, use this array to create multiple association types per iteration (NOW DEPRECATED, use dot notation in "iterateOver")
}
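Two of the string formats above lend themselves to small helpers: the 'disambiguated_name/type' string expected by entity1_index/entity2_index, and the "yyyy-MM-dd'T'HH:mm:ss" form accepted by time_start/time_end. The sketch below is hypothetical (entityIndex, toIsoSeconds and buildAssociation are illustrative names, not part of Infinit.e), formats times in UTC by assumption, and does not attempt to reproduce the handler's assoc_type auto-generation.

```javascript
// Hypothetical helpers for populating association fields; names are illustrative.

// Build the 'disambiguated_name/type' string used by entity1_index/entity2_index.
function entityIndex(disambiguatedName, type) {
  return `${disambiguatedName}/${type}`;
}

// Format a Date as "yyyy-MM-dd'T'HH:mm:ss" (assumption: UTC, milliseconds dropped).
function toIsoSeconds(date) {
  // toISOString() returns "yyyy-MM-ddTHH:mm:ss.sssZ"; keep the first 19 characters.
  return date.toISOString().slice(0, 19);
}

// Assemble an association-like object from two entities and a start time.
function buildAssociation(e1, e2, verb, verbCategory, assocType, start) {
  return {
    entity1_index: entityIndex(e1.disambiguated_name, e1.type),
    entity2_index: entityIndex(e2.disambiguated_name, e2.type),
    verb,
    verb_category: verbCategory,
    assoc_type: assocType, // expected to be one of "Fact", "Event", "Summary"
    time_start: toIsoSeconds(start),
  };
}

// Usage example
const assoc = buildAssociation(
  { disambiguated_name: 'Alice', type: 'Person' },
  { disambiguated_name: 'Acme Corp', type: 'Company' },
  'joined', 'career', 'Event',
  new Date(Date.UTC(2012, 2, 4, 5, 6, 7)), // months are 0-based: 2 = March
);
console.log(assoc.entity1_index); // Alice/Person
console.log(assoc.time_start); // 2012-03-04T05:06:07
```

Using UTC keeps the output independent of the machine's time zone; if a source's dates are local times, the conversion would need to be adjusted accordingly.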