JSON format

Note that there is a separate overview of using the Structured Analysis Harvester. This page is reference information.

...

Source.structuredAnalysis object (JavaScript):
{
	"scriptEngine" : "string", // OPTIONAL: String. Infinit.e currently supports only "javascript" (or "JavaScript"), which is the default
	"script" : "string", // OPTIONAL: String, can contain one or more JavaScript functions,
				// e.g. "function func() { var foo = 'test'; return foo; }"
	"scriptFiles" : [ "string" ], // OPTIONAL: Array of Strings, URLs of JavaScript
					// files to import at runtime
	"caches" : { "string": "string", ... }, // An object mapping cache names to IDs, one entry per cache in the format <CACHE_NAME>:<ID>,
					// where <ID> is the "_id" of a JSON share (see the overview)
 
	"title" : "string", // OPTIONAL: String. If absent, the title is whatever the harvester generates (e.g. from the RSS feed or filename)
	"fullText" : "string", // OPTIONAL: String. If absent, the full text is taken from the document contents as usual
	"description" : "string", // OPTIONAL: String. If absent, the description is whatever the harvester (or an entity extractor, if supported) generates
	"displayUrl": "string", // OPTIONAL: String, used only for display
	"publishedDate" : "string", // OPTIONAL: String, must return a date string in a standard format (e.g. Java, JavaScript, ISO, SMTP, MM/dd/yy, MM/dd/yyyy)
					// If absent, the published date either comes from the harvester (e.g. the created date for files) or is the current time
 
	"entities" : [ { ... } ], // OPTIONAL: to create entities from the metadata (see below)
	"associations" : [ { ... } ], // OPTIONAL: to create associations (events/facts/summaries) from the metadata (see below)
	"docGeo" : { // OPTIONAL: specifies the document geotag
		// Use ONE of the following two sets of fields:
		// Specify directly:
		"lat" : "string", // latitude
		"lon" : "string", // longitude
 
		// Or fill in as many of these search fields as possible; if a match is found, it populates lat/lon
		"city" : "string", // String
		"stateProvince" : "string", // String
		"country" : "string", // String
		"countryCode" : "string" // String
	},
 
	"rejectDocCriteria": "string", // OPTIONAL: String, a script that returns null to keep the document, or any string to reject it (the string is logged)
	"metadataFields": "string", // OPTIONAL: String. If present, a comma-separated list of top-level metadata fields to exclude (if "metadataFields" starts with '-')
					// or to include exclusively (if it starts with '+', the default). Filtering is applied after all other processing but before indexing and storage.
					// For '-' only, nested objects can also be removed using dot notation, e.g. "json.object.nested.field"
	"onUpdateScript": "string" // OPTIONAL: String, a script used to preserve existing metadata when documents are updated, and to generate new metadata from the
					// differences between the old and new documents. It is discussed further in the Structured Analysis overview linked at the top of this page
}
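The include/exclude rules for "metadataFields" are easiest to follow on a concrete object. The sketch below emulates the behavior described in the reference on plain JSON metadata. It is not Infinit.e's implementation, and the helper name applyMetadataFields and the sample fields are hypothetical. It assumes that whitespace around list entries is ignored and that dot notation only walks nested objects (not arrays).

```javascript
// Hypothetical helper (not part of Infinit.e): emulates the documented
// "metadataFields" rules on a plain JSON metadata object.
//   "-a,b.c"         -> remove top-level "a" and nested "b.c"
//   "+a,b" or "a,b"  -> keep only top-level "a" and "b" ('+' is the default)
function applyMetadataFields(metadata, spec) {
  if (!spec) return metadata; // no filter configured: keep everything

  const exclude = spec.startsWith("-");
  const body = (exclude || spec.startsWith("+")) ? spec.slice(1) : spec;
  // Assumption: whitespace around entries is ignored
  const fields = body.split(",").map((f) => f.trim()).filter((f) => f.length > 0);

  if (exclude) {
    // Work on a copy so the caller's object is not modified (metadata is plain JSON)
    const result = JSON.parse(JSON.stringify(metadata));
    for (const path of fields) {
      // Dot notation is honored in '-' mode only: walk to the parent of the last key
      const keys = path.split(".");
      let parent = result;
      for (let i = 0; i < keys.length - 1 && parent != null; i++) {
        parent = parent[keys[i]];
      }
      if (parent != null && typeof parent === "object") {
        delete parent[keys[keys.length - 1]];
      }
    }
    return result;
  }

  // '+' mode: keep only the listed top-level fields
  const kept = {};
  for (const name of fields) {
    if (Object.prototype.hasOwnProperty.call(metadata, name)) {
      kept[name] = metadata[name];
    }
  }
  return kept;
}

// Usage with made-up sample data
const sample = { json: { object: { nested: { field: 1, other: 2 } } }, title: "t", junk: "x" };
console.log(JSON.stringify(applyMetadataFields(sample, "-junk,json.object.nested.field")));
// {"json":{"object":{"nested":{"other":2}}},"title":"t"}
console.log(JSON.stringify(applyMetadataFields(sample, "+title")));
// {"title":"t"}
```

Because dotted paths apply only in '-' mode, this sketch treats an entry such as "a.b" in '+' mode as a literal top-level field name.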

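The "rejectDocCriteria" contract (null keeps the document, any string rejects it and is logged) can be illustrated with ordinary JavaScript. This is a sketch under assumptions. In a real source the criterion is supplied as a script string, and this page does not say how the document is exposed to that script, so the function form, the { title, fullText } document shape, and the helper names exampleRejectCriteria and applyRejectCriteria are all hypothetical.

```javascript
// Hypothetical example of a rejection criterion written as a JavaScript function.
// Contract from the reference: return null to keep the document, or any string
// to reject it (the string is logged). The document shape { title, fullText }
// is an assumption for illustration only.
function exampleRejectCriteria(doc) {
  if (!doc.fullText || doc.fullText.trim().length === 0) {
    return "rejected: empty full text";
  }
  if (/^\s*(test|draft)\b/i.test(doc.title || "")) {
    return "rejected: test/draft title: " + doc.title;
  }
  return null; // keep the document
}

// Hypothetical harness (not Infinit.e code) showing how the return value is interpreted.
// A string rejects the document and is logged; treating any non-string value as "keep"
// is an assumption.
function applyRejectCriteria(docs, criteria, log) {
  const kept = [];
  for (const doc of docs) {
    const reason = criteria(doc);
    if (typeof reason === "string") {
      log(reason);
    } else {
      kept.push(doc);
    }
  }
  return kept;
}

// Usage with made-up documents
const sampleDocs = [
  { title: "Quarterly report", fullText: "Revenue rose." },
  { title: "Draft: notes", fullText: "tbd" },
  { title: "Empty", fullText: "   " },
];
const rejectionLog = [];
const keptDocs = applyRejectCriteria(sampleDocs, exampleRejectCriteria, (msg) => rejectionLog.push(msg));
console.log(keptDocs.map((d) => d.title)); // [ 'Quarterly report' ]
console.log(rejectionLog.length); // 2
```

Returning a descriptive string rather than a fixed value keeps the log useful when diagnosing why documents were dropped.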
...