
Code Block
$metadata.offense or ${metadata.offense}

Other fields at the top level of the document ("$title", "$description", etc.) can also be referenced this way.

Note: When data is extracted and added to the Metadata object, all field names are converted to lowercase.

Note: If the metadata field is an array, the above syntax returns only its first element. To access other elements of the array, JavaScript must be used.
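The referencing rules above (dotted paths from the document root, lowercased metadata field names, first element only for arrays) can be modelled in a few lines of JavaScript. This is a minimal sketch, not the harvester's implementation: the helper names `toMetadata` and `resolveRef` and the sample document are hypothetical, and it assumes references are simple dotted paths.

```javascript
// Hypothetical sketch of the "$field" / "${field}" referencing rules; not the harvester's code.

// When data is added to the metadata object, field names are lowercased.
function toMetadata(record) {
  const metadata = {};
  for (const key of Object.keys(record)) {
    metadata[key.toLowerCase()] = record[key];
  }
  return metadata;
}

// Resolves "$a.b" or "${a.b}" against a document.
// Arrays (intermediate or final) only ever yield their first element.
function resolveRef(ref, doc) {
  const match = /^\$\{?([^}]+)\}?$/.exec(ref);
  if (!match) return undefined;
  let value = doc;
  for (const part of match[1].split(".")) {
    if (Array.isArray(value)) value = value[0];
    if (value === null || typeof value !== "object") return undefined;
    value = value[part];
  }
  return Array.isArray(value) ? value[0] : value;
}

const doc = {
  title: "Incident report",
  metadata: toMetadata({ Offense: ["burglary", "trespass"], Location: "Main St" }),
};

resolveRef("$metadata.offense", doc);   // "burglary" (first element only)
resolveRef("${metadata.location}", doc); // "Main St"
resolveRef("$title", doc);              // "Incident report"
resolveRef("$metadata.Offense", doc);   // undefined (field names were lowercased)
```

Reaching "trespass" would need JavaScript such as `doc.metadata.offense[1]`, which is the case the note above refers to.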

Note: When iterating over entities or metadata (for either entity or association building), the "$" sign is relative to the current element of the iteration (e.g. the metadata object being looped over), not to the document. However, when iterating over metadata fields that are strings, the document-level referencing described above is still valid, and "$value"/"${value}" can be used to reference the string itself.
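A similar sketch shows how the target of "$" changes during iteration. Again the helper names (`parseRef`, `lookup`, `resolveInLoop`) and the sample data are hypothetical: object elements become the root for references, while string elements leave document-level references intact and are themselves available as "$value".

```javascript
// Hypothetical sketch of iterator-relative references; not the harvester's code.
function parseRef(ref) {
  const match = /^\$\{?([^}]+)\}?$/.exec(ref);
  return match ? match[1].split(".") : null;
}

// Follows a dotted path, taking the first element of any array on the way.
function lookup(root, path) {
  let value = root;
  for (const part of path) {
    if (Array.isArray(value)) value = value[0];
    if (value === null || typeof value !== "object") return undefined;
    value = value[part];
  }
  return Array.isArray(value) ? value[0] : value;
}

// element: the current item of the loop (an object, or a string for string-valued metadata).
function resolveInLoop(ref, doc, element) {
  const path = parseRef(ref);
  if (path === null) return undefined;
  if (typeof element === "string") {
    // String elements: "$value" is the element itself; other references stay document-level.
    return path.length === 1 && path[0] === "value" ? element : lookup(doc, path);
  }
  // Object elements: "$" is relative to the element being iterated over.
  return lookup(element, path);
}

const doc = {
  title: "Incident report",
  metadata: { suspects: [{ name: "A. Smith", age: "34" }], tags: ["urgent", "local"] },
};

// Looping over the objects in metadata.suspects:
resolveInLoop("$name", doc, doc.metadata.suspects[0]); // "A. Smith"
// Looping over the strings in metadata.tags:
resolveInLoop("${value}", doc, doc.metadata.tags[1]);  // "local"
resolveInLoop("$title", doc, doc.metadata.tags[1]);    // "Incident report"
```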

Document updates and metadata

Existing documents can be updated in several cases:

  • Files can be updated (i.e. their "modified time" changes).
  • For RSS feeds/URLs, the source parameter "updateCycle_secs" causes the document to be updated periodically.
  • Database sources can be updated as the result of a SQL call.

Updating a document is essentially equivalent to deleting and then re-creating it, except that its "_id" field is preserved. The Structured Analysis Harvester provides a mechanism for the following useful activities:

  • Preserve metadata from the old document (e.g. so that the entities/associations can be recreated)
  • Generate new metadata (and thence entities/associations) based on the differences between successive documents.

A script can be placed in the "onUpdateScript" field (note that the "$SCRIPT" convention used in entity/association scriptlets is not required here). This script has access to the following JavaScript objects:

  • "_old_doc": The document object that is about to be deleted
  • "_doc": The newly created document object after all metadata/entity/association creation.

The value of the last evaluated expression in the script (i.e. you don't write "return val;", you just end the script with "val;"), which can be a string, an object, or an array of objects, is placed in a metadata field called "_PERSISTENT_". For example, the following code simply saves the entire metadata of the old document:

Code Block
// SOURCE CONFIG:
"structuredAnalysis": {
	"scriptEngine": "javascript",
	"onUpdateScript": "var retVal = _old_doc.metadata; retVal;"
}
// RESULT (IN THE CASE OF A DOCUMENT THAT DOESN'T CHANGE):
{
	// Usual document fields
	"metadata": {
		"test1": "test",
		"test2": { "field": "value" },
		"_PERSISTENT_": [{
			"test1": "test",
			"test2": { "field": "value" }
		}]
	}
}
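The "last evaluated expression" behaviour matches what JavaScript's `eval` returns: the completion value of the last expression statement. The sketch below uses that to mimic the update step outside the harvester. It is an illustration only: `runOnUpdateScript` and `applyUpdate` are hypothetical, and it assumes (based on the example output above) that any non-array result is wrapped in a one-element array before being stored in "_PERSISTENT_".

```javascript
// Hypothetical sketch of the onUpdateScript step; not the harvester's implementation.

// The parameter names mirror the objects the script can see.
// Direct eval returns the value of the last evaluated expression, e.g. "retVal;".
function runOnUpdateScript(script, _old_doc, _doc) {
  return eval(script); // illustration only: never eval untrusted input
}

// Stores the script's result in _PERSISTENT_.
// Assumption: non-array results are wrapped in a one-element array.
function applyUpdate(script, oldDoc, newDoc) {
  const result = runOnUpdateScript(script, oldDoc, newDoc);
  if (result !== undefined) {
    newDoc.metadata._PERSISTENT_ = Array.isArray(result) ? result : [result];
  }
  return newDoc;
}

const oldDoc = { metadata: { test1: "test", test2: { field: "value" } } };
const newDoc = { metadata: { test1: "test", test2: { field: "value" } } };
applyUpdate("var retVal = _old_doc.metadata; retVal;", oldDoc, newDoc);
// newDoc.metadata._PERSISTENT_ is now [{ test1: "test", test2: { field: "value" } }]
```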

The following script shows a very simple example of comparing the old and new documents:

Code Block
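"structuredAnalysis": {
	"scriptEngine": "javascript",
	// metadata is an object (it has no .length), so its fields are counted with a for-in loop
	"onUpdateScript": "function countFields(o) { var n = 0; for (var k in o) { n++; } return n; } var delta = countFields(_old_doc.metadata) - countFields(_doc.metadata); var retVal = { 'delta': delta }; retVal;"
}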
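A slightly richer comparison could report which metadata fields were added, removed, or changed. The helper below is a hypothetical sketch, written in ES5-style JavaScript so that it could be inlined into an "onUpdateScript" string, assuming the configured script engine supports `JSON.stringify`. It skips "_PERSISTENT_" so that data saved by an earlier update is not reported as a change, and its JSON-based equality check is sensitive to key order inside nested objects.

```javascript
// Hypothetical helper: lists metadata fields added, removed, or changed between versions.
function diffMetadata(oldMeta, newMeta) {
  var has = Object.prototype.hasOwnProperty;
  var added = [], removed = [], changed = [];
  var key;
  for (key in newMeta) {
    if (!has.call(newMeta, key) || key === "_PERSISTENT_") continue;
    if (!has.call(oldMeta, key)) {
      added.push(key);
    } else if (JSON.stringify(oldMeta[key]) !== JSON.stringify(newMeta[key])) {
      // Values compared via JSON: nested key order matters.
      changed.push(key);
    }
  }
  for (key in oldMeta) {
    if (!has.call(oldMeta, key) || key === "_PERSISTENT_") continue;
    if (!has.call(newMeta, key)) removed.push(key);
  }
  return { added: added, removed: removed, changed: changed };
}

var oldMeta = { status: "open", owner: "alice", priority: "low" };
var newMeta = { status: "closed", owner: "alice", due: "2024-01-31" };
var delta = diffMetadata(oldMeta, newMeta);
// delta is { added: ["due"], removed: ["priority"], changed: ["status"] }
```

Inside "onUpdateScript", the script would then end with an expression such as `diffMetadata(_old_doc.metadata, _doc.metadata);` so that its result is what gets stored in "_PERSISTENT_".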
Further Reading