...
Code Block |
---|
$metadata.offense or ${metadata.offense} |
Other fields at the document top level ("$title", "$description", etc.) can also be referenced this way.
Note: When data is extracted and added to the Metadata object, all field names are converted to lowercase.
Note: If the metadata field is an array, the above syntax grabs the first element only. To go deeper into arrays, JavaScript must be used.
Note: When iterating over entities or metadata (for either entity or association building), the "$" sign is relative to the iterator (eg the metadata object being looped over), not the document. However, when iterating over metadata fields that are strings, the document-level referencing above is still valid, or "$value"/"${value}" can be used to reference the value itself.
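The substitution rules in the notes above can be illustrated with a small JavaScript sketch. The function "resolveReference" and the sample document are hypothetical and exist only for illustration; this is not the harvester's own code, just a model of the described behaviour (both "$field" and "${field}" forms, lowercase metadata field names, and first-element-only access for arrays).

```javascript
// Hypothetical helper: resolves "$a.b" or "${a.b}" against a document object.
// It models the behaviour described in the notes; it is not the harvester's implementation.
function resolveReference(ref, doc) {
  // Strip the leading "$" and the optional surrounding braces.
  var match = /^\$\{?([A-Za-z0-9_.]+)\}?$/.exec(ref);
  if (!match) return undefined;
  var parts = match[1].split(".");
  var current = doc;
  for (var i = 0; i < parts.length; i++) {
    if (current === null || current === undefined) return undefined;
    current = current[parts[i]];
    // Array-valued fields yield their first element only.
    if (Array.isArray(current)) current = current[0];
  }
  return current;
}

// Sample document (assumed shape); metadata field names are lowercase.
var doc = {
  title: "Incident report",
  metadata: { offense: ["burglary", "trespass"] }
};

console.log(resolveReference("$metadata.offense", doc));   // burglary
console.log(resolveReference("${metadata.offense}", doc)); // burglary
console.log(resolveReference("$title", doc));              // Incident report
```

Reaching the second element ("trespass") is not possible with this syntax, which is why the note recommends JavaScript for anything deeper than the first array entry.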
Document updates and metadata
Existing documents can be updated in a number of different cases:
- Files can be updated (changing their "modified time")
- For RSS feeds/URLs, the source parameter "updateCycle_secs" will periodically update the file.
- Database sources can be updated as the result of a SQL call.
When a document is updated, it is essentially equivalent to deleting and then re-creating it, except that its "_id" field is preserved. The Structured Analysis Harvester provides a mechanism to do the following useful activities:
- Preserve metadata from the old document (eg so the entities/associations can be recreated)
- Generate new metadata (and thence entities/associations) based on the differences between successive documents.
A script can be placed into the "onUpdateScript" field (note that the "$SCRIPT" convention used in entity/association scriptlets is not required here). This script has access to the following JavaScript objects:
- "_old_doc": The document object that is about to be deleted
- "_doc": The newly created document object after all metadata/entity/association creation.
The last evaluated expression in the script is placed in a metadata field called "_PERSISTENT_"; it can be a string, an object, or an array of objects. The script does not "return val;", it simply ends with "val;". For example, the following code just saves the entirety of the old document's metadata:
Code Block | ||
---|---|---|
| ||
// SOURCE CONFIG:
"structuredAnalysis": {
"scriptEngine": "javascript",
"onUpdateScript": "var retVal = _old_doc.metadata; retVal;"
}
// RESULT (IN THE CASE OF A DOCUMENT THAT DOESN'T CHANGE):
{
// Usual document fields
"metadata": {
"test1": "test",
"test2": { "field": "value" },
"_PERSISTENT_": [{
"test1": "test",
"test2": { "field": "value" }
}]
}
} |
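The "last evaluated expression" convention matches how JavaScript's built-in eval reports its result, so the behaviour can be tried outside the harvester. The sketch below is a hypothetical stand-in ("applyOnUpdateScript" is not part of the platform), and wrapping a single result in an array is an assumption based on the result shown above.

```javascript
// Hypothetical stand-in for the harvester: evaluates an onUpdateScript string
// with _old_doc and _doc in scope and stores the result under _PERSISTENT_.
function applyOnUpdateScript(script, _old_doc, _doc) {
  // A direct eval returns the value of the last evaluated expression statement.
  var result = eval(script);
  // Assumption: a single value is wrapped in an array, as in the result shown above.
  _doc.metadata._PERSISTENT_ = Array.isArray(result) ? result : [result];
  return _doc;
}

// Old and new versions of a document that has not changed.
var oldDoc = { metadata: { test1: "test", test2: { field: "value" } } };
var newDoc = { metadata: { test1: "test", test2: { field: "value" } } };

var updated = applyOnUpdateScript(
  "var retVal = _old_doc.metadata; retVal;", oldDoc, newDoc);

console.log(JSON.stringify(updated.metadata._PERSISTENT_));
// [{"test1":"test","test2":{"field":"value"}}]
```

Ending the script with a bare "retVal;" is what makes the value visible to the caller; a "return" statement at the top level of the script would be a syntax error under eval.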
The following script shows a very simple example of comparing the old and new documents:
Code Block | ||
---|---|---|
| ||
"structuredAnalysis": {
"scriptEngine": "javascript",
"onUpdateScript": "var delta = _old_doc.metadata.length - _doc.metadata.length; var retVal = { 'delta': delta }; retVal;"
} |
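A slightly richer comparison can report which metadata fields appeared or disappeared between versions. The sketch below is only an illustration: it assumes that "_old_doc.metadata" and "_doc.metadata" are exposed to the script as plain JavaScript objects, and the helper name "diffMetadataFields" is hypothetical. It is written over several lines for readability; in a source configuration it would need to be collapsed into the single "onUpdateScript" string.

```javascript
// Hypothetical onUpdateScript body, spread over several lines for readability.
// Assumption: _old_doc.metadata and _doc.metadata are plain JavaScript objects.
function hasField(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function diffMetadataFields(oldMeta, newMeta) {
  var added = [], removed = [], key;
  for (key in newMeta) {
    // Skip the persisted field itself so it is not reported as a change.
    if (key !== "_PERSISTENT_" && hasField(newMeta, key) && !hasField(oldMeta, key)) {
      added.push(key);
    }
  }
  for (key in oldMeta) {
    if (key !== "_PERSISTENT_" && hasField(oldMeta, key) && !hasField(newMeta, key)) {
      removed.push(key);
    }
  }
  return { added: added, removed: removed, delta: added.length - removed.length };
}

// Stand-ins for the objects the harvester would supply to the script.
var _old_doc = { metadata: { test1: "test", test2: { field: "value" } } };
var _doc = { metadata: { test1: "test", test3: "new", test4: "new" } };

var retVal = diffMetadataFields(_old_doc.metadata, _doc.metadata);
retVal; // in an onUpdateScript, this last expression is what gets persisted

console.log(JSON.stringify(retVal));
// {"added":["test3","test4"],"removed":["test2"],"delta":1}
```

Counting field names rather than reading a "length" property sidesteps the fact that a plain object has no "length" unless a field happens to carry that name.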