
Overview

This toolkit element controls whether documents are stored, and which metadata fields are stored (including special fields that persist across document updates).

Format

{
	"display": string,
	"storageSettings": {
		"rejectDocCriteria": string,	//OPTIONAL: If populated, runs a user script function; if the return value is non-null, the document is not created and the returned value is logged. *Not* wrapped in $SCRIPT().
		"onUpdateScript": string,	//OPTIONAL: Used to preserve existing metadata when documents are updated, and also to generate new metadata based on the differences between old and new documents. *Not* wrapped in $SCRIPT().
		"metadataFieldStorage": string	//OPTIONAL: A comma-separated list of top-level metadata fields to either exclude (if the list starts with '-') or exclusively include (if it starts with '+', the default). The fields are deleted at that point in the pipeline.
	}
}

 

Description

Filtering Metadata Objects

rejectDocCriteria provides a way to filter documents against a specific set of criteria. If the script returns a non-null value (ie. the criteria have matched), the document is not created, so its data is not used to generate entities or associations, and the returned value is logged. This parameter is a good way to filter data before metadata entities and associations are created.

The example source below shows how to populate this parameter with a script for filtering purposes.

In the example, which analyzes data from a Twitter feed, the script returns 'reject' whenever the first JSON metadata object lacks either a "link" or an "object" field. Such documents are not created, and the returned value ('reject') is logged.

 

{
    "storageSettings": {
        "rejectDocCriteria": "$SCRIPT( if (null == _doc.metadata.json[0].link || null == _doc.metadata.json[0].object) return 'reject'; )"
    }
}
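A rough JavaScript sketch of the filtering behavior described above follows. It is a hypothetical harness, not the platform's implementation: `evaluateRejectCriteria`, the log entry shape, and the sample documents are illustrative, and it assumes the text inside `$SCRIPT( ... )` runs as a function body with the current document bound to `_doc` (which would explain why the example can use `return`).

```javascript
// Hypothetical harness illustrating rejectDocCriteria semantics (not platform code).
// Assumption: the script text inside $SCRIPT( ... ) runs as a function body
// with the current document bound to _doc.
function evaluateRejectCriteria(scriptBody, _doc, rejectLog) {
  const criteria = new Function("_doc", scriptBody);
  const result = criteria(_doc);
  // Assumption: a script that ends without returning (undefined) counts as null.
  if (result !== null && result !== undefined) {
    rejectLog.push({ url: _doc.url, reason: result }); // log the returned value
    return false; // document is not created
  }
  return true; // document continues through the pipeline
}

// The body of the example rejectDocCriteria script above, without the $SCRIPT( ) wrapper.
const exampleCriteria =
  "if (null == _doc.metadata.json[0].link || null == _doc.metadata.json[0].object) return 'reject';";

// Illustrative documents; only the fields the script reads are filled in.
const rejected = [];
const withLink = { url: "tweet-1", metadata: { json: [{ link: "http://example.com/a", object: {} }] } };
const withoutLink = { url: "tweet-2", metadata: { json: [{ object: {} }] } };

console.log(evaluateRejectCriteria(exampleCriteria, withLink, rejected)); // true
console.log(evaluateRejectCriteria(exampleCriteria, withoutLink, rejected)); // false
console.log(rejected.length); // 1
```

Because `null == undefined` is true in JavaScript, a missing "link" field triggers the rejection just as an explicit null would.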

 

It is also possible to use metadataFieldStorage, a comma-separated list of top-level metadata fields, to either exclude the listed fields (if the list starts with '-') or include only the listed fields (if it starts with '+', the default). The fields are deleted at that point in the pipeline.
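The include/exclude rule can be sketched as follows. The helper `applyMetadataFieldStorage` is hypothetical, and two details are assumptions rather than documented behavior: a single leading '-' or '+' applies to the whole list, and whitespace around field names is ignored.

```javascript
// Hypothetical sketch of metadataFieldStorage semantics (not platform code).
// Assumptions: one leading '-' or '+' applies to the whole comma-separated list,
// a list with no prefix behaves like '+' (the documented default),
// and whitespace around field names is ignored.
function applyMetadataFieldStorage(metadata, fieldSpec) {
  let spec = fieldSpec.trim();
  let exclude = false;
  if (spec.startsWith("-")) {
    exclude = true;
    spec = spec.slice(1);
  } else if (spec.startsWith("+")) {
    spec = spec.slice(1);
  }
  const listed = new Set(
    spec.split(",").map((f) => f.trim()).filter((f) => f.length > 0)
  );
  for (const field of Object.keys(metadata)) {
    // '-' deletes the listed fields; '+' deletes every field that is not listed.
    if (listed.has(field) === exclude) {
      delete metadata[field];
    }
  }
  return metadata;
}

// Illustrative metadata objects.
const tweet = { title: ["t"], json: [{ link: "x" }], raw: ["..."] };
applyMetadataFieldStorage(tweet, "-raw");
console.log(Object.keys(tweet)); // [ 'title', 'json' ]

const other = { title: ["t"], json: [{ link: "x" }], raw: ["..."] };
applyMetadataFieldStorage(other, "+title");
console.log(Object.keys(other)); // [ 'title' ]
```

The sketch deletes fields in place, mirroring the statement that the fields are deleted at that point in the pipeline.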

 


Comparing New and Old Documents

onUpdateScript configures how metadata is handled when an existing document is updated.

Existing documents can be updated in a number of different cases:

  • Files can be updated (changing their "modified time")
  • For RSS feeds/URLs, the source parameter "updateCycle_secs" periodically updates the document.
  • Database sources can be updated as the result of a SQL call.

Updating a document is essentially equivalent to deleting it and then re-creating it, except that its "_id" field is preserved.

Document storage settings provide a mechanism to:

  • Preserve metadata from the old document (eg so the entities/associations can be recreated)
  • Generate new metadata (and thence entities/associations) based on the differences between successive documents.

onUpdateScript can be configured with a script that either preserves metadata from the old document or creates new metadata.

The "$SCRIPT" convention used in entity/association scriptlets is not required here.

 

This script has access to the following JavaScript objects:

  • "_old_doc": The document object that is about to be deleted
  • "_doc": The newly created document object after all metadata/entity/association creation.

The value of the last evaluated expression in the script (eg you don't write "return val;", you just end the script with "val;") is placed in a metadata field called "_PERSISTENT_". This value can be a string, an object, or an array of objects.
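The update behavior can be simulated in plain JavaScript. This is a hypothetical sketch, not the platform's code: `applyOnUpdateScript` is an illustrative name, wrapping a non-array result in a one-element array is an assumption inferred from the "_PERSISTENT_" output in the preserve metadata example, and skipping null results is also an assumption. It uses a direct `eval`, which both exposes `_old_doc` and `_doc` to the script and returns the value of the last evaluated expression.

```javascript
// Hypothetical simulation of a document update with onUpdateScript (not platform code).
// Assumptions: the script runs after the new document is fully built, a non-array
// result is wrapped in a one-element array, and a null/undefined result stores nothing.
function applyOnUpdateScript(script, _old_doc, _doc) {
  // An update behaves like delete + re-create, except that "_id" is preserved.
  _doc._id = _old_doc._id;
  // Direct eval: the script can read _old_doc and _doc, and eval returns the value
  // of the last evaluated expression ("val;" rather than "return val;").
  const persistentValue = eval(script);
  if (persistentValue !== undefined && persistentValue !== null) {
    _doc.metadata._PERSISTENT_ = Array.isArray(persistentValue)
      ? persistentValue
      : [persistentValue];
  }
  return _doc;
}

// Illustrative documents matching the preserve metadata example.
const oldDoc = { _id: "doc-1", metadata: { test1: "test", test2: { field: "value" } } };
const newDoc = { metadata: { test1: "test", test2: { field: "value" } } };
applyOnUpdateScript("var retVal = _old_doc.metadata; retVal;", oldDoc, newDoc);
console.log(newDoc._id); // doc-1
console.log(JSON.stringify(newDoc.metadata._PERSISTENT_)); // [{"test1":"test","test2":{"field":"value"}}]
```

`eval` stands in for the platform's script engine here; it should only ever be given trusted script text.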

Preserve Metadata Example

The following code just saves the entirety of the old document's metadata:

 

{
    "storageSettings": {
        "onUpdateScript": "var retVal = _old_doc.metadata; retVal;"
    }
}
// RESULT (IN THE CASE OF A DOCUMENT THAT DOESN'T CHANGE):
{
    // Usual document fields
    "metadata": {
        "test1": "test",
        "test2": { "field": "value" },
        "_PERSISTENT_": [{
            "test1": "test",
            "test2": { "field": "value" }
        }]
    }
}

Generate New Metadata Example

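In this example, the return value records the difference in the number of top-level metadata fields between the two documents under comparison. Because metadata is an object keyed by field name, the script counts its fields with a for-in loop, ignoring any "_PERSISTENT_" field carried over from a previous update.

{
    "storageSettings": {
        "onUpdateScript": "var countFields = function(m) { var n = 0; for (var k in m) { if (k != '_PERSISTENT_') n++; } return n; }; var delta = countFields(_old_doc.metadata) - countFields(_doc.metadata); var retVal = { 'delta': delta }; retVal;"
    }
}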

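A delta of this kind is only a coarse signal. A richer onUpdateScript could record which top-level metadata fields were added, removed, or changed. The sketch below is hypothetical: `metadataDiff` is an illustrative helper, it assumes the script engine provides `JSON.stringify` (values are compared by their JSON text, so a difference in key order counts as a change), and it is written in ES5 style in case the engine lacks newer syntax.

```javascript
// Hypothetical helper for an onUpdateScript (not part of the platform).
// Written in ES5 style in case the script engine lacks newer syntax.
// Assumption: JSON.stringify is available; key-order differences count as changes.
function metadataDiff(oldMeta, newMeta) {
  var has = Object.prototype.hasOwnProperty;
  var diff = { added: [], removed: [], changed: [] };
  var key;
  for (key in newMeta) {
    // Skip inherited keys and values carried over from earlier updates.
    if (!has.call(newMeta, key) || key === "_PERSISTENT_") continue;
    if (!has.call(oldMeta, key)) {
      diff.added.push(key);
    } else if (JSON.stringify(oldMeta[key]) !== JSON.stringify(newMeta[key])) {
      diff.changed.push(key);
    }
  }
  for (key in oldMeta) {
    if (!has.call(oldMeta, key) || key === "_PERSISTENT_") continue;
    if (!has.call(newMeta, key)) {
      diff.removed.push(key);
    }
  }
  return diff;
}

// Inside onUpdateScript, the helper definition would be followed by a final expression such as:
//   metadataDiff(_old_doc.metadata, _doc.metadata);
var before = { title: ["A"], count: [1], author: ["x"] };
var after = { title: ["A"], count: [2], tags: ["new"] };
console.log(JSON.stringify(metadataDiff(before, after)));
// {"added":["tags"],"removed":["author"],"changed":["count"]}
```

Since onUpdateScript is a single JSON string, the helper and the final expression would have to be placed on one line (or with escaped newlines) in the source configuration.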
Footnotes:

Legacy documentation:
