Overview
...
Once data is ingested into Infinit.e from the various extractors, it is stored in JSON format, including its metadata fields and content. Each document also contains sub-objects such as entities and associations.
Infinit.e provides a variety of mechanisms by which documents can be updated over time. For example, the updateCycle_secs
field can be set on RSS sources to periodically update documents based on RSS feeds. You can configure how Infinit.e stores documents and updates existing documents by using Document storage settings. You can also designate persistent fields, which remain intact across document updates.
Format
```
{
    "display": string,
    "storageSettings": {
        "rejectDocCriteria": string, // OPTIONAL: If populated, runs a user script function; if the return value is non-null, the object is not created and the output is logged. *Not* wrapped in $SCRIPT().
        "onUpdateScript": string, // OPTIONAL: Used to preserve existing metadata when documents are updated, and also to generate new metadata based on the differences between old and new documents. *Not* wrapped in $SCRIPT().
        "metadataFieldStorage": string // OPTIONAL: A comma-separated list of top-level metadata fields to either exclude (if the list starts with '-') or only include (if it starts with '+', the default). The fields are deleted at that point in the pipeline.
    }
}
```
...
Field | Description |
---|---|
rejectDocCriteria | OPTIONAL: If populated, runs a user script function; if the return value is non-null, the object is not created and the output is logged. *Not* wrapped in $SCRIPT(). |
onUpdateScript | OPTIONAL: Used to preserve existing metadata when documents are updated, and also to generate new metadata based on the differences between old and new documents. *Not* wrapped in $SCRIPT(). |
metadataFieldStorage | OPTIONAL: A comma-separated list of top-level metadata fields to either exclude (if the list starts with '-') or only include (if it starts with '+', the default). The fields are deleted at that point in the pipeline. |
Use Cases
The fields of the Document storage settings configuration can be used to support the following use cases:
- Determine which metadata fields will be stored and used for creation of entities/associations
See examples below.
- Determine how documents will be updated
- Retain existing metadata/entities/associations
- Build new metadata/entities/associations
See examples below.
Examples
...
Metadata Field Storage
You can use metadataFieldStorage
to either include or exclude metadata fields from document creation. To use this field, specify a comma-separated list of top-level metadata fields to either exclude (if the list starts with '-') or only include (if it starts with '+', the default).
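The include/exclude rule described above can be illustrated with a small standalone sketch. This is not Infinit.e's implementation: the function name `applyMetadataFieldStorage` and the sample field names are hypothetical, and the sketch only mimics the documented behavior (a leading '-' excludes the listed fields, a leading '+' or no prefix keeps only the listed fields).

```javascript
// Illustrative sketch only: mimics the documented semantics of
// metadataFieldStorage. The function name and sample data are hypothetical.
function applyMetadataFieldStorage(metadata, spec) {
  const trimmed = spec.trim();
  // A leading '-' means "exclude these fields";
  // a leading '+' (or no prefix at all) means "keep only these fields".
  const exclude = trimmed.startsWith('-');
  const body = /^[+-]/.test(trimmed) ? trimmed.slice(1) : trimmed;
  const fields = new Set(
    body.split(',').map((f) => f.trim()).filter((f) => f.length > 0)
  );

  const result = {};
  for (const [name, value] of Object.entries(metadata)) {
    const listed = fields.has(name);
    // Keep the field if it is listed (include mode) or not listed (exclude mode).
    if (exclude ? !listed : listed) {
      result[name] = value;
    }
  }
  return result;
}

// Hypothetical top-level metadata fields.
const sampleMetadata = { title: ['A'], json: [{ link: 'x' }], rawHtml: ['<p>'] };

console.log(applyMetadataFieldStorage(sampleMetadata, '-rawHtml,json')); // keeps only "title"
console.log(applyMetadataFieldStorage(sampleMetadata, '+title,json'));   // keeps "title" and "json"
```

Only top-level field names are matched in this sketch, in line with the "top-level metadata fields" wording above.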
Filtering Creation of Entities and Associations
rejectDocCriteria
provides a way to evaluate a document against a specific set of criteria. If the return value is non-null (i.e. the criteria matched), the document is not created, so it is not used to generate entities or associations. This parameter is a good way to filter the creation of metadata, entities, and associations.
...
```
    },
    {
        "storageSettings": {
            "rejectDocCriteria": "$SCRIPT( if (null == _doc.metadata.json[0].link || null == _doc.metadata.json[0].object) return 'reject'; )"
        }
    }
  ]
}
```
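To see how the criteria in the example above behave, the same condition can be wrapped in a plain function and run outside Infinit.e. This is a minimal sketch: passing `_doc` as a parameter and the sample documents are assumptions made for illustration, and returning `null` for accepted documents follows the "non-null means reject" rule stated earlier.

```javascript
// Hypothetical standalone wrapper around the script body from the example.
// Inside Infinit.e, _doc is provided by the platform; here it is a parameter.
function rejectDocCriteria(_doc) {
  const first = _doc.metadata.json[0];
  // Loose equality (==) matches both null and undefined (missing) fields.
  if (null == first.link || null == first.object) return 'reject';
  return null; // null: the document is kept
}

// Hypothetical sample documents.
const completeDoc = { metadata: { json: [{ link: 'http://example.com', object: 'article' }] } };
const incompleteDoc = { metadata: { json: [{ link: 'http://example.com' }] } };

console.log(rejectDocCriteria(completeDoc));   // null
console.log(rejectDocCriteria(incompleteDoc)); // reject
```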
It is also possible to use metadataFieldStorage
to supply a comma-separated list of top-level metadata fields to either exclude (if the list starts with '-') or only include (if it starts with '+', the default). The fields are deleted at that point in the pipeline.
...
...
It is possible to use onUpdateScript
to configure the behavior of how documents will be updated.
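The kind of logic an update script might carry out can be sketched as a standalone function. Everything here is an assumption made for illustration: the names `mergeOnUpdate`, `oldDoc`, `newDoc`, `persistentFields`, and `changedFields` are hypothetical and are not the variables Infinit.e exposes to onUpdateScript. The sketch preserves selected metadata fields from the old document and derives one new field from the differences between old and new.

```javascript
// Illustrative sketch only: all names below are hypothetical and do not
// reflect the variables Infinit.e passes to onUpdateScript.
function mergeOnUpdate(oldDoc, newDoc, persistentFields) {
  // Start from the metadata of the updated document.
  const merged = { ...newDoc.metadata };

  // Preserve existing metadata: persistent fields keep their old values.
  for (const field of persistentFields) {
    if (field in oldDoc.metadata) {
      merged[field] = oldDoc.metadata[field];
    }
  }

  // Generate new metadata from the differences between old and new:
  // list the non-persistent fields whose values changed or are new.
  merged.changedFields = Object.keys(newDoc.metadata).filter(
    (name) =>
      !persistentFields.includes(name) &&
      JSON.stringify(oldDoc.metadata[name]) !== JSON.stringify(newDoc.metadata[name])
  );
  return merged;
}

// Hypothetical old and new versions of a document.
const oldDoc = { metadata: { tags: ['reviewed'], price: ['10'] } };
const newDoc = { metadata: { price: ['12'], title: ['T'] } };

console.log(mergeOnUpdate(oldDoc, newDoc, ['tags']));
```

In this sample run, `tags` is carried over from the old document, while `price` and `title` are reported in `changedFields`.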
...