Overview

The "harvest post processor" custom plugin enables users with appropriate permissions to modify existing documents and overwrite them seamlessly with the new versions.

It can be used for a number of use cases, eg:

Basic operation

The idea is to start with a template version from the custom GUI. If one doesn't already exist, create it; all that is needed is MapperClass: "com.ikanow.infinit.e.hadoop.processing.InfiniteProcessingEngine$InfiniteMapper", CombinerClass/ReducerClass: "none", and Key/Value classes Text and BSONWritable respectively. Then:

{
	"rebuildAllCommunities": boolean, // Optional, defaults to false
	"debugMode": boolean, // Optional, defaults to false
	"processingPipeline": [
		{ /* standard source pipeline objects */ }
	]
}

where:

Advanced

This section describes any harvest-post-processor-specific issues with using pipeline elements:

Output

If "debugMode" is set to true, then all documents are output in their entirety to the standard custom output collection (and nothing is modified). The documents are all output with the same key, "modifiedDocument". (For example, this key can be used as the "query" term when browsing the output via the custom / get API call, or in the Record Analyzer if "$output.indexMode" is set to custom in the query.) Deleted documents are output as "deletedDocument".
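
As an illustration of working with debug output, the sketch below groups already-retrieved output records by their "key" field so that modified and deleted documents can be reviewed separately. It is a minimal sketch only: the "value" field, the sample documents, and the helper name group_by_key are hypothetical, and how the records are fetched from the custom output collection is not shown.

```python
from collections import defaultdict


def group_by_key(records):
    """Group output records by the value of their "key" field."""
    grouped = defaultdict(list)
    for record in records:
        # Records without a "key" field are collected under None.
        grouped[record.get("key")].append(record)
    return dict(grouped)


# Hypothetical debug-mode output. Only the "key" field is described in the
# text above; the "value" layout here is an assumption for illustration.
sample = [
    {"key": "modifiedDocument", "value": {"title": "doc A"}},
    {"key": "deletedDocument", "value": {"title": "doc B"}},
    {"key": "modifiedDocument", "value": {"title": "doc C"}},
]

grouped = group_by_key(sample)
for key, docs in grouped.items():
    print(key, len(docs))
```

Reviewing the "deletedDocument" group before turning "debugMode" off is one way to check that the pipeline will not remove more documents than intended.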

Regardless of the debugMode setting, the following output record types (set in the "key" field, as above) can be generated: