...
- store contains objects that persist in MongoDB
- index contains objects that persist in ElasticSearch
- api contains objects that pass transiently through the (RESTlet) API
- Also:
- utils contains interfaces needed to build standalone Harvest extractor modules
- driver contains the (ever Beta) Infinit.e Driver
- custom contains Mongo/Infinit.e/Hadoop connector classes
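The first three sub-packages differ mainly in where their objects live. The sketch below records that split as a small Java enum. It is illustrative only: `DataModelPackage` and `PackageLayoutDemo` are hypothetical names and are not classes from the Infinit.e code base.

```java
// Hypothetical illustration: the constants mirror the sub-package names
// listed above; this enum does not exist in the Infinit.e code base.
enum DataModelPackage {
    STORE("MongoDB"),        // objects that persist in MongoDB
    INDEX("ElasticSearch"),  // objects that persist in ElasticSearch
    API(null);               // objects that pass transiently through the API

    private final String backend;

    DataModelPackage(String backend) {
        this.backend = backend;
    }

    /** True if objects in this sub-package are persisted in a backing store. */
    boolean isPersistent() {
        return backend != null;
    }

    /** Name of the backing store, or "none" for transient API objects. */
    String backend() {
        return backend == null ? "none" : backend;
    }
}

public class PackageLayoutDemo {
    public static void main(String[] args) {
        // Print each sub-package alongside the store that holds its objects.
        for (DataModelPackage p : DataModelPackage.values()) {
            System.out.println(p.name().toLowerCase() + " -> " + p.backend());
        }
    }
}
```

Running `PackageLayoutDemo` prints one line per sub-package, for example `store -> MongoDB`.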
The "store" and "index" sub-packages map onto the collections ("tables") described here (final diagram, reproduced below). The names are not identical, but they are close enough that the correspondence should be obvious. The "api" sub-packages map closely onto the REST structure described here.
Each sub-package contains a small number of classes, whose JSON specifications are described here.
...
For example, once documents reach the end of the source pipeline (package harvest.library, controller HarvestController; see the data flow here), they are passed to the generic processing package, where a few things happen:
...
Note that the details of the Lucene format are relatively uninteresting: they are entirely driven by Elasticsearch's implementation of a Lucene engine, and the query object abstracts them away from almost all developers and users.