Understanding the code

Overview

The two sections below give a brief, top-level introduction to following the code in order to understand the lower-level design of Infinit.e.

Understanding the object model

This JSON format includes specifications for all the relevant objects.

The POJO definitions are all contained within infinit.e.data_model. This library has a particular package format:

  • store contains objects that persist in MongoDB
  • index contains objects that persist in ElasticSearch
  • api contains objects that pass transiently through the (RESTlet) API
  • Also:

The "store" and "index" sub-packages map onto the collections ("tables") described here (final diagram, reproduced below). The names are not exact, but they are close enough that the mapping should be obvious. The "api" sub-packages map (closely) to the REST structure described here.

Within a sub-package, there are a small number of classes, the JSON specifications of which are described here.

As an example, documents persist in MongoDB, hence reside under "store.document". Because they can be passed out via the API and mirrored in the real-time index, there are also "transformer" classes in "index.document" and "api.knowledge".
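The relationship between a stored object and its "transformer" classes can be illustrated with a minimal, self-contained sketch. Every class, method, and field name below is hypothetical and chosen only for illustration; none of them is taken from infinit.e.data_model, and plain maps stand in for whatever serialization the real library uses. The sketch assumes only what the paragraph above states: one persisted form, plus separate conversions for the index and for the API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these names are NOT the real infinit.e.data_model classes.
public class DocumentModelSketch {

    // Plays the role of an object under "store.document": the persisted form.
    static final class StoredDocument {
        final String id;
        final String title;
        final String url;
        final String harvestNotes; // assumed internal field, not exposed elsewhere

        StoredDocument(String id, String title, String url, String harvestNotes) {
            this.id = id;
            this.title = title;
            this.url = url;
            this.harvestNotes = harvestNotes;
        }
    }

    // Plays the role of a transformer under "index.document":
    // converts the stored form into the form mirrored in the index.
    static final class IndexDocumentTransformer {
        static Map<String, Object> toIndex(StoredDocument doc) {
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("_id", doc.id);
            out.put("title", doc.title);
            out.put("url", doc.url);
            return out;
        }
    }

    // Plays the role of a transformer under "api.knowledge":
    // converts the stored form into the form returned through the API.
    static final class ApiDocumentTransformer {
        static Map<String, Object> toApi(StoredDocument doc) {
            Map<String, Object> out = new LinkedHashMap<>();
            out.put("id", doc.id);
            out.put("title", doc.title);
            out.put("url", doc.url);
            return out;
        }
    }

    public static void main(String[] args) {
        StoredDocument doc = new StoredDocument(
                "doc-1", "Example title", "http://example.com/a", "internal note");
        System.out.println("index view: " + IndexDocumentTransformer.toIndex(doc));
        System.out.println("api view:   " + ApiDocumentTransformer.toApi(doc));
    }
}
```

The point of the split is that the stored class stays the single source of truth, while each transformer decides which fields its consumer sees and under what names.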

Understanding the data flow

Various data flows are shown here (3rd diagram, reproduced below). The packages map onto the libraries here.

For example, once documents reach the end of the source pipeline (package harvest.library, controller HarvestController - see the data flow here), they are passed to the generic processing package, where a few things happen:

Note that the details of the Lucene format are relatively uninteresting: the format is driven entirely by elasticsearch's implementation of a Lucene engine, and the query object abstracts it away from almost all developers and users.

 

Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons