Introduction

Welcome to the IKANOW Community Edition (Infinit.e) landing page! This sub-site is intended for integrators, developers, IT staff, technical analysts, researchers, and similar roles who want a technical overview of the platform and practical information about how to install, configure, use, integrate, or extend it.

For a higher-level view, check out the following links:

In the spirit of the sort of analysis we would like to support, we will provide the remainder of this overview using the "5 Ws".

Who?

We are IKANOW, the developers of Infinit.e, the first Open Source document analysis platform. Our vision is to enable agile intelligence through open analytics.

What?

IKANOW Community Edition is a scalable framework for collecting, storing, processing, retrieving, analyzing, and visualizing unstructured documents and structured records. It is built with IKANOW Infinit.e, Hadoop, Elasticsearch, and MongoDB.


 


Let's clarify each of the often-overloaded terms used in that sentence:

We refer to the processing/retrieval/analysis/visualization chain as document-centric knowledge discovery:

One important aspect of Infinit.e is its generic data model. Data from all sources (from large unstructured documents to small structured records) is transformed into a single, simple data model that allows common queries, scoring algorithms, and analytics to be applied across the entire dataset, as the following diagram illustrates.
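To make this more concrete, here is a minimal sketch of what such a unified record might look like after a source document has been transformed. The field names are illustrative only and are not the platform's actual JSON schema:

# Illustrative only: a hypothetical "unified" record in the spirit of the
# generic data model described above, not Infinit.e's actual schema.
unified_doc = {
    "title": "Acme Corp announces acquisition of Widget Inc",
    "url": "http://example.com/news/12345",      # where the original document lives
    "sourceType": "rss",                         # news feed, file share, database record, ...
    "publishedDate": "2012-03-01T09:30:00Z",
    "fullText": "Acme Corp said on Thursday it will acquire Widget Inc...",
    "entities": [                                # typed entities extracted from the text
        {"name": "Acme Corp", "type": "Company"},
        {"name": "Widget Inc", "type": "Company"},
    ],
    "associations": [                            # relationships between those entities
        {"entity1": "Acme Corp", "verb": "acquires", "entity2": "Widget Inc"},
    ],
    "metadata": {"language": "en"},              # any source-specific extras
}

# Because every source is reduced to this one shape, the same queries, scoring
# algorithms, and aggregations can run across news feeds, documents, and
# database rows alike.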

The following thumbnails show a complex query being built, and an example of visualizing the knowledge encoded in the documents returned by that query. Note that although the screenshots show our web application, in practice the Infinit.e platform can be integrated with any front-end application or analytical chain, for example over the REST API as sketched below.
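As a rough illustration of that kind of integration, a custom client might drive the same query chain over the REST API. The base URL, login endpoint, and query fields below are assumptions for illustration only; check them against the API documentation before use.

# Sketch of querying the platform over its REST API from a custom client.
# The base URL, login endpoint, and query payload are assumptions for
# illustration; consult the Infinit.e API reference for the real interface.
import json
import requests

BASE_URL = "http://infinite.example.com/api"   # hypothetical deployment

session = requests.Session()

# 1. Authenticate (hypothetical endpoint and credentials).
session.get(BASE_URL + "/auth/login/user@example.com/mypassword")

# 2. Post a query combining free text with an entity filter.
query = {
    "qt": [
        {"ftext": "acquisition"},
        {"entity": "Acme Corp/Company"},
    ],
    "output": {"docs": {"numReturn": 10}},
}
resp = session.post(
    BASE_URL + "/knowledge/document/query",
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
)

# 3. Use the returned documents in any front end or analytic of your choice.
for doc in resp.json().get("data", []):
    print(doc.get("title"), doc.get("url"))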

Why...

... did we build it?

While supporting information analysts in the military and government, we observed that the landscape of professional analysis tools is dominated by expensive proprietary products with limited flexibility and vendor lock-in, requiring extensive and ongoing customization that ends up extremely expensive and inefficient.

Further, these tools had often originally been designed to analyze and mine structured records, whereas data is increasingly generated as a mix of unstructured documents and traditional structured records. Usually, the unstructured documents dominate the structured records in terms of readily available intelligence to be gleaned.

So we believed there was a gap in the market waiting to be filled.

We also observed that the Open Source community was developing tools that provided many of the core functions needed for an unstructured document-centric analysis tool (storage, search, aggregation, analytic frameworks). This provided some exciting new opportunities:

Further, the increasing richness and availability of low-cost SaaS and PaaS cloud services meant that powerful functionality, such as NLP, and scalable computing performance on platforms like EC2 were affordably available to smaller organizations.

Based on these needs and opportunities, we built Infinit.e, the first Open Source document analysis cloud platform, using great OSS projects like Lucene, elasticsearch, Hadoop, Mahout, MongoDB, tomcat, and many others. Our objectives are:

         

Key OSS and cloud technologies used in Infinit.e

... might you want to use it?

The platform is intended to support any activity where analyzing and synthesizing large volumes of data can provide a benefit, particularly where the conclusions are complex enough that the human brain is still needed, but the volumes of data are high enough that the brain could use some help!

Specific examples include, but are not limited to:

Illustrations of the diverse uses of knowledge discovery:


Market research using Infinit.e:

Analyzing crime statistics in DC

Due diligence (longer; using the enterprise edition)

When?

Following prototypes developed in early 2010, we began the main development of the platform in November 2010, focusing on adopting the best OSS tools for its different functions, productionizing their use, and gluing them together in a logical way.

We released the first Open Source version in March 2012, with the core code mostly licensed under the Affero GPL and the API licensed under Apache 2.0 (together with the plugins and utilities, and one core library: the data model, which provides useful classes and utilities for serializing and deserializing commonly used JSON objects).

Development continues: check out our roadmap.

How...