
Introduction


Welcome to the Infinit.e landing page! This sub-site is intended for integrators, developers, IT staff, technical analysts, researchers, and others in similar roles who want a technical overview of the platform, along with technical information about how to install, configure, exploit, integrate, or extend it.

For a higher-level view, check out the following links:

In the spirit of the kind of analysis we aim to support, the remainder of this overview is organized around the "5 Ws".


...

Who?

We are Ikanow, the developers of Infinit.e, the first Open Source document analysis platform. Our vision is XXX. 

...


What?

Infinit.e is a framework for collecting, storing, processing, retrieving, analyzing, and visualizing unstructured documents and structured records. 

[Diagram: Infinit.e top-level overview (Gliffy diagram "Infinite_topLevel")]

Each of the often-overloaded terms in the previous sentence deserves some clarification:

  • It is a "framework" (or "platform") because it can be configured and extended, either through configuration (DSLs) or through various types of plug-in. The default configuration should be useful for a range of typical analysis applications, but we anticipate that getting the most out of Infinit.e will usually involve some customization.
  • By "unstructured documents" we mean anything from a mostly textual database record to a multi-page report. Infinit.e's "sweet spot" runs from database records corresponding to a paragraph or more of text, through web pages, to reports of 10 pages or less. Smaller "structured records" are better handled by structured analysis tools (a very saturated space), though Infinit.e can perform limited aggregation, processing, and integration of such datasets. Larger reports can still be handled by Infinit.e, but are most effective if broken up first.
  • By "processing" we mean the ability to apply complex logic to the data. Infinit.e provides some standard processing, such as extraction of entities (people, places, organizations, etc.) and simple statistics, as well as the ability to plug in domain-specific processing modules using the Hadoop API.
  • By "retrieving" we mean the ability to search documents and return them in ranked order, and also to retrieve "knowledge" aggregated over all documents matching the analyst's query.
    • By "query"/"search" we mean the ability to form complex "questions about the data" using a DSL (Domain-Specific Language).
  • By "analyzing" we mean the ability to apply domain-specific logic (visual, mathematical, heuristic, etc.) to the "knowledge" returned from a query.

TODO something about the generic data model and semi-structured data

(TODO links to diagrams from presentations with pretty pictures of unstructured analysis)


Why...

... did we build it?

While supporting information analysts for the military and Government, we observed that the landscape of professional analysis tools is dominated by expensive proprietary products that offer limited flexibility, impose vendor lock-in, and require extensive, ongoing customization by expensive consultants.

Further, these tools were often originally designed to analyze and mine structured records, whereas data is increasingly generated as a mix of unstructured documents and traditional structured records. Usually, unstructured documents hold more readily available intelligence than structured records do.

We therefore saw a gap in the market, if it could be filled.

We also observed that the Open Source community was developing tools that provided many of the core functions needed for an unstructured document-centric analysis tool (storage, search, aggregation, analytic frameworks). This provided some exciting new opportunities:

  • To lower the cost of developing an analytic platform, removing traditional cost barriers to its use and freeing effort to focus on domain- and analyst-specific functionality and processes.
  • To help the platform remain up to date, since the OSS tools are under continuous development and the ecosystem in general is very active.
  • Finally, to foster an active open community of developers and users, in the image of the OSS projects on which the platform is built.

Based on these needs and opportunities, we built Infinit.e, the first Open Source document analysis platform, using great OSS projects such as Lucene, elasticsearch, Hadoop, MongoDB, Tomcat, and many others. Our objectives are:

  • To provide, maintain, and enhance a document analysis platform based on the best mature Open Source projects currently available
  • To continue to develop the functionality based on analyzing and abstracting real users' requirements
  • To make the platform's source available to everyone under standard Open Source licenses.
  • To provide a simple but powerful front end to enable and demonstrate all the platform's most important capabilities.
  • To help organizations TODO
  • To build more focused, smaller-scale applications using modified versions of the platform (e.g., trading functionality for performance or scale)

[Gallery: Key OSS technologies used in Infinit.e]

... might you want to use it?

TODO

...


When?

Starting from prototypes developed at MTCSC (since acquired by ManTech), we began developing the tool in November 2010, focusing on adopting the best OSS tools for each of its functions, productionizing their use, and gluing them together in a logical way.

We released the first Open Source version in March 2012, mostly under the Affero GPL license (with plugins and utilities mostly released under the Apache 2.0 license).

...

Development continues: check out our TODO LINK roadmap.

...


How...

  • ... to learn more about the platform: click here for further details about the architecture.
  • ... to download and install the platform: click here. TODO
  • ... to import data into the tool and perform basic analysis: click here. TODO
  • ... to develop and install plugins and visualizations: click here. TODO
  • ... to integrate with other platforms: Infinit.e provides a rich, open REST API, described here. The API documentation includes "tutorials" (here) on performing many common operations using the API.
  • ... to download and develop core infrastructure: the Open Source repository is here, information about building and developing the code is here. TODO