Infinit.e - OSS roadmap

Overview

This section describes some of the highest priority items on our roadmap. The main development effort will concentrate on new features that support functionality required or requested by our user base; a secondary thread will focus on ensuring that our technology stack is up to date; and a third thread will be more speculative and research-driven.

The remainder of this page partitions roadmap items into two sections:

  • Functionality: enhancements that immediately improve the user's or operator's experience, normally mapping onto the first thread described above, and sometimes the third.
  • Framework: enhancements to the stack or APIs, normally mapping onto the secondary thread described above, though of course these often unlock the potential for new functionality.

Functional roadmap items

The roadmap items are further sub-divided architecturally: 

Harvesting
  • Support for deep web harvesting.
  • Support for "spidering out" from specified URLs, using Nutch.
  • Support for harvesting documents returned from common search engines (eg Google).
Visualization
  • Query chaining.
  • Multiple workspaces.
  • Entity-centric view of query datasets.
  • Design work for new visualization functions.
Social
  • Better support for sharing queries and other collaborative working.
  • Provide a community activity feed.
  • Allow comment threads on documents and other artifacts.
User Interface
  • Provide a GUI for easily adding new sources, including all of the available regex and JavaScript customization functionality.
  • Integrate the different monitoring tools for the different technologies.
Entity extraction
  • Pull out and contextualize temporal terms ("next week", "January").
  • Integrate with AlchemyAPI's latest features (concept tagging, entity associations including directed sentiment, document classification, statistical keyword extraction, etc).
  • Provide a built-in statistical keyword extractor to provide some capability when offline.
  • Closer integration with the OpenCyc ontology.
Geospatial support
  • Polygon searching, including against named polygons (eg country boundaries).
  • Support for community-based adding of custom geographical features.
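As an illustration of what polygon searching involves, here is a minimal Python sketch of a ray-casting point-in-polygon test applied to a set of documents. It is not Infinit.e code: the document layout (a "geo" field holding a longitude/latitude pair), the NAMED_POLYGONS table and the function names are all assumptions made for the example, and the test treats coordinates as planar, so it ignores polygons that cross the antimeridian or enclose a pole.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates

# Hypothetical table of named polygons; a real one would hold e.g. country boundaries.
NAMED_POLYGONS: Dict[str, List[Point]] = {
    "example_box": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
}


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray heading east from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def search_named_polygon(name: str, docs: List[dict]) -> List[dict]:
    """Return the documents whose (assumed) "geo" field lies inside the named polygon."""
    polygon = NAMED_POLYGONS[name]
    return [doc for doc in docs if point_in_polygon(doc["geo"], polygon)]
```

A production implementation would more likely delegate this to the geospatial support of the index or data store; the sketch only shows the underlying test.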
Querying/scoring
  • Ability to aggregate over user-specified metadata fields.
  • Ability to return "moments" (daily/weekly/monthly summaries of entity activity).
  • Source-specific weighting of document/entity scores (manual and automated).
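
To make the idea of "moments" concrete, the following Python sketch counts entity mentions per day, week or month. It is only an illustration under stated assumptions: the input is taken to be a list of (entity, date) pairs, weeks are assumed to start on Monday, and the names bucket_start and moments are hypothetical rather than part of any Infinit.e API.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def bucket_start(day: date, granularity: str) -> date:
    """Map a date to the first day of its daily, weekly or monthly bucket."""
    if granularity == "daily":
        return day
    if granularity == "weekly":
        # Assumption: weeks start on Monday (date.weekday() is 0 for Monday).
        return day - timedelta(days=day.weekday())
    if granularity == "monthly":
        return day.replace(day=1)
    raise ValueError("unknown granularity: " + granularity)


def moments(mentions: Iterable[Tuple[str, date]],
            granularity: str) -> Dict[Tuple[str, date], int]:
    """Count mentions of each entity within each time bucket."""
    counts: Counter = Counter()
    for entity, day in mentions:
        counts[(entity, bucket_start(day, granularity))] += 1
    return dict(counts)
```

Each key of the result pairs an entity with the start date of a bucket, so a caller can plot activity over time or compare entities within the same period.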

Framework roadmap items

The roadmap items are further sub-divided architecturally: 

Basic infrastructure
  • Support for running (primarily installing) Infinit.e on Ubuntu and Red Hat 6 (and the CentOS equivalent).
  • Automated test infrastructure (eg to support community contributions better).
API framework
  • Support for SSL (particularly on REST calls containing passwords and other sensitive information).
  • Support for API keys in addition to cookie-based authentication.
Generic processing framework
  • Integrate a graph database (to allow queries like "return all documents within 2 hops of this entity").
  • Integrate Infinit.e with a prediction engine (eg Google's).
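The hop-limited query mentioned above can be sketched as a breadth-first search. The Python below is an illustration only, not a description of any particular graph database: it assumes the graph is an in-memory adjacency dictionary, that node names carry an "entity:" or "doc:" prefix, and that a "hop" is one edge between any two nodes. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search returning {node: hop count} for nodes within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def documents_within_hops(graph: Dict[str, List[str]], entity: str, max_hops: int) -> List[str]:
    """Return the document nodes (assumed "doc:" prefix) within max_hops of the entity."""
    reachable = within_hops(graph, entity, max_hops)
    return sorted(node for node in reachable if node.startswith("doc:"))


# Hypothetical graph: entities link to each other and to the documents mentioning them.
EXAMPLE_GRAPH = {
    "entity:A": ["doc:1", "entity:B"],
    "entity:B": ["entity:A", "doc:2", "entity:C"],
    "entity:C": ["entity:B", "doc:3"],
    "doc:1": ["entity:A"],
    "doc:2": ["entity:B"],
    "doc:3": ["entity:C"],
}
```

With this example graph, documents_within_hops(EXAMPLE_GRAPH, "entity:A", 2) returns doc:1 and doc:2, while doc:3 is three hops away and is excluded.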
Custom processing framework
  • Explicit support for running Mahout on the existing Hadoop-based custom processing engine.
  • Productionize (existing) SQL database support for fusing structured and unstructured data.
Index/data store
  • Dynamically allocate new Lucene indexes to handle data growth more robustly.
  • Finish support for MongoDB sharding.
Harvesting and enrichment framework
  • Replace the existing custom framework with a UIMA-compliant processing chain.
  • Use Nutch for link harvesting.

 

Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons