Knowledge - Source

Overview of the Infinit.e Data Harvesting Process

The Infinit.e platform features a robust set of data harvesters that give Infinit.e a powerful ETL (Extract, Transform and Load) capability. Infinit.e's harvesters are designed to consume data from a variety of sources and media types including:

Web-based content accessible via URL, including:
- Static HTML content;
- RSS and Atom news feeds;
- RESTful web service interfaces.
Traditional relational database management systems (RDBMS), accessed via Java Database Connectivity (JDBC) drivers;
Files located on local and network-attached storage devices.

The harvesting process consists of the following steps:

1. Extract data from the source
2. Create a feed document from the source data
3. Enrich the source data by extracting entities, events, geographic/location data, etc.
4. Update entity counts/aggregates
5. Store the finished document within Infinit.e's MongoDB data store
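
The five steps above can be sketched as a small pipeline. This is only an illustration of the flow, not Infinit.e's implementation: the `FeedDocument` class, every function name, the capitalized-word "entity extraction", and the use of a plain dictionary in place of MongoDB are all assumptions made for this example.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedDocument:
    """Hypothetical stand-in for a feed document (not Infinit.e's schema)."""
    url: str
    text: str
    entities: list = field(default_factory=list)


def extract(source):
    """Step 1: pull raw records from a source (here, an in-memory list)."""
    return list(source)


def create_document(record):
    """Step 2: wrap a raw record in a feed document."""
    return FeedDocument(url=record["url"], text=record["text"])


def enrich(doc):
    """Step 3: toy enrichment that treats capitalized words as entities."""
    doc.entities = re.findall(r"\b[A-Z][a-z]+\b", doc.text)
    return doc


def update_aggregates(doc, counts):
    """Step 4: update the running per-entity counts."""
    counts.update(doc.entities)


def store(doc, datastore):
    """Step 5: persist the document (a dict stands in for MongoDB)."""
    datastore[doc.url] = doc


def harvest(source, datastore, counts):
    """Run every record from the source through steps 1 to 5."""
    for record in extract(source):
        doc = enrich(create_document(record))
        update_aggregates(doc, counts)
        store(doc, datastore)


source = [
    {"url": "http://example.com/a", "text": "Alice met Bob in Paris"},
    {"url": "http://example.com/b", "text": "Bob flew to Paris"},
]
datastore, counts = {}, Counter()
harvest(source, datastore, counts)
```

After the run, `datastore` holds one document per URL and `counts` holds how often each toy entity was seen across all documents.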

Creating a Source

The following wiki pages describe in detail the steps involved in creating sources:

Specifying a data source
How to specify the mechanics required to extract data from a source system:
1. Using the Feed Harvester
2. Using the Database Harvester
3. Using the File Harvester
Structured Analysis - Overview
An introduction to the Structured Analysis Harvest and how to specify the methods for enriching structured data sources with geographic information, entities, and events.
Unstructured Analysis - Overview
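
As a rough illustration of how a source specification might select one of the three harvesters listed above, the sketch below dispatches on a type field. The `type` and `url` keys, the function names, and the returned strings are hypothetical and are not taken from the Infinit.e source specification; see the pages listed above for the actual fields.

```python
def feed_harvester(spec):
    """Placeholder for harvesting web content such as RSS or Atom feeds."""
    return f"fetch feed at {spec['url']}"


def database_harvester(spec):
    """Placeholder for harvesting an RDBMS over JDBC."""
    return f"query database at {spec['url']}"


def file_harvester(spec):
    """Placeholder for harvesting files from local or network storage."""
    return f"read files under {spec['url']}"


# Hypothetical mapping from a source's declared type to its harvester.
HARVESTERS = {
    "feed": feed_harvester,
    "database": database_harvester,
    "file": file_harvester,
}


def select_harvester(spec):
    """Return the harvester for a source spec, or raise if the type is unknown."""
    source_type = spec.get("type")
    if source_type not in HARVESTERS:
        raise ValueError(f"unknown source type: {source_type!r}")
    return HARVESTERS[source_type]


spec = {"type": "file", "url": "/data/reports"}
result = select_harvester(spec)(spec)
```

Keeping the mapping in one table means a new harvester type needs only one new entry rather than another branch in the selection logic.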

Source Reference Documents

Source Document Specification:

Source configuration objects

Source APIs: