Overview of the Infinit.e Data Harvesting Process

The Infinit.e platform includes a set of data harvesters that provide ETL (Extract, Transform, Load) capability: they extract content from external sources, transform it into Infinit.e documents, and load those documents into the platform's data store. Infinit.e's harvesters are designed to consume data from a variety of sources and media types, including:

  • Web based content accessible via URL including:
    • Static HTML content;
    • RSS and ATOM based news feeds;
    • RESTful web service interfaces.
  • Traditional relational database management systems (RDBMS) via Java Database Connectivity (JDBC) drivers;
  • Files located on local and network attached storage devices.
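As an illustration of extraction from one of these source types, here is a minimal sketch of pulling items out of an RSS 2.0 feed with Python's standard `xml.etree.ElementTree` module. It is not Infinit.e's harvester code: the function name `parse_rss_items` and the output field names are hypothetical. Atom feeds use a different, namespaced structure (`entry` elements in the `http://www.w3.org/2005/Atom` namespace) and would need their own parsing.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text: str) -> list[dict]:
    """Hypothetical helper: return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests items under <rss><channel><item>; iter() walks all descendants.
    for item in root.iter("item"):
        items.append({
            # findtext() returns None when a child element is missing.
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


sample = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>First story</title><link>http://example.com/1</link>
<description>Hello</description></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

for entry in parse_rss_items(sample):
    print(entry["title"], entry["url"])
```

Missing optional elements (such as the second item's `description`) become empty strings, so later steps can treat every item uniformly.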

The harvesting process consists of the following steps:

  1. Extract data from the source
  2. Create a feed document from the source data
  3. Enrich the source data by extracting entities, events, geographic/location data, etc.
  4. Update entity counts/aggregates
  5. Store the finished document in Infinit.e's MongoDB data store
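The five steps above can be sketched as a chain of small functions. This is a minimal, in-memory sketch, not Infinit.e's implementation: `FeedDocument` and the step functions are hypothetical, a capitalized-word regular expression stands in for real entity extraction, and a Python list stands in for MongoDB.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedDocument:
    # Hypothetical document shape; not the actual Infinit.e schema.
    title: str
    url: str
    text: str
    entities: list[str] = field(default_factory=list)


def extract(raw_records):
    """Step 1: pull raw records from a source (here, an in-memory list)."""
    return list(raw_records)


def create_document(record):
    """Step 2: map one raw record onto a feed document."""
    return FeedDocument(title=record["title"], url=record["url"], text=record.get("text", ""))


def enrich(doc):
    """Step 3: toy entity extraction; capitalized words stand in for a real extractor."""
    doc.entities = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", doc.text)))
    return doc


def update_counts(doc, entity_counts):
    """Step 4: add one to each entity's count per document that mentions it."""
    entity_counts.update(doc.entities)


def store(doc, data_store):
    """Step 5: persist the finished document (a list stands in for MongoDB)."""
    data_store.append(doc)


def harvest(raw_records):
    data_store, entity_counts = [], Counter()
    for record in extract(raw_records):
        doc = enrich(create_document(record))
        update_counts(doc, entity_counts)
        store(doc, data_store)
    return data_store, entity_counts


records = [
    {"title": "A", "url": "http://example.com/a", "text": "Alice met Bob in Paris."},
    {"title": "B", "url": "http://example.com/b", "text": "Bob flew to London."},
]
docs, counts = harvest(records)
print(counts.most_common())
```

Keeping each step in its own function means any one stage (for example, the entity extractor or the storage back end) can be swapped without touching the others.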

Creating a Source

The following wiki pages describe the steps for creating a source in further detail:

  1. Specifying a data source
    1. Extracting data from an RSS feed
    2. Extracting data from a database
    3. Extracting data from a file
  2. Structured Analysis - Overview
    1. Specifying Document Level Geospatial Information
    2. Specifying Entities
    3. Specifying Events
    4. Transforming Data with JavaScript
  3. Performing Unstructured Analysis on a source

Source Reference Documents

Source Document Specification:
Source APIs: