Overview of the Infinit.e Data Harvesting Process
The Infinit.e platform includes a set of data harvesters that give it an ETL (Extract, Transform, and Load) capability. The harvesters are designed to consume data from a variety of sources and media types, including:
- Web-based content accessible via URL, including:
  - Static HTML content;
  - RSS and Atom news feeds;
  - RESTful web service interfaces.
- Traditional relational database management systems (RDBMS) via Java Database Connectivity (JDBC) drivers;
- Files located on local and network attached storage devices.
The harvesting process consists of the following steps:
- Extract data from source
- Create feed document from source data
- Enrich source data by extracting entities, events, geographic/location data, etc.
- Update entity counts/aggregates
- Store the finished document in Infinit.e's MongoDB data store
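The steps listed above can be sketched as a small pipeline. This is only an illustrative outline, not Infinit.e's actual implementation: the `FeedDocument` class, the function names, the capitalized-word "entity extraction", and the in-memory list standing in for MongoDB are all hypothetical, chosen to show how the stages hand off to one another.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedDocument:
    """Hypothetical stand-in for a harvested document (not Infinit.e's schema)."""
    title: str
    text: str
    entities: list = field(default_factory=list)


def extract(source):
    """Step 1: pull raw records from a source (here, an in-memory list)."""
    return list(source)


def create_document(record):
    """Step 2: wrap one raw record in a feed document."""
    return FeedDocument(title=record["title"], text=record["text"])


def enrich(doc):
    """Step 3: toy enrichment that treats capitalized words as entities."""
    doc.entities = re.findall(r"\b[A-Z][a-z]+\b", doc.text)
    return doc


def update_aggregates(counts, doc):
    """Step 4: update the running entity counts."""
    counts.update(doc.entities)


def store(datastore, doc):
    """Step 5: persist the finished document (a list stands in for MongoDB)."""
    datastore.append(doc)


def harvest(source):
    """Run every record from the source through all five steps."""
    counts, datastore = Counter(), []
    for record in extract(source):
        doc = enrich(create_document(record))
        update_aggregates(counts, doc)
        store(datastore, doc)
    return datastore, counts
```

Calling `harvest` with a list of dictionaries that each carry a `title` and a `text` key returns the stored documents together with the aggregated entity counts. In a real deployment each stage would be driven by the source configuration rather than hard-coded.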
Creating a Source
The following wiki pages describe the steps in further detail:
- Specifying a data source
- Extracting data from an RSS feed
- Extracting data from a database
- Extracting data from a file
- Performing Structured Analysis on a source
- Performing Unstructured Analysis on a source