
Harvesting in Infinit.e is controlled by JSON documents called sources. These sources can be tested by POSTing to the Config - Source - Test REST endpoint, and activated/updated ("published") by POSTing to the Config - Source - Save REST endpoint.
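As a rough illustration of the test-then-publish flow described above, the following Python sketch builds (but does not send) the two POST requests. The base URL, the endpoint paths, and the fields of the example source are all hypothetical placeholders, not taken from the Infinit.e API reference; authentication and any community or ID parameters the real endpoints may require are omitted.

```python
import json
import urllib.request

# Hypothetical values: replace with the real host and endpoint paths
# from the Config - Source - Test / Save API documentation.
BASE_URL = "http://infinite.example.com/api"
TEST_PATH = "/config/source/test"  # assumed path for the Test endpoint
SAVE_PATH = "/config/source/save"  # assumed path for the Save endpoint


def build_source_request(base_url, path, source):
    """Build a POST request that carries a source JSON document as its body."""
    body = json.dumps(source).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder source document; the field names are illustrative only
# and do not reflect the real source schema.
example_source = {"title": "Example feed", "url": "http://example.com/rss.xml"}

# Test the source first, then publish it once the test output looks right.
test_request = build_source_request(BASE_URL, TEST_PATH, example_source)
save_request = build_source_request(BASE_URL, SAVE_PATH, example_source)
```

Passing either request to urllib.request.urlopen would actually send it; the sketch stops short of that because the real endpoints need a live server and a logged-in session.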

In practice, the Source Manager GUI can be used to perform these activities in a more visual, intuitive way. It still requires building the source JSON with limited development support - as can be seen from the documentation here, this requires some JavaScript skills and some effort. The Source Manager provides templates to get up and running on simpler types of ingest, and there is a source gallery with real-world examples of varying complexity.

(In addition, our enterprise offering provides a visual "ETL" tool.)

Quickly importing sources using the Chrome extension

For pulling public RSS feeds and HTML pages we provide a Chrome extension that gives a "1-click" import capability. This is described here.

Enrichment and entity extraction

One augmentation that Datasift provides, and that is therefore not applied to data imported via sources, is entity extraction using Salience. XXX

  • TextRank: XXX
  • AlchemyAPI: XXX
  • OpenCalais: XXX

XXX Note that these

TODO: something about entity generation (Salience is not available via the public API, though it is via our enterprise edition, so there will be a disambiguation problem between different entity formats and types; some of this can be addressed via the alias builder)
