...

In practice, the Source Manager GUI can be used to perform these activities in a more visual, intuitive way. It still requires building the source JSON with limited development support: as can be seen from the documentation here, this requires some JavaScript skills and some effort. The Source Manager provides some templates to get up and running on simpler types of ingest, and there is a source gallery with some real-world examples of varying complexity.

(In addition, our enterprise offering provides a visual "ETL" tool)

...

One augmentation feature that is provided by Datasift, and is therefore not applied to data imported via sources, is the entity extraction provided by Salience. The Infinit.e platform provides the following alternatives:

  • TextRank:

...

  • Extracts keywords similarly to Salience/Datasift (though less well)
  • (connector) AlchemyAPI: You can register for an API key with AlchemyAPI and use their service, which is integrated into Infinit.e. AlchemyAPI has a free tier allowing 1,000 transactions/day. By default this connector pulls named entities only, but it does include sentiment.
  • (connector) AlchemyAPI-metadata: This is another connector to AlchemyAPI, which provides keywords but no entity extraction. It is best used for short or badly formatted sources like Twitter.
  • (connector) OpenCalais: OpenCalais is an alternative to AlchemyAPI - it focuses on business and politics, and doesn't have sentiment but does provide "business associations" (takeover rumors, that sort of thing). It has a significant free tier, offering 50,000 transactions per day once you register for an API key.
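To get a rough feel for what the free tiers quoted above mean in practice, the following minimal sketch estimates how many days a backlog of documents would take to enrich. It assumes one API transaction per document, which is an assumption and not something the connectors are documented to guarantee; the function and dictionary names are hypothetical.

```python
# Free-tier daily limits as quoted in the list above.
FREE_TIER_LIMITS = {
    "AlchemyAPI": 1000,
    "OpenCalais": 50000,
}


def days_to_enrich(num_documents, daily_limit):
    """Days needed to enrich a backlog, assuming one transaction per document."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    # Ceiling division: a partially used day still counts as a day.
    return -(-num_documents // daily_limit)


if __name__ == "__main__":
    backlog = 120000  # illustrative backlog size, not a real figure
    for service, limit in FREE_TIER_LIMITS.items():
        print(service, days_to_enrich(backlog, limit), "days")
```

Under that assumption, a backlog that is small for OpenCalais can still take months on the AlchemyAPI free tier, which may influence which connector is chosen for bulk sources.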

Note that these entity extractors all have different ontologies: their types are slightly different (e.g. "State" vs "StateOrProvince"), as are their "disambiguation formats" (e.g. "Paris, Texas" vs "Paris, Texas. USA"). This is not ideal for combining with the built-in Salience augmentation, since the same entity will appear in different forms. The entity aliasing function can be used to clear up some of these issues, either manually for important entities or via a custom job that generates aliases automatically from the extracted data using some simple heuristics.
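As an illustration of the kind of simple heuristics such a custom job might use, here is a minimal sketch that groups entities whose types and disambiguated names differ only in format. It is not the Infinit.e aliasing API: the type mapping, the function names and the (name, type) tuple input are all hypothetical, and the name heuristic (keep the name plus its first qualifier) is deliberately crude.

```python
import re
from collections import defaultdict

# Hypothetical mapping between extractor-specific type names; real
# ontologies are larger and this entry is only illustrative.
TYPE_SYNONYMS = {
    "stateorprovince": "state",
}


def canonical_type(entity_type):
    """Lower-case the type and fold known synonyms together."""
    key = entity_type.strip().lower()
    return TYPE_SYNONYMS.get(key, key)


def canonical_name(disambiguated_name):
    """Keep the entity name plus its first qualifier, lower-cased.

    Splits on a comma or period followed by whitespace, so both
    "Paris, Texas" and "Paris, Texas. USA" reduce to "paris, texas".
    Names containing abbreviations such as "St. Louis" would be split
    incorrectly; a real job would need a better rule.
    """
    parts = [p.strip().lower() for p in re.split(r"[,.]\s+", disambiguated_name)]
    parts = [p for p in parts if p]
    return ", ".join(parts[:2])


def build_alias_groups(entities):
    """Group (name, type) pairs that share a canonical form.

    Returns only groups with more than one distinct surface form,
    since those are the candidates for aliasing.
    """
    groups = defaultdict(set)
    for name, entity_type in entities:
        key = (canonical_name(name), canonical_type(entity_type))
        groups[key].add((name, entity_type))
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


if __name__ == "__main__":
    extracted = [
        ("Paris, Texas", "City"),
        ("Paris, Texas. USA", "City"),
        ("Texas", "State"),
        ("Texas", "StateOrProvince"),
        ("London", "City"),
    ]
    for key, forms in build_alias_groups(extracted).items():
        print(key, "->", forms)
```

The output of a job like this would still need review before being loaded as aliases, since heuristic matching can merge entities that are genuinely distinct.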

(Note that Salience does have a SaaS version, called Semantria, which offers a one-time free allowance of 10,000 transactions. We have not built a connector to Semantria, or used it in any way, though it would be easy enough for us or another developer to do.)

(Note also that our enterprise offering provides the same Salience NLP engine that Datasift uses, which would enable external sources to be integrated seamlessly with Datasift's social media.)

Adding users and communities

TODO

Updating the software

There are two separate components installed on the Amazon image:

  • The Infinit.e community platform (September 2013 release)
  • Additional widgets and web services that provide the connection to Datasift plus other functionality such as entity alias manipulation (this is actually a subset of our enterprise offering).

To update the core platform, SSH into the instance and then follow the instructions provided here.

There is currently no automated way to upgrade the additional components. Should patches be required, we will update the Amazon image and also provide instructions to existing customers on how to obtain the latest binaries and update their existing images.