UNDER CONSTRUCTION

Suspending and deleting sources

...

More complex analytics and visualization

VIDEO COMING SOON!

UNDER CONSTRUCTION (Note: the functionality is already present in the AMI; the video and the documentation in this section just need to be completed.)

Overview

In previous sections we have seen how the query function returns a subset of the matching documents, together with some basic averaging statistics, and how this is sufficient for many standard data-driven investigations.

In other cases, particularly as your activities move from data investigation to data science, it becomes necessary either to apply more complex algorithms (for example graph theory or social network analysis), or to calculate standard statistics in domain-specific ways (a very simple example of this would be aggregating sentiment geographically).
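To make the "aggregating sentiment geographically" example concrete, here is a minimal sketch in plain Python. It is not Infinit.e code: the field names `country` and `sentiment`, and the sample documents, are hypothetical stand-ins chosen for illustration, and the real document objects may be structured differently.

```python
from collections import defaultdict


def mean_sentiment_by_region(docs):
    """Return {region: mean sentiment} for documents that carry both fields.

    The "country" and "sentiment" keys are hypothetical, not the
    Infinit.e document schema.
    """
    totals = defaultdict(lambda: [0.0, 0])  # region -> [running sum, count]
    for doc in docs:
        region = doc.get("country")
        sentiment = doc.get("sentiment")
        if region is None or sentiment is None:
            continue  # skip documents without a location or a score
        totals[region][0] += sentiment
        totals[region][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


# Made-up sample documents, for illustration only.
sample_docs = [
    {"country": "US", "sentiment": 0.5},
    {"country": "US", "sentiment": -0.5},
    {"country": "FR", "sentiment": 1.0},
    {"country": None, "sentiment": 0.25},
]
result = mean_sentiment_by_region(sample_docs)
```

The same grouping step generalizes to any geographic key (city, region, grid cell); only the key extraction changes.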

TODO screenshots (sentiment + geo -> custom geo)

To support these sorts of operations, Infinit.e provides the ability to plug in analytic modules that can run over any subset of the data (including all of it). The general topic of building, scheduling, and running plug-in modules is beyond the scope of this documentation; this section will provide links to the Infinit.e documentation and describe the aspects most relevant to Datasift.

In particular, we have provided 3 sample jobs that illustrate a few different types of analytics and demonstrate how to access the document objects. In practice, the Infinit.e-specific parts like this are very easy; the difficulty is typically in building the algorithms themselves, as it should be.
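As a rough illustration of what "an analytic module running over a subset of the data" can look like, the sketch below uses a map/reduce shape in plain Python. This is an assumption about the general pattern, not the Infinit.e plug-in API: `run_job`, `mention_mapper`, `count_reducer`, and the `lang` and `mentions` fields are all hypothetical names. The job counts how often each user is mentioned (in-degree in a mention graph), a very small social network statistic.

```python
from collections import defaultdict


def run_job(docs, selector, mapper, reducer):
    """Apply mapper to every selected document, then reduce values per key.

    All names here are hypothetical; this only mimics a map/reduce flow.
    """
    grouped = defaultdict(list)
    for doc in docs:
        if not selector(doc):
            continue  # the job only sees the chosen subset of the data
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def mention_mapper(doc):
    """Emit (mentioned user, 1) for each mention in the document."""
    for target in doc.get("mentions", []):
        yield target, 1


def count_reducer(key, values):
    """Sum the emitted counts for one user."""
    return sum(values)


# Made-up sample documents, for illustration only.
docs = [
    {"author": "alice", "lang": "en", "mentions": ["bob", "carol"]},
    {"author": "bob", "lang": "en", "mentions": ["carol"]},
    {"author": "dave", "lang": "fr", "mentions": ["carol"]},
]
in_degree = run_job(docs, lambda d: d["lang"] == "en", mention_mapper, count_reducer)
```

Separating the selector from the mapper and reducer keeps the algorithm independent of which subset of the data it runs over.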

Example 1 - XXX

TODO

Example 2 - XXX

TODO

Example 3 - XXX

TODO

Creating new JavaScript plug-ins

TODO

Creating new Hadoop plug-ins

TODO

Visualizing the output of plug-ins

TODO

Further reading:

...