More complex analytics and visualization
VIDEO COMING SOON!
UNDER CONSTRUCTION (Note: the functionality is already present in the AMI; only the video and the associated documentation in this section still need to be completed.)
Overview
In previous sections we have seen how the query function returns a subset of the matching documents, together with some basic averaging statistics, and how this is sufficient for many standard data-driven investigations.
In other cases, particularly as your activities move from data investigation to data science, it becomes necessary either to apply more complex algorithms (for example graph theory or social network analysis), or to calculate standard statistics in domain-specific ways (a very simple example of this would be aggregating sentiment geographically).
TODO screenshots (sentiment + geo -> custom geo)
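As a rough illustration of the "aggregating sentiment geographically" case mentioned above, the following is a minimal standalone JavaScript sketch. It is not the Infinit.e plug-in API: the function name `aggregateSentimentByRegion`, the accessor callbacks, and the `country`/`sentiment` fields on the sample documents are all hypothetical, chosen only to show the shape of the calculation (group by region, then average).

```javascript
// Minimal sketch: average sentiment per region.
// ASSUMPTION: documents are plain objects; the field names used in the
// sample data below are hypothetical and do not reflect the real
// Infinit.e or Datasift document schema.
function aggregateSentimentByRegion(docs, regionOf, sentimentOf) {
  var totals = {}; // region -> { sum, count }
  for (var i = 0; i < docs.length; i++) {
    var region = regionOf(docs[i]);
    var sentiment = sentimentOf(docs[i]);
    // Skip documents that have no usable region or sentiment value.
    if (region === null || region === undefined) continue;
    if (typeof sentiment !== "number" || isNaN(sentiment)) continue;
    if (!Object.prototype.hasOwnProperty.call(totals, region)) {
      totals[region] = { sum: 0, count: 0 };
    }
    totals[region].sum += sentiment;
    totals[region].count += 1;
  }
  // Convert running totals into a count and a mean per region.
  var result = {};
  for (var key in totals) {
    if (Object.prototype.hasOwnProperty.call(totals, key)) {
      result[key] = {
        count: totals[key].count,
        mean: totals[key].sum / totals[key].count
      };
    }
  }
  return result;
}

// Hypothetical sample data.
var sampleDocs = [
  { country: "US", sentiment: 4 },
  { country: "US", sentiment: -2 },
  { country: "GB", sentiment: 3 },
  { country: null, sentiment: 5 }, // no location: skipped
  { country: "GB" }                // no sentiment: skipped
];

var byCountry = aggregateSentimentByRegion(
  sampleDocs,
  function (d) { return d.country; },
  function (d) { return d.sentiment; }
);
console.log(byCountry);
```

Passing the region and sentiment accessors as callbacks keeps the grouping logic independent of the document layout, so the same function could in principle be pointed at whatever fields the real documents expose.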
To support these sorts of operations, Infinit.e provides the ability to plug in analytic modules that can run over any subset of the data (including all of it). The general topic of building, scheduling, and running plug-in modules is beyond the scope of this documentation; this section will provide links to the Infinit.e documentation and describe the aspects most relevant to Datasift.
In particular, we have provided 3 sample jobs that illustrate a few different types of analytic and demonstrate how to access the document objects. In practice, the Infinit.e-specific parts such as this are very easy; the difficulty typically lies in building the algorithms themselves, as it should.
Example 1 - XXX
TODO
Example 2 - XXX
TODO
Example 3 - XXX
TODO
Creating new Javascript plug-ins
TODO
Creating new Hadoop plug-ins
TODO
Visualizing the output of plug-ins
TODO
Further reading:
- Plugin manager documentation
- Information about the built-in Javascript engine
- Developer information about building Java Hadoop plugins
- An IKANOW blog post about using jsfiddle to visualize custom analytics (it also links to other relevant blog posts about running analytics on Infinit.e datasets, including one about temporal/sentiment analytics on emails)