...
Suspending and deleting sources
Widget Connector | ||
---|---|---|
|
As outlined in the video above, once Datasift sources have been created (as detailed here, in both video and documentation), they can be controlled from the source editor.
The following operations can be performed:
...
To suspend an active source, simply navigate to the source editor tab of the manager, select the desired source from the list on the left, and select the "Disable Source" button (clicking "OK" to confirm publishing the source when prompted).
...
To delete a source, follow the steps from the source editor documentation. Deleting a source will automatically delete the corresponding Datasift push subscription.
...
If it is necessary to make minor modifications to the JCSDL then this can be done without deleting and re-creating the source. Instead, suspend the source, wait two minutes (to be sure the existing Datasift push subscription has been terminated), modify the JCSDL in the subscription (this is one of the fields in the source editor), and then activate it (which will automatically publish the change).
Info |
---|
If there is enough demand, we will add a graphical editor to the Datasift connector widget so that sources can be modified more formally. |
Further reading:
Creating aliases and discarding unwanted entities
Widget Connector | ||
---|---|---|
|
...
Info |
---|
One downside to this is that there is no way within the widget of transferring aliases from the sandbox to a real community (eg once you are happy with them). However the sub-section "Manually setting alias configurations" explains how this can be achieved easily using the File Uploader page instead. |
Creating synthetic alias masters
...
As with all Infinit.e GUI functionality, the Entity Alias Builder widget is just an interface to our open API.
Aliases are stored in Infinit.e as JSON shares of type "infinite-entity-alias". Their format is described here. They can be manually uploaded and shared between communities using the File Uploader manager page.
This can be useful for 2 purposes:
- Where there are large numbers of aliases to be generated, it would not be much fun to use the GUI for each one. Instead you can programmatically generate (eg with a script) a JSON file containing the aliases and then upload it.
- This is a bit beyond the scope of this documentation, but you can also create a plugin (eg using the Javascript scripting engine) and then create a share with type "infinite-entity-alias" that points to the custom plugin results (this is described in the File Uploader documentation).
- So as an example if you have a word document that lists lots of social media handle mappings, then you could upload that as a share, then import that share as a source (this is discussed further below under "Importing other sources"), then write a Javascript plugin (see below under "More complex analytics") that parses the document into the right format, and then finally point a share to that!
- This would have the nice feature that it would automatically update itself whenever the document was re-uploaded.
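As a rough illustration of the first use case above (generating a large alias file with a script), the sketch below groups (master, alias) pairs and writes them out as JSON. The output layout (`{"master": {"aliases": [...]}}`) and the function names are hypothetical placeholders, not the real "infinite-entity-alias" schema. Consult the alias format documentation and adjust `build_alias_document` before uploading anything.

```python
import json
from collections import defaultdict


def build_alias_document(pairs):
    """Group (master, alias) pairs by master, dropping repeated aliases.

    NOTE: the returned layout is a hypothetical placeholder, not the
    documented alias share format.
    """
    grouped = defaultdict(list)
    for master, alias in pairs:
        if alias not in grouped[master]:
            grouped[master].append(alias)
    return {master: {"aliases": aliases} for master, aliases in grouped.items()}


def write_alias_file(pairs, path):
    """Write the grouped aliases to a JSON file ready for manual upload."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_alias_document(pairs), handle, indent=2)
```

The resulting file could then be uploaded through the File Uploader manager page as described above.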
...
Info |
---|
A few things to note when using aliases in multi-user environments:
|
...
- An IKANOW blog post discussing an operational use of aliasing
- File Uploader documentation
- Community Manager documentation
- The alias API
More complex analytics and visualization
...
The sequence of the screenshots below shows how to access the example (starting from the manager webapp, eg press the "MANAGER" link in the top right of the main GUI):
TODO The key screenshot above is the middle one, which shows the "scary-at-first-glance" plugin manager. This page TODO
...
manager is documented in more detail here; for now we'll focus on the following components:
- The dropdown menu at the top lists available tasks. Selecting one fills in the rest of the form, as shown.
- The "QuickRun" or "Save and Debug" buttons save the current settings and run the job (on a subset of the records in the latter case). You can see from the status message that (on 8.5K records), the job took ~19s to complete and generated 44 aggregated records.
- (Neither option reloads the page until the job has completed. To run the job asynchronously, use "Submit" instead; this is discussed below.)
- Once the job has run you can view the results in a separate tab by pressing the "Show results" button.
- Note that this new tab directly uses the Custom - Get Results RESTful API call, so the output will not be nicely formatted unless your browser is configured to render JSON. We use the Chrome/Firefox extension JSONView.
Regardless of how nicely formatted the JSON is, in practice it is preferable to have a graphical view of the resulting data. The Infinit.e application comes with two widgets for this purpose:
- Custom Viewer - Map: Finds fields in the record that "look like" lat/long points and plots them on a map (MapQuest) colored according to a score defined by a numeric field in the same record (see below for more details)
- The rules for "looks like" lat/long are as follows:
- Is a top-level object called "geo", "geotag", "latlong", or "latlon" AND consists of 2 numeric fields (or strings representing numbers) with names "lat" and "lon" or "latitude" and "longitude"
- Has two top-level numeric fields (or strings representing numbers) with names "lat" and "lon" or "latitude" and "longitude"
- Custom Viewer - Bar Graph: Uses any field from the record as a key, and plots a bar of height defined by a numeric field in the same record (see example 2 for more details).
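The lat/long detection rules listed for the Map viewer can be sketched in a few lines. This is only an illustration of the rules as written here, not the widget's actual code. The function names are made up for the example, and the sketch does not enforce that a "geo"-style object contains exactly two fields.

```python
from typing import Any, Optional, Tuple

GEO_CONTAINER_NAMES = ("geo", "geotag", "latlong", "latlon")
COORD_NAME_PAIRS = (("lat", "lon"), ("latitude", "longitude"))


def _as_number(value: Any) -> Optional[float]:
    """Return value as a float if it is a number or a numeric string."""
    if isinstance(value, bool):  # bool is an int subclass; not a coordinate
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None


def _coords_from(obj: Any) -> Optional[Tuple[float, float]]:
    """Look for a lat/lon or latitude/longitude pair directly inside obj."""
    if not isinstance(obj, dict):
        return None
    for lat_name, lon_name in COORD_NAME_PAIRS:
        if lat_name in obj and lon_name in obj:
            lat, lon = _as_number(obj[lat_name]), _as_number(obj[lon_name])
            if lat is not None and lon is not None:
                return (lat, lon)
    return None


def find_lat_long(record: dict) -> Optional[Tuple[float, float]]:
    """Apply rule 1 (named top-level object), then rule 2 (top-level fields)."""
    for name in GEO_CONTAINER_NAMES:
        coords = _coords_from(record.get(name))
        if coords is not None:
            return coords
    return _coords_from(record)
```

A record for which `find_lat_long` returns `None` corresponds to the case where the widget cannot find anything to plot.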
In addition, we have successfully used the free jsfiddle service to visualize analytics - see this blog post for more details.
For this example, the "Custom Viewer - Map" is the obvious choice. The screenshot below shows the different options from the header.
- The first dropdown menu selects the plug-in from which to take the results.
- (The selection will fail if the widget cannot detect any fields that look like lat/long. This is discussed below, under "Visualizing the output of plug-ins")
- The second menu allows the user to select which field determines the color of the plotted points (from the palette of green/blue/orange/red).
- (This job has generated two numeric fields, the aggregated sentiment, and the number of records containing sentiment)
- The third menu determines how the score field is converted into a color:
- Linear scale: the lowest score is green, the highest score is red, the buckets are distributed evenly from min to max.
- Log scale: the lowest score is green, the highest score is red, the buckets are distributed logarithmically from min to max.
- Polarity: Red is negative, Green is positive, Blue is neutral (less than 10% of the max in either direction).
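To make the three scoring modes concrete, here is a minimal sketch of how scores might be mapped onto the four-color palette. It is not the widget's implementation. It assumes the palette runs green, blue, orange, red from low to high, that log scale is only applied to positive scores, and that "10% of the max" in polarity mode means 10% of the largest absolute score.

```python
import math

PALETTE = ["green", "blue", "orange", "red"]  # assumed order, low to high


def _bucket(fraction):
    """Map a 0..1 fraction onto one of the evenly sized palette buckets."""
    index = min(int(fraction * len(PALETTE)), len(PALETTE) - 1)
    return PALETTE[index]


def linear_colors(scores):
    """Buckets are spread evenly between the minimum and maximum score."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [_bucket((s - lo) / span) if span else PALETTE[0] for s in scores]


def log_colors(scores):
    """Same as linear, but on log(score); assumes every score is > 0."""
    return linear_colors([math.log(s) for s in scores])


def polarity_colors(scores, neutral_fraction=0.1):
    """Red is negative, green is positive, blue is close to zero."""
    threshold = neutral_fraction * max(abs(s) for s in scores)
    colors = []
    for s in scores:
        if s == 0 or abs(s) < threshold:
            colors.append("blue")
        elif s > 0:
            colors.append("green")
        else:
            colors.append("red")
    return colors
```

With scores spanning several orders of magnitude (such as record counts), the log scale spreads points across all four colors, whereas the linear scale would put almost everything in the lowest bucket.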
Returning to the plugin manager, there were two larger text fields:
- "Query" field: together with the "Communities" list, this controls what data is processed
- "User arguments" field: in this case this is actually the code that is run over the data. This is because it is a Javascript plugin, see below under "Creating new Javascript plug-ins".
- (Note that for Hadoop JARs this provides generic configuration parameters, see below under "Creating new Hadoop plug-ins")
In this case we can see that the query is:
Code Block | ||
---|---|---|
| ||
{"docGeo":{"$exists":true}}
//^^ (ie only process geo-tagged tweets, eg from cellphones) |
There are a few points to note here:
- The overall syntax of the query is that of MongoDB
- There are some additional extensions starting with "$": these are documented here, and can be inserted either manually or by pressing the "Add Options" button that is next to the query.
- The document fields that the query is applied against are described here.
- You can view the JSON format of a given document from the "Document Browser" widget, as shown in the screenshot below.
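For readers unfamiliar with MongoDB syntax, the sketch below mimics what a top-level `$exists` clause such as the query above selects. It is a toy matcher written for illustration only (the function name is made up); it handles nothing except `{"field": {"$exists": true|false}}` clauses and is not how the platform evaluates queries.

```python
def matches_exists_query(document, query):
    """Return True if document satisfies every top-level $exists clause.

    As in MongoDB, a field that is present with a null (None) value
    still counts as existing.
    """
    for field, condition in query.items():
        wanted = condition["$exists"]
        if (field in document) != wanted:
            return False
    return True


query = {"docGeo": {"$exists": True}}
tweets = [
    {"title": "geo-tagged tweet", "docGeo": {"lat": 51.5, "lon": -0.1}},
    {"title": "tweet without location"},
]
geo_tagged = [t["title"] for t in tweets if matches_exists_query(t, query)]
```

Here `geo_tagged` ends up containing only the first tweet's title, which is the effect the query has on the set of documents passed to the plug-in.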
As an example, say you wanted to query only records that were tagged by Datasift with gender "Male". There would be two ways of doing this:
Code Block |
---|
//Option 1, simplest (see datasift documentation for their metadata format):
//TODO
//Option 2, most generic:
//TODO |
The advantage of Option 2 would be that if you later imported other sources that had a "Gender" entity but weren't from Datasift (eg had a different metadata format), then you would not have to alter your queries.
TODO example - changing the query
TODO map/reduce code
Example 2 - Aggregate sentiment by gender
TODO
Example 3 - Show top co-references
TODO
Creating new Javascript plug-ins
TODO
Creating new Hadoop plug-ins
...
Visualizing the output of plug-ins
TODO more details on how to format fields to be usable
TODO the advanced option
TODO jsfiddle
Further reading:
- Plugin manager documentation
- Information about the built-in Javascript engine
- Developer information about building Java Hadoop plugins
- An IKANOW blog post discussing using jsfiddle to visualize custom analytics
- (contains links to some other relevant blog posts about running analytics on Infinit.e datasets, including this one about doing temporal/sentiment analytics on emails)
...
An important feature of the Infinit.e platform is that it wants data to be open: our User Interfaces and applications use our open RESTful API, so any other client can get the same data.
The primary method of getting at the data is via the query API call, and that linked page shows some examples of making the call in Javascript and ActionScript. In addition, we have a beta (ie undocumented!) Java driver here (we use it internally, so it is well supported). There are more general examples of using the API in different languages here.
...
Harvesting in Infinit.e is controlled by JSON documents called sources. These sources can be tested by POSTing to the Config - Source - Test REST endpoint, and activated/updated ("published") by POSTing to the Config - Source - Save REST endpoint.
In practice, the Source Manager GUI can be used to perform these activities in a more visual, intuitive way. It still requires building the source JSON with limited development support: as can be seen from the documentation here, this requires some Javascript skills and some effort. The source manager provides some templates to get up and running on simpler types of ingest, and there is a source gallery with some real-world examples of varying complexity.
...
Creating new users can be performed from the Person Manager GUI.
Creating new communities can be performed from the Community Manager GUI.
A few points to note:
- After a community is created, only the owner is initially added. Other users can be added (or removed) by selecting "Add New Members" (or "Edit/Remove Members") at the bottom of the right pane for the selected community.
- For users to be able to add new sources, they must either be system administrators, or be added as "Content Publisher" or better in their community role (right hand pane after selecting a user from the "Edit/Remove Members" page).
- (In secure mode, see below, users must be administrators to create new sources)
...