- andrew johnston (Unlicensed)
A common Manager use case is generating structured data from Twitter data, for example as provided by Gnip or DataSift.
Getting the Social Data into the Platform
Using a service such as DataSift or Gnip, you can monitor social media for a specific time period, using a specific set of terms or other filtering criteria. The results can then be saved and exported in your data format of choice. JSON is a widely used format for the export and interchange of social data, and is the preferred format for most services and for the Ikanow platform.
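Before uploading, it can be worth confirming that the exported file actually parses as JSON. The following is a minimal Python sketch of such a check, not part of the platform. The function name count_json_records is hypothetical, and the sketch assumes the export is either a single JSON document or newline-delimited JSON (one object per line); the actual layout depends on the service and its export settings.

```python
import json


def count_json_records(text):
    """Return the number of records in an exported JSON string.

    Assumption: the export is either one JSON document (object or array)
    or newline-delimited JSON with one value per line.
    Raises ValueError if neither interpretation parses.
    """
    try:
        parsed = json.loads(text)
    except ValueError:
        # Fall back to newline-delimited JSON; blank lines are skipped.
        lines = [line for line in text.splitlines() if line.strip()]
        if not lines:
            raise ValueError("export is empty")
        for line in lines:
            json.loads(line)  # raises ValueError on a malformed line
        return len(lines)
    # A top-level array counts its elements; anything else is one record.
    return len(parsed) if isinstance(parsed, list) else 1
```

A file that raises ValueError here is unlikely to be processed cleanly after upload, so it is cheaper to catch the problem at this stage.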
After you have exported a JSON file from your preferred service, you can make it available to the Ikanow platform in one of several ways:
- upload a JSON file to a share using the file uploader
- connect the platform to a Windows/Samba share
- connect to Amazon S3
For the purposes of this example, we will use the file uploader.
To upload the file using the file uploader
- Navigate to Manager>File Uploader.
- Fill in the fields of the File Uploader and Choose the file.
- Click on Submit.
Make sure you take note of the generated ID, which is displayed after submission. The ID is the alphanumeric character string displayed to the right of get/.
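As a small illustration of the note above, the ID can also be pulled out of a link programmatically. This is only a sketch: the function name extract_share_id and the example URL are hypothetical, and the only thing taken from this page is that the ID is the alphanumeric string to the right of get/.

```python
def extract_share_id(url):
    """Return the alphanumeric ID that follows the last 'get/' in a link."""
    marker = "get/"
    _, found, tail = url.rpartition(marker)
    if not found:
        raise ValueError("no 'get/' segment in: " + url)
    # Drop anything after the ID, such as a query string or extra path.
    for sep in ("?", "#", "/"):
        tail = tail.split(sep, 1)[0]
    if not tail.isalnum():
        raise ValueError("expected an alphanumeric ID, got: " + repr(tail))
    return tail


# Hypothetical link; the host, path and ID are placeholders.
share_id = extract_share_id("http://example.com/share/get/0123456789abcdef01234567")
```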
Creating the New Source
Once the social data is loaded into the platform, you can create a new source.
The Source Manager offers a variety of source templates that work with Twitter data. The template and source configuration that you ultimately use will depend on your data and your intended purpose.
In this example we will use a simple source configuration, based on the Feed extractor.
To create the new source
- Navigate to Source Editor>New Source.
- Select the appropriate template: For example, "RSS Source".
- Click on Select.
- Fill in the remaining information, and ensure you select the correct Community.
- Click on Save Source.
You will be presented with the following source configuration, suitable for making a few simple adjustments.
{
  "communityIds": [ "53add292e4b015f8f5817611" ],
  "description": "test rss",
  "title": "test rss",
  "processingPipeline": [
    { "feed": { "extraUrls": [ {} ] } },
    { "textEngine": { "engineName": "default" } },
    { "featureEngine": { "engineName": "default" } }
  ]
}
To obtain usable results quickly, specify the location of the JSON Twitter data and an automated extraction engine.
The example below reads the Twitter data from a Windows/Samba share. The AlchemyAPI-metadata featureEngine is specified to generate entities and associations.
For more information about extractors, see section Source Pipeline Elements.
{
  "title": "Super Storm Sandy - Twitter: 2012_10_26_02",
  "processingPipeline": [
    {
      "feed": {
        "extraUrls": [
          { "url": "smb://modus:139/datasift/sandy_demo/hourly/2012_10_26_02/" }
        ]
      }
    },
    {
      "featureEngine": {
        "engineName": "AlchemyAPI-metadata",
        "engineConfig": {
          "app.alchemyapi-metadata.batchSize": "100",
          "app.alchemyapi-metadata.numKeywords": "5",
          "app.alchemyapi-metadata.strict": "true"
        }
      }
    }
  ]
}
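The adjustments described above (pointing the feed at the data location and naming a feature engine) can also be sketched in Python, which is convenient when many hourly folders need near-identical sources. This is an illustration only: configure_source is a hypothetical helper, not a platform API, and dropping the textEngine stage simply mirrors the edited example on this page rather than a documented requirement.

```python
import copy
import json

# Configuration generated from the "RSS Source" template in this walkthrough.
TEMPLATE = {
    "communityIds": ["53add292e4b015f8f5817611"],
    "description": "test rss",
    "title": "test rss",
    "processingPipeline": [
        {"feed": {"extraUrls": [{}]}},
        {"textEngine": {"engineName": "default"}},
        {"featureEngine": {"engineName": "default"}},
    ],
}


def configure_source(template, title, data_url, engine_name, engine_config=None):
    """Return a copy of template that reads data_url and uses engine_name."""
    source = copy.deepcopy(template)  # leave the template itself untouched
    source["title"] = title
    pipeline = []
    for stage in source["processingPipeline"]:
        if "feed" in stage:
            stage["feed"]["extraUrls"] = [{"url": data_url}]
        elif "textEngine" in stage:
            continue  # the edited example on this page has no textEngine stage
        elif "featureEngine" in stage:
            stage["featureEngine"] = {"engineName": engine_name}
            if engine_config:
                stage["featureEngine"]["engineConfig"] = dict(engine_config)
        pipeline.append(stage)
    source["processingPipeline"] = pipeline
    return source


source = configure_source(
    TEMPLATE,
    title="Super Storm Sandy - Twitter: 2012_10_26_02",
    data_url="smb://modus:139/datasift/sandy_demo/hourly/2012_10_26_02/",
    engine_name="AlchemyAPI-metadata",
    engine_config={
        "app.alchemyapi-metadata.batchSize": "100",
        "app.alchemyapi-metadata.numKeywords": "5",
        "app.alchemyapi-metadata.strict": "true",
    },
)
print(json.dumps(source, indent=2))
```

The printed JSON can then be pasted into the source editor in place of the template text.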
Testing the Source
Once you have provided the correct location of the Twitter data and created a preliminary configuration, you can test the source to verify that documents are returned.
To test the source
- Click on Test Source. The platform will perform data processing and should then return the documents.
A Source Test Output window will open displaying either a success or error message.
If you have pop-up blocking on, you will need to allow pop-ups in order to receive the source test output.
The example below shows a document returned by a source configured as above.
{
  response: { action: "Doc Info", success: true, message: "Feed info returned successfully", time: 69 },
  data: {
    _id: "5266a332e4b00f80ca1cb3d7",
    title: "This is my 1st day on Twitter since #Sandy. If it's not your's, you weren't really affected by #Sandy.",
    url: "smb://modus:139/datasift/sandy_demo/hourly/2012_11_04_02/20121026-20121105_88d8z00mdw_2012_11_04_02_40_activities.json/02d81dbc0d69f06d130f998933abebbb.json",
    created: "Oct 22, 2013 04:07:18 PM UTC",
    modified: "Jul 24, 2013 07:24:06 PM UTC",
    publishedDate: "Nov 4, 2012 07:42:52 AM UTC",
    source: [ "Super Storm Sandy - Twitter: 2012_11_04_02" ],
    sourceKey: [ "modus.139.datasift.sandy_demo.hourly.2012_11_04_02.." ],
    mediaType: [ "Social" ],
    description: "This is my 1st day on Twitter since #Sandy. If it's not your's, you weren't really affected by #Sandy.",
    entities: [
      { disambiguated_name: "brettshanley", index: "brettshanley/twitterhandle", actual_name: "Brett Shanley", type: "TwitterHandle", relevance: 0, frequency: 1, totalfrequency: 0, doccount: 1, dimension: "Who", linkdata: [ "http://www.twitter.com/brettshanley" ] },
      { disambiguated_name: "New York, NY", index: "new york, ny/location", actual_name: "New York, NY", type: "Location", relevance: 0, frequency: 1, totalfrequency: 0, doccount: 236262, geotag: { lat: 40.7141667, lon: -74.0063889 }, dimension: "Where", ontology_type: "city" },
      { disambiguated_name: "Sandy", index: "sandy/hashtag", actual_name: "Sandy", type: "HashTag", relevance: 0, frequency: 1, totalfrequency: 0, doccount: 254130, dimension: "What" },
      { disambiguated_name: "sandy", index: "sandy/keyword", actual_name: "sandy", type: "Keyword", relevance: 0.751584, frequency: 1, totalfrequency: 0, doccount: 450855, dimension: "What", sentiment: 0.0176949 }
    ],
    tags: [ "gnip", "twitter" ],
    communityId: [ "50d251f5e4b001acdb3acfad" ],
    sourceUrl: "smb://modus:139/datasift/sandy_demo/hourly/2012_11_04_02/20121026-20121105_88d8z00mdw_2012_11_04_02_40_activities.json",
    associations: [
      { entity1: "brettshanley", entity1_index: "brettshanley/twitterhandle", verb: "tweets_about", verb_category: "tweets_about", entity2: "sandy", entity2_index: "sandy/hashtag", assoc_type: "Event" },
      { entity1: "brettshanley", entity1_index: "brettshanley/twitterhandle", verb: "tweets_about", verb_category: "tweets_about", entity2: "sandy", entity2_index: "sandy/hashtag", assoc_type: "Event" }
    ],
    metadata: {
      json: [
        {
          id: "tag:search.twitter.com,2005:264920622201700352",
          objectType: "activity",
          actor: {
            objectType: "person",
            id: "id:twitter.com:85672934",
            link: "http://www.twitter.com/brettshanley",
            displayName: "Brett Shanley",
            postedTime: "2009-10-27T21:56:46.000Z",
            image: "http://a0.twimg.com/profile_images/2677733249/737e0e24acdea25223f2d17ea4af0c00_normal.jpeg",
            summary: "Writer, professor, godless commie pinko. I'm nicer than I sound. ",
            links: [ { href: "http://www.facebook.com/brett.shanley", rel: "me" } ],
            friendsCount: "384",
            followersCount: "218",
            listedCount: "5",
            statusesCount: "1193",
            twitterTimeZone: "Eastern Time (US & Canada)",
            verified: "false",
            utcOffset: "-18000",
            preferredUsername: "brettshanley",
            languages: [ "en" ],
            location: { objectType: "place", displayName: "New York, NY" }
          },
          verb: "post",
          postedTime: "2012-11-04T02:42:52.000Z",
          generator: { displayName: "web", link: "http://twitter.com" },
          provider: { objectType: "service", displayName: "Twitter", link: "http://www.twitter.com" },
          link: "http://twitter.com/brettshanley/statuses/264920622201700352",
          body: "This is my 1st day on Twitter since #Sandy. If it's not your's, you weren't really affected by #Sandy.",
          object: {
            objectType: "note",
            id: "object:search.twitter.com,2005:264920622201700352",
            summary: "This is my 1st day on Twitter since #Sandy. If it's not your's, you weren't really affected by #Sandy.",
            link: "http://twitter.com/brettshanley/statuses/264920622201700352",
            postedTime: "2012-11-04T02:42:52.000Z"
          },
          twitter_entities: {
            urls: [ ],
            hashtags: [ { text: "Sandy", indices: [ "36", "42" ] }, { text: "Sandy", indices: [ "95", "101" ] } ],
            user_mentions: [ ]
          },
          retweetCount: "0",
          gnip: {
            language: { value: "en" },
            matching_rules: [
              { value: "(point_radius:[40.588437 -73.657908 25mi] OR bio_location_contains:\", NY\" OR bio_location_contains:\", NYC\" OR bio_location_contains:\", NJ\") (hurricane OR contains:sandy OR contains:disaster OR contains:storm)" },
              { value: "(point_radius:[40.579532 -74.150201 25mi] OR bio_location_contains:\", NY\" OR bio_location_contains:\", NYC\" OR bio_location_contains:\", NJ\") (hurricane OR contains:sandy OR contains:disaster OR contains:storm)" },
              { value: "(point_radius:[40.821489 -73.987639 25mi] OR bio_location_contains:\", NY\" OR bio_location_contains:\", NYC\" OR bio_location_contains:\", NJ\") (hurricane OR contains:sandy OR contains:disaster OR contains:storm)" }
            ],
            klout_score: "35"
          }
        }
      ]
    }
  }
}
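When reviewing test output like the document above, it often helps to reduce it to the entities and associations that were extracted. The Python sketch below does that for a trimmed copy of the example. The summarize function is hypothetical, the sample keeps only the fields it needs, and it assumes the output has been loaded into a dictionary with quoted keys (the test window shows keys unquoted).

```python
from collections import defaultdict

# Trimmed copy of the test output shown above; unused fields are omitted.
result = {
    "response": {"action": "Doc Info", "success": True,
                 "message": "Feed info returned successfully", "time": 69},
    "data": {
        "entities": [
            {"disambiguated_name": "brettshanley", "type": "TwitterHandle", "dimension": "Who"},
            {"disambiguated_name": "New York, NY", "type": "Location", "dimension": "Where"},
            {"disambiguated_name": "Sandy", "type": "HashTag", "dimension": "What"},
            {"disambiguated_name": "sandy", "type": "Keyword", "dimension": "What"},
        ],
        "associations": [
            {"entity1_index": "brettshanley/twitterhandle", "verb": "tweets_about",
             "entity2_index": "sandy/hashtag"},
            {"entity1_index": "brettshanley/twitterhandle", "verb": "tweets_about",
             "entity2_index": "sandy/hashtag"},
        ],
    },
}


def summarize(result):
    """Group entity names by dimension and list distinct associations."""
    if not result["response"]["success"]:
        raise RuntimeError(result["response"]["message"])
    doc = result["data"]
    by_dimension = defaultdict(list)
    for entity in doc.get("entities", []):
        by_dimension[entity["dimension"]].append(entity["disambiguated_name"])
    # Keep each (subject, verb, object) triple once, in first-seen order.
    triples = []
    for assoc in doc.get("associations", []):
        triple = (assoc["entity1_index"], assoc["verb"], assoc["entity2_index"])
        if triple not in triples:
            triples.append(triple)
    return dict(by_dimension), triples


by_dimension, triples = summarize(result)
```

For this sample, by_dimension groups the four entities under Who, Where and What, and the two identical associations collapse into a single tweets_about triple. An empty entity list is a quick signal that the feature engine settings need attention.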
Editing the Source
You will likely want to edit the source to tailor the text extraction and feature extraction settings. You can edit the source using either the JSON editor or Source Builder.*
*Enterprise edition only.
For more information concerning text extraction and feature extraction, see section Source Pipeline Elements.
Using Source Builder to Edit the Source
Source Builder provides a user interface for editing sources. You can use Source Builder to change the Text Extraction and Feature Extraction settings.
To edit the extraction settings
- From the Source Editor, click on SRC UI. The Source Builder is displayed.
- Use "Source View" and "Form View" to change the engineName using the dropdown, as indicated in the screenshot below.
In this example, Automated Entities has been set to AlchemyAPI.
For more information concerning text extraction and feature extraction, see section Source Pipeline Elements.
Re-testing
After you have made the necessary changes to the Source Configuration, re-test the source to verify the results.
Publishing the Source
Once you are satisfied with the results, you can publish the source.
- Ensure that you have saved the source since your last modifications.
- Click on Publish Source. The source is published and progress is available from Source Monitor.
This page does not cover Visualizations using the Visualization widgets. For more information, see section Visualization.
Related Media:
IKANOW Blog Posts
Brand Management: Source Discovery with Infinit.e
Integrating Infinit.e With Pentaho