CSV Data
- andrew johnston (Unlicensed)
Ingest .CSV Data and Generate Entities
A common task in the Manager is to ingest a .csv file and perform some basic source configuration on it in order to generate usable Entities.
Uploading the .CSV File
Once you have located a .csv file that is suitable for the platform, upload it using the File Uploader.
To upload the file
- Navigate to Manager>File Uploader.
- Fill in the fields of the File Uploader and choose the file.
- Click on Submit.
Make sure you take note of the generated ID, which is displayed after you submit the file. The ID is the alphanumeric string displayed to the right of get/.
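If you copy the whole displayed link rather than just the ID, the ID can be pulled out with a few lines of code. This is a minimal, hypothetical Python helper: the example URL's host and path are made up, and the only thing the sketch relies on is that the ID follows a get/ path segment. The ID value itself is the share ID that appears in the sourceUrl of the test output on this page.

```python
import re


def extract_share_id(displayed_url: str) -> str:
    """Return the alphanumeric ID that follows a 'get/' path segment.

    Hypothetical helper for illustration: it assumes the ID is the run of
    letters and digits immediately to the right of 'get/'.
    """
    match = re.search(r"(?:^|/)get/([A-Za-z0-9]+)", displayed_url)
    if match is None:
        raise ValueError(f"no 'get/<id>' segment found in {displayed_url!r}")
    return match.group(1)


# Made-up host and path; only the part after get/ matters here.
example = "https://example.com/share/get/5421edf6e4b00c006cf54cd6"
print(extract_share_id(example))  # 5421edf6e4b00c006cf54cd6
```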
Creating the New Source
Once the .csv file is uploaded, you can create a new source.
To create the new source
- Navigate to Source Editor>New Source.
- Select template: Infinit.e ZIP Archives/JSON Share Example.
- Click on Select.
- Fill in the remaining information, making sure you select the correct Community.
- Click on Save Source.
Editing the Source
You can edit the source using either the JSON editor or the Source Builder (Enterprise edition only).
To edit the source using Source Builder
1. From the newly created source, click on SRC UI. The Source Builder is displayed.
2. Delete the elements in the Source View, except for the File Extractor.
3. Paste the share ID you noted earlier into the url field of the Form View.
4. Scroll down and set type to "line-separated".
5. Exit the Source Builder.
6. Click on Save Source.
7. Click on Test Source.
The tested source is displayed in a new window.
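Judging by the test output on this page, where each document's description holds one raw line of the file, the line-separated type turns every line of the .csv file into its own document. The Python sketch below imitates that behaviour locally so you can predict how many documents a file should produce. It is not the File Extractor's code; the function name and the choice to skip blank lines are assumptions of the sketch.

```python
def split_line_separated(text: str) -> list[dict]:
    """Approximate a line-separated extraction: one record per non-blank line.

    Illustrative only; skipping blank lines is an assumption of this sketch.
    """
    records = []
    for line in text.splitlines():
        if line.strip():  # ignore empty or whitespace-only lines
            records.append({"description": line})
    return records


# Two lines taken from the description fields of the test output.
sample = (
    '"St. Louis-area police","Organization","Who",5.263157894736842,1.232778207145495,0.05263157894736842,1,1,0,0.0,0.0\n'
    '"Kenya","Country","Where",5.263157894736842,1.188316079123999,0.05263157894736842,5,8,0,0.0,0.0\n'
)
for record in split_line_separated(sample):
    print(record["description"])
```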
At this point, review the tested source to make sure it is as expected. A common problem at this stage is that a badly formatted .csv file makes it difficult to identify the .csv headers correctly.
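Before changing the source, it can help to check the .csv file locally for formatting problems that obscure the headers. This minimal Python sketch uses the standard csv module to report a leading byte-order mark, quote characters or stray spaces left in header names, and rows whose column count differs from the header. The choice of checks is an assumption about what commonly goes wrong, not a list taken from the platform, and the sample text is made up.

```python
import csv
import io


def diagnose_csv(text: str) -> list[str]:
    """Return human-readable warnings about a .csv file's header and rows.

    Illustrative checks only: a BOM at the start, quotes or surrounding
    spaces left in header names after parsing, and rows with a different
    number of columns from the header.
    """
    warnings = []
    if text.startswith("\ufeff"):
        warnings.append("file starts with a byte-order mark (BOM)")
        text = text[1:]
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    for name in header:
        if '"' in name or "'" in name or name != name.strip():
            warnings.append(f"header {name!r} contains quotes or surrounding spaces")
    for row_no, row in enumerate(rows[1:], start=2):
        # csv.reader yields [] for blank lines; skip those.
        if row and len(row) != len(header):
            warnings.append(f"row {row_no}: {len(row)} columns, header has {len(header)}")
    return warnings


# Made-up file with a BOM, a space before a quoted header, and a short row.
sample = '\ufeffentity_name, "entity_type"\n"Kenya","Country"\n"Mali"\n'
for warning in diagnose_csv(sample):
    print(warning)
```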
Advanced Configuration for .CSV Files
The three key fields of the File Extractor when extracting .csv files are:
- RootLevelValues: the list of header (column) names for the .csv file.
- IgnoreValues: values that should be ignored during extraction, for example #.
- AttributePrefix: TODO: definition
Common Advanced Configurations to Rectify Problems:
- Try pasting the problematic headers from the Source Test Output into RootLevelValues, and remove any problematic quotes or other characters.
- Use IgnoreValues to specify any values that should be ignored, for example #.
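The two fixes above can be imitated locally to check a file before reconfiguring the source. In this Python sketch, an explicit list of header names plays the role of RootLevelValues and a list of line prefixes plays the role of IgnoreValues; that mapping is an interpretation of the bullets above, not a description of how the File Extractor implements these fields. The header names are the eight metadata field names from the test output on this page, in the order their values appear in each line's description. The three trailing values in those lines have no name in the output, so this sketch simply drops them. The comment line in the sample is made up.

```python
import csv

# Header names from the test output's metadata, in column order
# (stands in for RootLevelValues in this sketch).
HEADERS = [
    "entity_name", "entity_type", "entity_dimension",
    "query_coverage", "query_significance", "query_avg_frequency",
    "doc_count", "total_frequency",
]

# Lines starting with any of these prefixes are skipped
# (stands in for IgnoreValues in this sketch).
IGNORE_PREFIXES = ["#"]


def parse_with_headers(text, headers, ignore_prefixes):
    """Return one dict per data line, keyed by the supplied header names.

    Values stay as strings, like the metadata values in the test output.
    zip() stops at the shorter sequence, so columns beyond len(headers) are
    dropped. Each line is parsed on its own, so quoted newlines are not
    supported.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or any(line.startswith(p) for p in ignore_prefixes):
            continue
        values = next(csv.reader([line]))
        records.append(dict(zip(headers, values)))
    return records


sample = (
    "# exported entity list\n"
    '"St. Louis-area police","Organization","Who",5.263157894736842,1.232778207145495,0.05263157894736842,1,1,0,0.0,0.0\n'
    '"Kenya","Country","Where",5.263157894736842,1.188316079123999,0.05263157894736842,5,8,0,0.0,0.0\n'
)
records = parse_with_headers(sample, HEADERS, IGNORE_PREFIXES)
print(records[1]["entity_name"], records[1]["total_frequency"])  # Kenya 8
```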
Re-testing
After making your advanced configuration changes, re-test the source.
A successful test result should include metadata, as shown in the JSON example below.
{
  "communityId": ["53add292e4b015f8f5817611"],
  "created": "Sep 23, 2014 10:46:27 PM UTC",
  "description": "\"St. Louis-area police\",\"Organization\",\"Who\",5.263157894736842,1.232778207145495,0.05263157894736842,1,1,0,0.0,0.0",
  "mediaType": ["Report"],
  "metadata": {
    "csv": [{
      "doc_count": "1",
      "entity_dimension": "Who",
      "entity_name": "St. Louis-area police",
      "entity_type": "Organization",
      "query_avg_frequency": "0.05263157894736842",
      "query_coverage": "5.263157894736842",
      "query_significance": "1.232778207145495",
      "total_frequency": "1"
    }]
  },
  "modified": "Sep 23, 2014 10:02:30 PM UTC",
  "publishedDate": "Sep 23, 2014 10:02:30 PM UTC",
  "source": ["CSV Example"],
  "sourceKey": ["inf...share.5421edf6e4b00c006cf54cd6.miscDescription."],
  "sourceUrl": "inf://share/5421edf6e4b00c006cf54cd6/miscDescription/csv file",
  "tags": ["csv"],
  "title": "csv file",
  "url": "inf://share/5421edf6e4b00c006cf54cd6/miscDescription/csv file/0fadc6615d7a5a77625881f8bb61092b.sv"
}
{
  "communityId": ["53add292e4b015f8f5817611"],
  "created": "Sep 23, 2014 10:46:27 PM UTC",
  "description": "\"Kenya\",\"Country\",\"Where\",5.263157894736842,1.188316079123999,0.05263157894736842,5,8,0,0.0,0.0",
  "mediaType": ["Report"],
  "metadata": {
    "csv": [{
      "doc_count": "5",
      "entity_dimension": "Where",
      "entity_name": "Kenya",
      "entity_type": "Country",
      "query_avg_frequency": "0.05263157894736842",
      "query_coverage": "5.263157894736842",
      "query_significance": "1.188316079123999",
      "total_frequency": "8"
...
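If you save the test output as a JSON array of documents, a short script can confirm that every document carries .csv metadata. This is a minimal Python sketch: saving the output as an array and the function name are assumptions, and it only checks for a non-empty metadata.csv list like the one in the example above. The second document in the sample is made up to show a failing case.

```python
import json


def documents_missing_csv_metadata(documents):
    """Return the titles (or positions) of documents with no metadata.csv entries."""
    missing = []
    for index, doc in enumerate(documents):
        csv_entries = doc.get("metadata", {}).get("csv", [])
        if not csv_entries:
            missing.append(doc.get("title", f"document {index}"))
    return missing


saved_output = """[
  {"title": "csv file",
   "metadata": {"csv": [{"entity_name": "Kenya", "total_frequency": "8"}]}},
  {"title": "csv file (no metadata)"}
]"""
docs = json.loads(saved_output)
print(documents_missing_csv_metadata(docs))  # ['csv file (no metadata)']
```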
Entity Enrichment
Once the source is successfully returning metadata, you can add either Manual Entities or Automated Entities to enrich the .csv data.
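In this data set, each metadata record already names an entity, its type and its dimension (Who, Where), so an entity definition mostly needs to map those fields across. The Python sketch below shows that kind of mapping for one metadata.csv record. The output key names (disambiguated_name, type, dimension, frequency) are hypothetical and may not match the platform's entity schema or source configuration syntax; only the input field names come from the test output on this page.

```python
def metadata_to_entity(record: dict) -> dict:
    """Map one metadata.csv record to a hypothetical entity dictionary.

    Output keys are illustrative only; input keys are the field names
    seen in the test output.
    """
    return {
        "disambiguated_name": record["entity_name"],
        "type": record["entity_type"],
        "dimension": record["entity_dimension"],
        # Metadata values arrive as strings, so convert the count explicitly.
        "frequency": int(record["total_frequency"]),
    }


kenya = {
    "entity_name": "Kenya",
    "entity_type": "Country",
    "entity_dimension": "Where",
    "total_frequency": "8",
}
print(metadata_to_entity(kenya))
# {'disambiguated_name': 'Kenya', 'type': 'Country', 'dimension': 'Where', 'frequency': 8}
```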
To add Entities
TODO
Publish the Source
TODO
Video Walkthrough