ZIP File Source


A common Manager task is to ingest a generic file, such as a ZIP archive, and to configure the extractors to produce the desired results.

Creating the ZIP File Source

To create a source for an uploaded ZIP file:

1. Navigate to the Manager.

2. Select ‘File Uploader’ from the menu.

3. Enter a title for your ZIP file.

4. Enter a description.

5. Select a community.

6. Click ‘Choose File’.

7. Locate and select the ZIP file.

8. A success message appears above the File Uploader.

9. Highlight and copy the share ID from the end of the URL.

10. Navigate back to the Source Manager home screen.

11. Select ‘Source Editor’.

12. Select ‘New Source’ at the top right of the screen.

13. From the Source Templates dropdown on the left side, select "Infinit.e ZIP Archives/JSON Share Example" and click Select.

14. In the ‘New Source’ template on the right side:

  • Enter a title (e.g. PDF ZIP) and a description (e.g. PDF ZIP file).

  • Enter any desired tags, separated by spaces (no commas).

  • Select a community for your source (e.g. General News).

  • Paste the share ID copied in step 9.

  • Click ‘Save Source’.
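If you need the share ID from step 9 in a script (for example, when checking several uploads), it can be pulled off the end of the URL instead of being selected by hand. The Python sketch below is illustrative only: it assumes the share ID is the 24-character hexadecimal string at the very end of the URL, optionally followed by a slash, which matches the ID that appears in the sample test output (543d6948e4b0d272bbe48c9c). The helper name share_id_from_url and the example URL are hypothetical.

```python
import re

# Assumption: a share ID is a 24-character hexadecimal string at the end of
# the URL, like 543d6948e4b0d272bbe48c9c in the sample test output.
# The lookbehind stops a match from starting inside a longer hex run.
_TRAILING_SHARE_ID = re.compile(r"(?<![0-9a-fA-F])([0-9a-fA-F]{24})/?$")


def share_id_from_url(url: str) -> str:
    """Return the share ID at the end of url (hypothetical helper).

    Raises ValueError if the URL does not end with a 24-character hex ID.
    """
    match = _TRAILING_SHARE_ID.search(url.strip())
    if match is None:
        raise ValueError(f"no share ID at the end of {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Hypothetical URL; substitute the one shown after your upload.
    print(share_id_from_url("https://example.com/share/543d6948e4b0d272bbe48c9c"))
```

The regular expression rejects URLs whose trailing hex run is longer than 24 characters, so a malformed URL raises an error rather than returning a partial ID.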


 


Testing the Source

Once you have provided the correct share ID and saved the source, you can test it to verify that documents are returned. In this template, the default feature engine is set to return both entities and associations.

To test the source:

  1. Click Test Source. The platform processes the data and should return the documents.
  2. The "Source Test Output" window opens, displaying either a success or an error message.
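Because this template's feature engine returns associations as well as entities, it can help to summarize a test result instead of reading every document. The Python sketch below assumes you have saved the test output as a JSON list of documents shaped like the example below, where each document may carry an "associations" array whose items have a "verb_category" field. The function name tally_associations and the trimmed sample data are illustrative only.

```python
from __future__ import annotations

import json
from collections import Counter


def tally_associations(documents: list[dict]) -> Counter:
    """Count associations per verb_category across a list of test documents.

    Documents without an "associations" array (for example, the __MACOSX
    entries in the sample output) contribute nothing to the counts.
    """
    counts: Counter = Counter()
    for doc in documents:
        for assoc in doc.get("associations", []):
            counts[assoc.get("verb_category", "unknown")] += 1
    return counts


# Illustrative input trimmed from the sample output; in practice you might
# load a saved copy of the test output with json.load().
sample = json.loads("""
[
  {"title": "__MACOSX/"},
  {"associations": [
    {"verb": "sanction", "verb_category": "generic relations"},
    {"verb": "build", "verb_category": "generic relations"},
    {"verb": "ice storm", "verb_category": "natural disaster"}
  ]}
]
""")

for category, n in tally_associations(sample).most_common():
    print(f"{category}: {n}")
```

A tally like this makes it easy to spot categories that look out of place for your data, such as the "natural disaster" association in the sample output.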

 

The example below shows representative documents returned by a test (the output is truncated).

{
    "communityId": ["4c927585d591d31d7b37097a"],
    "created": "Oct 14, 2014 10:11:16 PM UTC",
    "description": "",
    "mediaType": ["Report"],
    "metadata": {"_FILE_METADATA_": [{"metadata": {"Content-Type": ["application/octet-stream"]}}]},
    "modified": "Oct 14, 2014 06:19:54 PM UTC",
    "publishedDate": "Oct 14, 2014 06:19:54 PM UTC",
    "source": ["Iran Report 2"],
    "sourceKey": ["inf...share.543d6948e4b0d272bbe48c9c.miscDescription."],
    "tags": ["iran"],
    "title": "__MACOSX/._USIP_Template_5March2012-1.pdf",
    "url": "inf://share/543d6948e4b0d272bbe48c9c/miscDescription/__MACOSX/._USIP_Template_5March2012-1.pdf"
}
{
    "communityId": ["4c927585d591d31d7b37097a"],
    "created": "Oct 14, 2014 10:11:16 PM UTC",
    "description": "",
    "mediaType": ["Report"],
    "metadata": {"_FILE_METADATA_": [{"metadata": {"Content-Type": ["application/octet-stream"]}}]},
    "modified": "Oct 14, 2014 06:19:54 PM UTC",
    "publishedDate": "Oct 14, 2014 06:19:54 PM UTC",
    "source": ["Iran Report 2"],
    "sourceKey": ["inf...share.543d6948e4b0d272bbe48c9c.miscDescription."],
    "tags": ["iran"],
    "title": "__MACOSX/",
    "url": "inf://share/543d6948e4b0d272bbe48c9c/miscDescription/__MACOSX/"
}
{
    "associations": [
        {
            "assoc_type": "Summary",
            "entity1": "U.N. Security Council",
            "entity1_index": "u.n. security council/organization",
            "verb": "sanction",
            "verb_category": "generic relations"
        },
        {
            "assoc_type": "Summary",
            "entity1": "it",
            "entity2": "nuclear device",
            "entity2_index": "nuclear device/industryterm",
            "verb": "build",
            "verb_category": "generic relations"
        },
        {
            "assoc_type": "Summary",
            "entity2": "Qom",
            "entity2_index": "qom,qom province,iran/city",
            "geotag": {
                "lat": 34.6461111111,
                "lon": 50.8788888889
            },
            "verb": "ice storm",
            "verb_category": "natural disaster"
        },
        {
            "assoc_type": "Summary",
            "entity2": "Legal",
            "entity2_index": "legal/product",
            "verb": "known",
            "verb_category": "product recall"
        },
        {
            "assoc_type": "Summary",
            "entity1": "International Atomic Energy Agency",
            "entity1_index": "international atomic energy agency/organization",
            "verb": "report",
            "verb_category": "generic relations"
        },
        ...
    ],
    ...
}
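The first two documents in this output come from the __MACOSX/ folder and its "._" AppleDouble files, which macOS typically adds when it compresses files; they carry no report content. One way to keep them out of the source is to remove them from the archive before uploading it. The Python sketch below does this with the standard zipfile module. The function names are illustrative, and the filter rules (drop anything under __MACOSX/ and any file whose name starts with "._") are an assumption you may want to adjust.

```python
from __future__ import annotations

import zipfile
from pathlib import PurePosixPath


def is_macos_metadata(name: str) -> bool:
    """True for the __MACOSX/ tree and for AppleDouble files named '._*'."""
    parts = PurePosixPath(name).parts
    if not parts:
        return False
    return parts[0] == "__MACOSX" or parts[-1].startswith("._")


def strip_macos_metadata(src_path: str, dst_path: str) -> list[str]:
    """Write a copy of the ZIP at src_path to dst_path without macOS metadata.

    Returns the names of the entries that were kept.
    """
    kept = []
    with zipfile.ZipFile(src_path) as zin, zipfile.ZipFile(dst_path, "w") as zout:
        for info in zin.infolist():
            if is_macos_metadata(info.filename):
                continue
            # Reusing the original ZipInfo keeps each entry's name, timestamp
            # and compression method.
            zout.writestr(info, zin.read(info))
            kept.append(info.filename)
    return kept
```

For example, strip_macos_metadata("reports.zip", "reports-clean.zip") writes a cleaned copy and returns the names it kept; you would then select reports-clean.zip in step 7 instead of the original. Both file names are placeholders.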

 


Editing the Source

You will likely want to edit the source to tailor its text extraction and feature extraction settings. You can edit the source using either the JSON editor or Source Builder (Enterprise Edition only).

For more information about text extraction and feature extraction, see the Source Pipeline Elements section.

This section assumes that you have obtained an OpenCalais or AlchemyAPI key and configured the properties file accordingly.

Using Source Builder to Edit the Source

Source Builder provides a graphical interface for editing sources. You can use it to change the Text Extraction and Feature Extraction settings.

To edit the extraction settings:

  1. From the Source Editor, click SRC UI. Source Builder is displayed.
  2. In "Source View" and "Form View", use the dropdown to change the engineName value.

For example, Automated Text Extraction can be set to alchemyapi and Automated Entities to opencalais.
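If you work in the JSON editor instead, the equivalent change is made by editing the engine names in the source JSON. The Python sketch below shows that change applied to a source dictionary using the values from this example. It assumes that text extraction and feature extraction appear as "textEngine" and "featureEngine" objects, each with an "engineName" field, inside a "processingPipeline" array; verify the exact field names against Source Pipeline Elements before relying on it. The set_engines helper, the trimmed source, and its "default" placeholder engine names are hypothetical.

```python
from __future__ import annotations

import copy
import json


def set_engines(source: dict, text_engine: str, feature_engine: str) -> dict:
    """Return a copy of source with new text and feature engine names.

    Assumes pipeline elements look like {"textEngine": {"engineName": ...}}
    and {"featureEngine": {"engineName": ...}}; other elements are untouched.
    """
    updated = copy.deepcopy(source)  # leave the caller's dict unchanged
    for element in updated.get("processingPipeline", []):
        if "textEngine" in element:
            element["textEngine"]["engineName"] = text_engine
        if "featureEngine" in element:
            element["featureEngine"]["engineName"] = feature_engine
    return updated


# Hypothetical, heavily trimmed source with placeholder engine names;
# a real source has more fields and pipeline elements.
source = {
    "title": "PDF ZIP",
    "processingPipeline": [
        {"textEngine": {"engineName": "default"}},
        {"featureEngine": {"engineName": "default"}},
    ],
}

print(json.dumps(set_engines(source, "alchemyapi", "opencalais"), indent=2))
```

Working on a deep copy means the original source can be kept for comparison, which is handy when testing several engine combinations before saving.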



 


Publishing the Source

Once you are satisfied with the results, you can publish the source.

 

To publish the source:

  1. Save the source if you have modified it since the last save.
  2. Click Publish Source. The source is published, and you can follow its progress in the Source Monitor interface.

If a repeated test returns an error, double-check all fields and open the ZIP file URL in a separate browser window to confirm that it is correct.

 


 


Related Visualization Documentation:

Visualization

Visualization Widgets User Guide