Source Editor

Overview

The Source Editor lets you manage sources: you can create, edit, validate, test, and publish them so that their data is ready for visualization.

The Source Editor GUI is not currently compatible with IE. It is compatible with Chrome, Firefox, and Safari.

About Sources and Publishing

If you publish a new source, or submit (publish) a source to a community you do not own, the source is initially added in a "pending" state, and an email requesting approval is sent to the community owners and moderators.

Changes to display fields for previously approved sources do not require further approval.

Once a source has been published, you can monitor its status using the Source Monitor.

After you publish a share, you should get an alert saying that the source has been published and that the working copy ("share") has been deleted. If you don't get this alert, an internal configuration error has probably occurred; contact your system administrator.


Using the Source Editor

Creating a New Source

You can create a new source in the Source Editor in two ways:

  • Using a template
  • Starting with a blank source

Creating a New Source From Template

To create a new source from a template

  1. Click the New Source button in the upper right-hand corner of the page. The Manager application forwards you to the "Create New Source" page.
  2. Choose a template from the drop-down list in the "Create a New Source" box, and click View template.
  3. Fill out the title, description, tags, and community fields.
  4. Click Create Source.

You will be able to modify the source on the next page before it starts running.

When copying an existing source into the New Source window, scrub the existing source first. Otherwise, its "_id" and "key" fields will cause the old source to be modified instead of a new one being created.
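As an illustration only, the following Python sketch shows what scrubbing amounts to when you copy a source by hand: it drops server-added fields before the JSON is pasted into the New Source window. The set of fields removed here is an assumption: "_id" and "key" are named above, and "harvest" is the status block mentioned under Reverting Sources, assumed to sit at the top level. The editor's own Scrub button may remove other fields as well.

```python
import json

# Assumed set of server-added fields. "_id" and "key" come from this guide;
# "harvest" is assumed to be a top-level status block. The real Scrub button
# may remove more fields than these.
SERVER_ADDED_FIELDS = {"_id", "key", "harvest"}


def scrub_source(source_json: str) -> str:
    """Return the source JSON with the assumed server-added fields removed."""
    source = json.loads(source_json)
    scrubbed = {name: value for name, value in source.items()
                if name not in SERVER_ADDED_FIELDS}
    return json.dumps(scrubbed, indent=2)


if __name__ == "__main__":
    # Hypothetical published source copied out of the editor.
    published = '{"_id": "abc123", "key": "my.feed", "title": "My feed"}'
    print(scrub_source(published))
```

Pasting the scrubbed output, rather than the raw copy, is what keeps the old source from being overwritten.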

Creating a New Source From Blank Template

To create a new source from a blank template

  1. Click the New Source button in the upper right-hand corner of the page. The "Create New Source" page is displayed.
  2. Fill out the title, description, tags, and community fields.
  3. Click Create Source. On the next page, you can build the source from scratch or paste in an existing source.


Editing Existing Sources

When you edit sources, be aware of the types of documents that can appear in the Sources list:

  • Published sources
  • Editable copies of published sources
  • Shares that are not yet published.  Shares are denoted by "(*)".

Authorization Requirements:

Private sources ("isPublic":"false") do not display all of their fields unless you are an admin, a community moderator, or the source owner. If you are none of these, testing such a source or using it as the basis for a new source will likely fail; contact the source owner to get a full copy.

To edit an existing source

  1. Click on the source's name in the list of Sources found on the left hand side of the page.
  2. Edit the source in one of the applicable editor tabs (for example, JSON, JS, or SRC UI).

For more information about the editor tabs, see section Source Editor Interface. For detailed information about the Source Builder, see section Source Builder User Interface.


Sharing Sources

By default, only you can see your temporary copies of sources, so, for example, you cannot share links to sources you are editing. To share a source in read or read-write mode, use the File Uploader.

To share a source

  1. Navigate your browser to the File Uploader interface.
  2. For "Filter On" select JSON.
  3. Select the source of choice from the filtered results.
  4. Share the source with the community of your choice. The user you intend to share it with must be a member of that community and must have "content publisher" permissions.

When co-authoring sources, note that there is no automatic synchronization: if two users make changes concurrently, work can be lost.

Validating the Source Format

You can use the Source Editor to check whether the source JSON is valid.

To check that the source JSON is valid

  • Click the Check Format button (middle right).

Validation checks only the JSON; it does not check any JavaScript in the source.
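The check is purely syntactic. A rough equivalent, sketched with Python's standard json module (this is not the editor's own code), parses the text and reports where parsing fails. JavaScript stored inside a JSON string value passes unchecked, just as it does in the editor.

```python
import json


def check_format(source_text: str) -> tuple[bool, str]:
    """Return (is_valid, message) for the JSON syntax of source_text.

    Only JSON syntax is checked: JavaScript held in string values is
    treated as an opaque string and is not validated.
    """
    try:
        json.loads(source_text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "OK"


if __name__ == "__main__":
    print(check_format('{"title": "My feed", "isPublic": false}'))
    print(check_format('{"title": "My feed",}'))  # trailing comma is invalid JSON
```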

For more information, see section Source Editor Interface.


Testing a Source

Once the first draft of a source is complete, test it to see which documents it extracts and how it enriches them with additional metadata, entities, associations, and so on.

You can set two parameters when testing a source:

"Full text": By default, the full text of a document is not returned because it can be quite long. However, enabling "Full text" mode can be useful when testing text extractors (for example, "boilerpipe" vs. "none" vs. "AlchemyAPI") or metadata generation.

"Number of documents": the maximum number of documents that are enriched and returned. The fewer the documents, the sooner the test call returns.

To test a source

  1. Configure the test parameters as required.
  2. Click the Test Source button to start testing. It can take a few minutes for the processed documents to be returned. During debugging, it can be useful to temporarily set the "waitTimeOverride_ms" field of the "rss" object to 1000 (that is, 1 s).
  3. Review the content of the test results pop-up.

Based on the test results, refine the source until it behaves as required.

The first time you test a source, you are likely to get an error, accompanied by a browser prompt asking whether to allow the page to open pop-ups. Allow pop-ups so that the test results pop-up can be displayed.
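If you edit the JSON by hand, the debugging tweak from step 2 can be sketched as follows. This is a hypothetical helper, not part of the Source Editor, and it assumes the "rss" object sits at the top level of the source JSON. Because the change is meant to be temporary, undo it before publishing.

```python
import json


def with_debug_wait(source_json: str, wait_ms: int = 1000) -> str:
    """Return a copy of the source with rss.waitTimeOverride_ms set to wait_ms.

    Assumes "rss" is a top-level object; raises ValueError if it is missing
    rather than silently adding an "rss" block to a non-RSS source.
    """
    source = json.loads(source_json)
    rss = source.get("rss")
    if not isinstance(rss, dict):
        raise ValueError('source has no top-level "rss" object')
    rss["waitTimeOverride_ms"] = wait_ms  # 1000 ms = 1 s, as suggested in step 2
    return json.dumps(source, indent=2)


if __name__ == "__main__":
    print(with_debug_wait('{"title": "My feed", "rss": {}}'))
```

Raising an error instead of creating the "rss" block is a deliberate choice here, so the helper cannot quietly change the shape of a source that does not use one.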

Saving Sources as Templates

The Sources page allows you to save sources as templates, which streamlines the process of creating new sources that share common attributes.

To save a source as a template

  • Click on the Save Source as Template button.  Your new template will be available in the Source Templates drop down on the "Create New Source" page.

Authorization Requirements:

The template is shared with the source's community. If you don't want to share it with anyone else, set the community drop-down to your personal community before saving the source as a template.


Publishing Sources

Sources must be "published" to the system before the Community Edition Core Server begins harvesting them. Once you have created and tested a new source, or edited and tested an existing one, you can publish it.

To publish a source

  • Click the Publish Source button. If the source is valid, it is published.


Reverting Sources

For published sources, the Revert button in the top right-hand corner of the code editor overwrites the existing temporary share with the current version of the source in the database. This is useful for two reasons:

  • To discard unwanted manual changes
  • To refresh the "harvest" status block when you have not made any changes

To revert a source

  • Click on Revert from the code editor.  The current version of the source in the database overwrites the temporary share.


Scrubbing Sources

Scrubbing a source removes all fields that the server added after publishing, retaining only the actual ingest logic. Scrub a source before copying it or saving it as a template.

To scrub the source

  • In the code editor, click Scrub. Any extraneous fields are removed.

If you accidentally scrub a source and then save it, you can recover the original published source by deleting the share and then re-selecting the source.

Suspending Sources

You can suspend a source to remove the source and its documents from queries.

To suspend a source

  • In the Source Editor, click Suspend Source. The source will no longer be searched, and its documents will no longer be available to queries from the visualization GUI.

Note that this button affects only the unpublished version of the source (that is, the corresponding share). To apply the change, publish the source; you are prompted to do this automatically.

Deleting a Source's Documents

Deleting a source's documents leaves the source intact but deletes all of the documents harvested so far. Unless you are a community moderator or an admin, you can do this only for sources you own.

To delete a source's documents

  • In the Source Editor, click Delete Docs.

Use this with caution. Also, for sources with many documents, this operation may take some time (for example, 10 minutes for 500,000 documents).
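The timing example works out to roughly 830 documents per second (500,000 documents in 600 seconds). Assuming, purely for illustration, that deletion time scales linearly with the number of documents (real times depend on your server and data), a back-of-the-envelope estimate looks like this:

```python
def estimate_delete_minutes(num_docs: int,
                            ref_docs: int = 500_000,
                            ref_minutes: float = 10.0) -> float:
    """Estimate deletion time by scaling linearly from a reference measurement.

    The defaults are the example figures from this guide (10 minutes for
    500,000 documents); linear scaling is an assumption.
    """
    if num_docs < 0:
        raise ValueError("num_docs must be non-negative")
    return num_docs * ref_minutes / ref_docs


if __name__ == "__main__":
    for n in (50_000, 500_000, 2_000_000):
        print(f"{n:>9,} documents: ~{estimate_delete_minutes(n):.0f} min")
```

If you measure a deletion on your own system, pass that measurement as `ref_docs` and `ref_minutes` instead of the guide's example figures.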

Deleting Sources or Shares

You can use the Source Editor to delete shares or sources.

To delete a share

  1. Click the delete button in the Sources list window.
  2. When prompted, click OK.
      • If the share has been published, the share is deleted, but the published source is left intact and remains in the Sources list.
      • If the share has not been published, it is simply deleted and disappears from the Sources list.

To delete a source

  1. Click the delete button in the Sources list window.
  2. When prompted, click OK.

If you confirm the deletion, the system deletes the published source and all harvested documents associated with it.

Deleting a published source also deletes all documents associated with that source. In some cases those documents cannot be retrieved again (for example, old URLs from an RSS feed), so use this operation with caution. Also, for sources with many documents, it may take some time (for example, 10 minutes for 500,000 documents).
