Overview

...

The Source Editor allows you to perform many useful source management actions. Using the manager you can create, edit, validate, test and publish sources so that the data is ready for visualization.

Source management is intrinsically a complex process (particularly when taking advantage of Infinit.e's customization engine). 

The Sources page provides a simple interface for adding and testing new sources, saving templates for future sources, and managing existing ones. Future iterations of the tool will provide actual support for the difficult parts of source writing, such as writing JavaScript and regexes.

Note that the grey lines can be dragged to increase or decrease the size of the editor window.

The "Filter" text box will by default search the source titles, but it can also search the following fields:

  • URL: type "url:<url fragment>" 
    • (note that URLs from the processing pipeline or feed configuration objects won't be searched unless you are currently editing them).
  • Community IDs: type "community:<community-id>"
  • ID: type "id:<source _id field>"
  • Tags: type "tags:<tag fragment>"
  • key, title, description, mediaType and extractType: use the same syntax, "fieldName:<field value fragment>"
    • (note title is the default if no prefix is specified)
  • Suspended sources:
    • "suspended:true" to see manually suspended sources
    • "fullQuarantined:true" to see unauthorized sources (this can happen automatically because they produce too many errors, or because they are disabled by an administrator)
    • "tempQuarantined:true" to see sources quarantined for the day (because of a possibly transient source error)
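
The prefix rules above can be sketched in a few lines of Python. This is only an illustration of the filter syntax as described on this page, not the Source Editor's actual implementation; the function name parse_filter and the KNOWN_FIELDS set are hypothetical.

```python
# Hypothetical sketch of the "Filter" box syntax described above.
# The prefixes are the ones listed on this page; "title" is the default.
KNOWN_FIELDS = {
    "url", "community", "id", "tags", "key", "title", "description",
    "mediaType", "extractType", "suspended", "fullQuarantined",
    "tempQuarantined",
}


def parse_filter(text):
    """Split a filter string into a (field, fragment) pair."""
    # Only the first colon separates the prefix, so URL fragments that
    # contain their own colons (e.g. "url:http://...") stay intact.
    prefix, sep, rest = text.partition(":")
    if sep and prefix in KNOWN_FIELDS:
        return prefix, rest
    # No recognized prefix: search the source titles.
    return "title", text


print(parse_filter("tags:news"))
print(parse_filter("my rss feed"))
```

An unrecognized prefix is treated as part of a title search here; how the real filter box handles that case is not stated on this page.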

Using the Source Manager

Create New Source

To create a new source, click the New Source button in the upper right-hand corner of the page. The Infinit.e.Manager application will forward you to the Create New Source page shown below:

...

Info

When copying an existing source into the New Source window, that existing source should be "scrubbed" first (middle right, "Scrub" button) - otherwise the presence of the "_id"/"key" fields will mean that the old source is modified instead of a new one being created.
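
To see why scrubbing matters, here is a minimal Python sketch of removing the two identity fields named above from a copied source. It is an assumption-laden illustration: the function name scrub_identity and the example field values are hypothetical, and the real "Scrub" button removes further server-added fields (see "Scrubbing" sources) that are not enumerated on this page.

```python
import copy
import json

# The two fields this page names as tying a copy to the original source.
IDENTITY_FIELDS = ("_id", "key")


def scrub_identity(source):
    """Return a copy of the source dict without "_id" and "key"."""
    scrubbed = copy.deepcopy(source)  # leave the caller's object untouched
    for field in IDENTITY_FIELDS:
        scrubbed.pop(field, None)  # ignore fields that are already absent
    return scrubbed


# Hypothetical example values, for illustration only.
existing = {
    "_id": "example-id",
    "key": "example.key",
    "title": "Example feed",
    "description": "Copied from an existing source",
}
print(json.dumps(scrub_identity(existing), indent=2))
```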

 

Edit Existing Sources

To edit an existing source click on the source's name in the list of Sources found on the left hand side of the page.

...

Info

By default only you can see your temporary copies of sources (so, for example, you cannot share links to sources being edited). You can use the file uploader to share them in either read or read-write mode:

  • Go to the file uploader, filter on JSON type "source", and select your source
  • Share with a community to which your collaborator belongs (and in which they are at least a "content publisher" if you want them to make changes)
  • If you want to provide them with the ability to make changes, set the read access
    • Warning - there is no automatic synchronization, so if you both make changes at the same time work can be lost

Validating the Source Format

To check that the source JSON format is valid at any time, select the "Check Format" button (middle right).

...

This validation is run automatically before the source is saved, tested, enabled/disabled, or published, and also when switching between the JSON/JS tabs. Note that the automatic validation does not run on the JavaScript, only on the JSON.
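
If you want to pre-check a source outside the editor, a JSON syntax check can be sketched with Python's standard json module. This is not the "Check Format" button's own logic, which may apply checks not described on this page; the function name check_format is hypothetical, and the requirement that the top level be a JSON object is an assumption based on sources having named fields such as "_id" and "key".

```python
import json


def check_format(text):
    """Return (is_valid, message) for a candidate source JSON string."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the problem is easy to locate.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    # Assumption: a source is a JSON object, not a list or a bare value.
    if not isinstance(parsed, dict):
        return False, "top-level value must be a JSON object"
    return True, "OK"


print(check_format('{"title": "Example feed"}'))
print(check_format('{"title": '))
```

Like the editor's automatic validation, this only looks at the JSON; any JavaScript embedded in string values is not examined.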

Testing a Source

Once a first draft of a source is complete it should be tested to see which documents it extracts and how it enriches the documents with additional metadata, entities, and associations, etc.

...

Based on the results from testing, the source can then be refined until the desired functionality is obtained.

Saving sources as templates

The Sources page allows you to save sources as templates to streamline the process of creating new sources that share common attributes. To save a source as a template, click on the Save Source as Template button. Note: Your new template will be available in the Source Templates drop down on the Create New Source page.

The template is shared with the source's community - if you don't want to share with anybody else then set the dropdown to be your personal community before saving it as a template.

Publishing sources

Sources need to be "published" to the system in order for the Infinit.e Core Server to begin harvesting. Once you have created and tested a source, or edited and tested an existing source, you can publish the source by clicking on the Publish Source button.

...

After publishing a share, you should get an alert saying that the source has been published and the working copy "share" has been deleted. If you don't get this alert, then it is likely that an internal configuration error has occurred - contact your system administrator to get it fixed.

"Reverting" sources

The "revert" button in the top right-hand corner of the code editor, for published sources, overwrites the existing temporary share with the current version of the source in the database. This can be useful for two reasons:

  • To discard unwanted manual changes 
  • (If there are no changes) to update the "harvest" status block

"Scrubbing" sources

As discussed above in a few places, this removes all fields added by the server after publishing, just retaining the actual ingest logic. It should be used before copying/templating.

If you accidentally scrub the source and then save it, you can get back to the original published source by deleting the share and then re-selecting the source.

Enabling/disabling sources

Sources can be disabled by setting their "searchCycle_secs" to a negative number. This button just automates that process.

Info

Note that this button only affects the un-published version of the source (i.e. the corresponding share). The source should be published to apply the change - you are automatically prompted for this.
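
The sign convention on "searchCycle_secs" described in this section can be sketched as follows. This is a hedged illustration, not the button's actual code: the function name set_enabled is hypothetical, and the idea that re-enabling restores the positive value of the same number is an assumption not confirmed on this page.

```python
def set_enabled(source, enabled):
    """Return a copy of the source with "searchCycle_secs" signed to match.

    A negative value means the source is disabled (as stated above).
    Assumption: enabling flips the same value back to positive.
    """
    magnitude = abs(source["searchCycle_secs"])
    if magnitude == 0:
        # Zero has no sign, so it cannot encode "disabled" in this scheme.
        raise ValueError("searchCycle_secs must be non-zero")
    updated = dict(source)  # shallow copy; the caller's dict is unchanged
    updated["searchCycle_secs"] = magnitude if enabled else -magnitude
    return updated


# Hypothetical example value, for illustration only.
share = {"title": "Example feed", "searchCycle_secs": 3600}
print(set_enabled(share, False))
```

As with the button itself, a change like this would only affect the working copy until the source is published.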

Deleting a source's documents

This button will leave the source intact but will delete all of the documents harvested so far. It can only be performed on sources you own unless you are a community moderator or an admin.

Info

This function should be used with caution. Also, for sources with many documents, this operation may take some time (e.g. 10 minutes for 500,000 documents).

Deleting sources or shares

To delete a source or share click on the "X" button next to the source name in the Sources list:

...

Info

Note that deleting a published source will also delete all documents associated with that source. In some cases those documents will not be retrievable (e.g. old URLs from an RSS feed). This should therefore be used with caution. Also, for sources with many documents, this operation may take some time (e.g. 10 minutes for 500,000 documents).

Monitoring sources

There is a graphical utility to monitor sources available from the home page (Source Monitor link). It opens in a new tab and is pictured below. It is not possible to change any source information from this GUI.

...

Suspended sources retain their color status but have "[SUSPENDED]" prepended to their title.

Warning

The Source Editor GUI is not currently compatible with IE. It is compatible with Chrome, Firefox, and Safari.