...

The Infinit.e.Manager Sources page provides a simple interface for adding and testing new sources, saving templates for future sources, and managing existing ones. Future iterations of the tool will provide direct support for the more difficult parts of source writing, such as writing JavaScript and regexes.

Note that the grey lines can be dragged to increase or decrease the size of the editor window.

Create New Source

To create a new source, click on the New Source button in the upper right-hand corner of the page. The Infinit.e.Manager application will forward you to the Create New Source page shown below:

Info

When copying an existing source into the New Source window, that existing source should be "scrubbed" first (middle right, "Scrub" button) - otherwise the presence of the "_id"/"key" fields will mean that the old source is modified instead of a new one being created.

Edit Existing Sources

To edit an existing source, click on the source's name in the list of Sources on the left-hand side of the page.

Note: There are three types of documents listed in the Sources list: published sources, shares that are editable copies of published sources, and shares that have not yet been published as sources. Shares are denoted by "(*)".

Info

If copying the logic of an existing source, it is recommended to first "scrub" it to remove any server-added fields (particularly "_id" and "key", which can overwrite the existing source).

Info

Note that "private" sources ("isPublic":"false") do not have all fields displayed unless you are an admin, community moderator, or the source owner. If you are none of these, testing such a source (or using it as the basis for a new source) is likely to fail. Contact the source owner to get a full copy.

There are 3 tabs that can be edited:

  • "JSON" - this is the full source including all fields
  • "JS-U" - the Unstructured Analysis Module allows content to be transformed by "scriptlets" (xpath/regex/javascript) into document metadata. This view shows only the javascript maintained in "unstructuredAnalysis.script" - all of the logic can be written here as separate functions, and the scriptlets can then be simple calls to these functions, to maximize the maintainability of the code in the source.
  • "JS-S" - the Structured Analysis Module allows content to be transformed by "scriptlets" (xpath/regex/javascript) into document metadata. This view shows only the javascript maintained in "structuredAnalysis.script" - all of the logic can be written here as separate functions, and the scriptlets can then be simple calls to these functions, to maximize the maintainability of the code in the source.

Validating the Source Format

To check that the Source JSON format is valid at any time, select the "Check Format" button (middle right).

If run on the "JS-U" or "JS-S" tabs, the javascript in "unstructuredAnalysis.script" or "structuredAnalysis.script" (respectively) is checked instead.

This validation is run automatically before the source is saved, tested, enabled/disabled, or published, and when switching between the JSON/JS tabs. Note that the automatic validation does not run on the javascript, only on the JSON.
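As an illustration of what a JSON format check involves, the following is a minimal Python sketch that could be run on a local copy of a source before pasting it into the editor. It is not the tool's own validation: the function name "check_format" is hypothetical, and the requirement that the top-level value is a JSON object is an assumption. Like the automatic validation, it does not inspect any javascript embedded in the source.

```python
import json


def check_format(text):
    """Return (is_valid, message) for a source supplied as JSON text.

    Hypothetical helper for illustration; the real "Check Format" button
    may apply additional checks that are not reproduced here.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where the parser gave up, to help locate the problem.
        return False, "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    # Assumption: a source is always a JSON object at the top level.
    if not isinstance(parsed, dict):
        return False, "top-level value must be a JSON object"
    return True, "OK"


# Hypothetical inputs, for illustration only.
print(check_format('{"title": "Example feed"}'))
print(check_format('{"title": }'))
```

The second call reports the line and column at which parsing failed, which is usually enough to find a missing quote or a stray comma.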

Testing a Source

Once a first draft of a source is complete it should be tested to see which documents it extracts and how it enriches the documents with additional metadata, entities, and associations, etc.

...

Based off the results from testing, the source can then be refined until the desired functionality is obtained.

Saving sources as templates

The Sources page allows you to save sources as templates, to streamline the process of creating new sources that share common attributes. To save a source as a template, click on the Save Source as Template button. Note: Your new template will be available in the Source Templates drop down on the Create New Source page.

Note that templates are saved into your personal community only, but you can see any templates shared across any of the communities to which you belong. To share a template you have created with one of your communities, use the file uploader.

Info

Before turning a source into a template, that existing source should be "scrubbed" first (middle right, "Scrub" button) - otherwise the presence of the "_id"/"key" fields will mean that the old source is modified instead of a new one being created.

Publishing sources

Sources need to be "published" to the system in order for the Infinit.e Core Server to begin harvesting. Once you have created and tested a source, or edited and tested an existing source, you can publish the source by clicking on the Publish Source button.

...

Note that once a source has been published, its status can be monitored from "<ROOT URL>/InfiniteSourceMonitor.html" (eg http://infinite.ikanow.com/InfiniteSourceMonitor.html), provided you are logged into the main GUI or source builder.

"Scrubbing" sources

...

As discussed in several places above, scrubbing removes all fields added by the server after publishing, retaining only the actual ingest logic. It should be used before copying a source or turning it into a template.

If you accidentally scrub a source and then save it, you can get back to the original published source by deleting the share and then re-selecting the source.
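For readers who handle source JSON outside the GUI, the effect of scrubbing can be approximated with the minimal Python sketch below. It removes only the two fields named on this page ("_id" and "key"); the real "Scrub" button may remove further server-added fields, and the function name and sample document are hypothetical.

```python
import copy
import json

# Assumption: only "_id" and "key" are named on this page; the real Scrub
# button may strip other server-added fields as well.
SERVER_ADDED_FIELDS = ("_id", "key")


def scrub_source(source):
    """Return a copy of the source without the server-added fields."""
    scrubbed = copy.deepcopy(source)
    for field in SERVER_ADDED_FIELDS:
        scrubbed.pop(field, None)  # ignore fields that are already absent
    return scrubbed


# Hypothetical source document, for illustration only.
published = {
    "_id": "0123456789abcdef01234567",
    "key": "example.feed.rss",
    "title": "Example feed",
    "searchCycle_secs": 3600,
}
print(json.dumps(scrub_source(published), indent=2))
```

Working on a copy leaves the original document untouched, which mirrors the way the published source can still be recovered after an accidental scrub.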

Enabling/disabling sources

Sources can be disabled by setting their "searchCycle_secs" to a negative number. This button just automates that process.

Info

Note that this button only affects the un-published version of the source (ie the corresponding share). The source should be published to apply the change.
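The sign convention on "searchCycle_secs" can be sketched in Python as follows. This is an illustration under stated assumptions rather than the button's actual logic: it assumes that re-enabling simply restores the positive value, and the function names and the "default_cycle_secs" fallback (used when the field is missing or zero) are hypothetical.

```python
def is_disabled(source):
    """A source counts as disabled when "searchCycle_secs" is negative."""
    return source.get("searchCycle_secs", 0) < 0


def set_enabled(source, enabled, default_cycle_secs=3600):
    """Return a copy of the source with the sign of "searchCycle_secs" set.

    Assumption: flipping the sign keeps the original cycle length, so that
    re-enabling restores it. default_cycle_secs is a hypothetical fallback.
    """
    updated = dict(source)
    magnitude = abs(updated.get("searchCycle_secs", 0)) or default_cycle_secs
    updated["searchCycle_secs"] = magnitude if enabled else -magnitude
    return updated


# Hypothetical source fragment, for illustration only.
source = {"title": "Example feed", "searchCycle_secs": 3600}
disabled = set_enabled(source, enabled=False)
print(disabled["searchCycle_secs"], is_disabled(disabled))
```

As with the button, a change made this way to a local or share copy has no effect on harvesting until the source is published.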

Deleting a source's documents

This button will leave the source intact but will delete all of the documents harvested so far. It can only be performed on sources you own unless you are a community moderator or an admin.

Info

This function should be used with caution. Also, for sources with many documents, this operation may take some time (eg 10 minutes for 500,000 documents).

Deleting sources or shares

To delete a source or share, click on the "X" button next to the source name in the Sources list:

...

Info

Note that deleting a published source will also delete all documents associated with that source. In some cases those documents will not be retrievable (eg old URLs from an RSS feed). This should therefore be used with caution. Also for sources with many documents, this operation may take some time (eg 10 minutes for 500,000 documents).

...

Monitoring sources

There is a graphical utility to monitor sources available from the home page (Source Monitor link). It opens in a new tab and is pictured below. It is not possible to change any source information from this GUI.

[Image: Source Monitor utility]