
Overview

Source management is intrinsically a complex process (particularly when taking advantage of Infinit.e's customization engine). 

The Infinit.e.Manager Sources page provides a simple interface for adding and testing new sources, saving templates for future sources, and managing existing ones. Future iterations of the tool will provide more direct support for the harder parts of source writing, such as writing JavaScript and regular expressions.

Create New Source

To create a new source, click the New Source button in the upper right-hand corner of the page. Infinit.e.Manager will open the Create New Source page.

Testing a Source

Once a first draft of a source is complete, test it to see which documents it extracts and how it enriches them with additional metadata, entities, associations, etc.

Two parameters can be set when testing a source:

  • "Full text": by default, the full text of a document is not returned (it can be quite long). For testing text extractors (eg "boilerpipe" vs "none" vs "AlchemyAPI"), or for testing "unstructured analysis" transformations, the text may be useful or even essential; in these cases, enable this check box.
  • "Number of documents": the maximum number of documents that will be enriched and returned. The smaller the number of documents, the faster the API call returns.

Click the Test Source button to start the testing process. It can take a few minutes for the processed documents to be returned; temporarily setting the "waitTimeOverride_ms" field of the "rss" object to 1000 (ie 1s) can be useful while debugging.
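If you keep a copy of the source JSON outside the editor, a small helper can produce a debug variant with the shorter wait time while leaving the original untouched, so the production value is easy to restore before publishing. This is a minimal sketch, assuming the source is available as a parsed JSON object; the function name `with_debug_wait_time` and the sample field values are hypothetical, and only the "rss" object and its "waitTimeOverride_ms" field come from this page.

```python
import copy
import json


def with_debug_wait_time(source: dict, wait_ms: int = 1000) -> dict:
    """Return a debug copy of a source with rss.waitTimeOverride_ms set to wait_ms.

    Hypothetical helper: the input dict is not modified, so the original
    (production) wait time can be restored simply by reusing it.
    """
    if "rss" not in source:
        # This page only documents the field on the "rss" object.
        raise ValueError("source has no 'rss' object")
    debug_source = copy.deepcopy(source)
    debug_source["rss"]["waitTimeOverride_ms"] = wait_ms
    return debug_source


if __name__ == "__main__":
    # Illustrative source fragment; only the "rss" object matters here.
    source = json.loads('{"title": "Example feed", "rss": {"waitTimeOverride_ms": 10000}}')
    print(json.dumps(with_debug_wait_time(source), indent=2))
```

Copying rather than editing in place is deliberate: it is easy to forget to revert a debug setting before publishing a source.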

Note that the first time you test a source, you are likely to get an error accompanied by a request from the browser to allow or deny pop-ups from the window. Select "Allow always" or the equivalent, refresh the browser if necessary, and press the Test Source button again.

The pop-up contains two text elements:

  • A status message including the number of documents returned, any errors or warnings encountered etc.
  • The JSON of the extracted and enriched documents (/wiki/spaces/INF/pages/4358642), if the test was successful.
    • Future versions of the tool will allow the documents to be viewed in widgets in the main GUI, providing a much easier interface to validate the source.
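Until the documents can be viewed in widgets, a quick script can make the returned JSON easier to scan by counting what each document picked up. This is a sketch under assumptions not confirmed by this page: that the pasted JSON is a list of document objects, and that each document may carry "title", "entities" and "associations" keys whose values are lists. The function name `summarise_documents` is hypothetical.

```python
import json


def summarise_documents(docs_json: str) -> list[tuple[str, int, int]]:
    """Return (title, entity count, association count) for each document.

    Assumes docs_json is a JSON list of document objects; missing keys are
    treated as empty, so partially enriched documents still appear.
    """
    docs = json.loads(docs_json)
    summary = []
    for doc in docs:
        title = doc.get("title", "(untitled)")
        n_entities = len(doc.get("entities", []))
        n_assocs = len(doc.get("associations", []))
        summary.append((title, n_entities, n_assocs))
    return summary


if __name__ == "__main__":
    # Illustrative input only; real test output will contain far more fields.
    sample = '[{"title": "Doc A", "entities": [{}, {}], "associations": []}, {"title": "Doc B"}]'
    for title, n_ent, n_assoc in summarise_documents(sample):
        print(f"{title}: {n_ent} entities, {n_assoc} associations")
```

A document that comes back with zero entities is often the first sign that an extractor or transformation step is not firing.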

Based on the test results, refine the source until it behaves as intended.

Edit Existing Source


Deleting Sources or Shares

To delete a source or share, click the "X" button next to its name in the Sources list:

  • Share: If the item you are deleting is a share, the system asks you to confirm: "Do you really wish to delete the share: XXXXXXXXX (*)?". What happens next depends on whether the share has been published:
    • If the share has been published, the share is deleted but the published source is left intact and remains in the Sources list.
    • If the share has not been published, the share is simply deleted and disappears from the Sources list.
  • Source: If the item you are deleting is a source, the system asks you to confirm: "Do you really wish to delete the source: XXXXXXXXX?". If you confirm, the system deletes the published source and all harvested documents associated with it.
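The three deletion cases can be summarised as a small decision function. This is only an illustrative model of the rules described in the list, not Infinit.e.Manager's actual implementation; the names `Outcome` and `delete_outcome` and the "share"/"source" strings are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of clicking "X" in the Sources list (illustrative)."""
    SHARE_DELETED_SOURCE_KEPT = "share deleted; published source stays in the Sources list"
    SHARE_DELETED = "share deleted; it disappears from the Sources list"
    SOURCE_AND_DOCUMENTS_DELETED = "published source and all its harvested documents deleted"


def delete_outcome(item_type: str, published: bool = False) -> Outcome:
    """Model what deleting an item does, following the rules listed above."""
    if item_type == "share":
        # Deleting a share never touches a source that was already published.
        return Outcome.SHARE_DELETED_SOURCE_KEPT if published else Outcome.SHARE_DELETED
    if item_type == "source":
        # Deleting a source also removes every document harvested from it.
        return Outcome.SOURCE_AND_DOCUMENTS_DELETED
    raise ValueError(f"unknown item type: {item_type!r}")
```

The key asymmetry is that only deleting a source is destructive to harvested documents; deleting a share is always safe for published data.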

Note that deleting a published source also deletes all documents associated with that source, and in some cases those documents cannot be retrieved again (eg old URLs from an RSS feed). Use this operation with caution. For sources with many documents, deletion may also take some time (eg 10 minutes for 500,000 documents).
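To plan around a large deletion, the single reference point above (about 10 minutes for 500,000 documents, ie roughly 50,000 documents per minute) can be scaled to other source sizes. This is a back-of-the-envelope sketch that assumes deletion time grows linearly with document count; actual times will depend on the deployment, and the function name is hypothetical.

```python
# Reference point quoted on this page: roughly 10 minutes for 500,000 documents.
REFERENCE_DOCS = 500_000
REFERENCE_MINUTES = 10.0


def estimate_delete_minutes(doc_count: int) -> float:
    """Rough deletion time, assuming time scales linearly with document count."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    return doc_count * REFERENCE_MINUTES / REFERENCE_DOCS


if __name__ == "__main__":
    for n in (50_000, 500_000, 2_000_000):
        print(f"{n:>9,} documents: ~{estimate_delete_minutes(n):.0f} min")
```

Under this assumption a 2,000,000-document source would take around 40 minutes, which is worth knowing before starting the deletion.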
