...
Source management is intrinsically a complex process (particularly when taking advantage of Infinit.e's customization engine).
The Infinit.e.Manager Sources page provides a simple interface for adding and testing new sources, saving templates for future sources, and managing existing ones. Future iterations of the tool will provide direct support for the difficult parts of source writing, such as writing JavaScript and regular expressions.
Note that the grey lines can be dragged to increase or decrease the size of the editor window.
The "Filter" text box will by default search the source titles, but it can also search the following fields:
- URL: type "url:<url fragment>"
- (note that URLs from the processing pipeline or feed configuration objects won't be searched unless you are currently editing them).
- Community IDs: type "community:<community-id>"
- ID: type "id:<source _id field>"
- Tags: type "tags:<tag fragment>"
- key, title, description, mediaType and extractType: use the same "fieldName:<field value fragment>" syntax
- (note title is the default if no prefix is specified)
- Suspended sources:
- "suspended:true" to see manually suspended sources
- "fullQuarantined:true" to see unauthorized sources (this can happen automatically because they generate too many errors, or because an administrator has disabled them)
- "tempQuarantined:true" to see sources quarantined for the day (because of a possibly transient source error)
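The prefix rules above can be sketched in a few lines of JavaScript. This is only an illustration of how a "field:fragment" filter string could be interpreted, not the Manager's actual implementation; the function name `parseFilter` and the returned object shape are hypothetical, while the prefix names come from the list above.

```javascript
// Hypothetical helper (not part of Infinit.e): split a filter string into
// the field to search and the fragment to look for.
// The prefix names are the ones listed in the text above.
var KNOWN_PREFIXES = [
  "url", "community", "id", "tags", "key", "title", "description",
  "mediaType", "extractType", "suspended", "fullQuarantined", "tempQuarantined"
];

function parseFilter(text) {
  var idx = text.indexOf(":");
  if (idx > 0) {
    var prefix = text.slice(0, idx);
    if (KNOWN_PREFIXES.indexOf(prefix) !== -1) {
      // Everything after the first colon is the fragment (it may itself contain colons).
      return { field: prefix, fragment: text.slice(idx + 1) };
    }
  }
  // No recognised prefix: the title is searched by default.
  return { field: "title", fragment: text };
}
```

For example, `parseFilter("tags:finance")` yields the field "tags" with fragment "finance", while `parseFilter("daily news")` falls back to the title field.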
Create New Source
...
...
...
...
...
...
...
...
Edit Existing Sources
To edit an existing source, click on the source's name in the list of sources on the left-hand side of the page.
...
Info |
---|
If copying the logic of an existing source, it is recommended to first "scrub" it to remove any server-added fields (particularly "_id" and "key", which can overwrite the existing source). |
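If you are preparing a copy by hand, the scrubbing step could look something like the sketch below. It assumes the source is available as a parsed JSON object; `scrubSource` and `SERVER_ADDED_FIELDS` are hypothetical names, and only "_id" and "key" are taken from the note above (other server-added fields may exist and would need to be added to the list).

```javascript
// Hypothetical helper (not part of Infinit.e): return a copy of a source
// object without the server-added fields named in the note above.
var SERVER_ADDED_FIELDS = ["_id", "key"];

function scrubSource(source) {
  // Deep copy via JSON so the original object is left untouched.
  var copy = JSON.parse(JSON.stringify(source));
  for (var i = 0; i < SERVER_ADDED_FIELDS.length; i++) {
    delete copy[SERVER_ADDED_FIELDS[i]];
  }
  return copy;
}
```

The scrubbed copy can then be pasted into the "JSON" tab of a new source without risk of overwriting the source it was copied from.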
...
...
...
There are 3 tabs that can be edited:
- "JSON" - this is the full source including all fields
- New source pipeline:
- "JS" - The global script that all other elements can use - all of the logic can be written in here as separate functions, and then the scriptlets in other pipeline elements can be simple calls to these functions, to maximize the maintainability of the code in the source.
- "LS" - If you are generating Logstash sources, you can write the Logstash configuration directly in here
- "UI" (currently only supported in the enterprise build) - brings up the source builder GUI
- Legacy sources:
- "JS-U" - the Unstructured Analysis Module allows content to be transformed by "scriptlets" (xpath/regex/javascript) into document metadata. This view shows only the javascript maintained in "unstructuredAnalysis.script" - all of the logic can be written in here as separate functions, and then the scriptlets can be simple calls to these functions, to maximize the maintainability of the code in the source.
- "JS-S" - the Structured Analysis Module allows content to be transformed by "scriptlets" (xpath/regex/javascript) into document metadata. This view shows only the javascript maintained in "structuredAnalysis.script" - all of the logic can be written in here as separate functions, and then the scriptlets can be simple calls to these functions, to maximize the maintainability of the code in the source.
- "JS-RSS" - (only visible if the "searchConfig" field of "rss" is specified; use "Save Source" to reset visibility if it changes during editing) the Feed Harvester can use javascript (and xpath) to create multiple documents out of a single received feed. This view shows only the javascript maintained in "rss.searchConfig.globals" - all of the logic can be written in here as separate functions, and then the scriptlets can be simple calls to these functions, to maximize the maintainability of the code in the source.
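As a rough illustration of the "functions in the global script, one-line calls in the scriptlets" pattern described above, the sketch below defines one reusable function. The function name `extractEmails` and the assumption that a scriptlet has the document content available in a variable called `text` are hypothetical; check the scriptlet documentation for the variables actually exposed.

```javascript
// Global script: reusable logic lives here as named functions.
// extractEmails is a hypothetical example, not an Infinit.e built-in.
function extractEmails(text) {
  var pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  var matches = text.match(pattern) || [];
  var seen = {};
  var unique = [];
  // De-duplicate case-insensitively, keeping first-seen order.
  for (var i = 0; i < matches.length; i++) {
    var lower = matches[i].toLowerCase();
    if (!seen[lower]) {
      seen[lower] = true;
      unique.push(matches[i]);
    }
  }
  return unique;
}

// A scriptlet elsewhere in the source then reduces to a single call, e.g.:
//   extractEmails(text)
```

Keeping the scriptlets this thin means a fix to the extraction logic is made once, in the global script, rather than in every element that uses it.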
...
...
- Go to the file uploader, filter on JSON type "source", and select your source
- Share it with a community to which your collaborator belongs (and in which they are at least a "content publisher" if you want them to make changes)
- If you want to give them the ability to make changes, set the read access
- Warning - there is no automatic synchronization, so if you both make changes at the same time, work can be lost
Validating the Source Format
...
If run on the "JS-U" or "JS-S" tabs, then the javascript in "unstructuredAnalysis.script" or "structuredAnalysis.script" (respectively) is checked instead.
...
...
...
...
...
...
...
...
...
...
...
...
As can be seen from the above screen capture, the pop-up contains two text elements:
- A status message including the number of documents returned, any errors or warnings encountered etc.
- The JSON of the extracted and enriched documents, if the test was successful.
- Future versions of the tool will allow the documents to be viewed in widgets in the main GUI, providing a much easier interface to validate the source.
Based on the test results, the source can then be refined until the desired functionality is obtained.
...
...
...
...
...
...
...
...
...
...
Editing a previously approved source may not require further moderation if only display fields have been modified; otherwise the source is suspended pending approval, as above.
Note that once a source has been published, its status can be monitored from "<ROOT URL>/InfiniteSourceMonitor.html" (e.g. http://infinite.ikanow.com/InfiniteSourceMonitor.html), provided you are logged into the main GUI or source builder.
After publishing a share, you should get an alert saying that the source has been published and the working copy "share" has been deleted. If you don't get this alert, then it is likely that an internal configuration error has occurred - contact your system administrator to get it fixed.
...
...
...
...
"Scrubbing" sources
...
...
Enabling/disabling sources
...
...
...
...
...
...
...
...
...
...
...
Monitoring sources
A graphical utility for monitoring sources is available from the home page (Source Monitor link). It opens in a new tab and is pictured below. Source information cannot be changed from this GUI.
A subset of this information can also be accessed from the Source Manager dialog of the main GUI.
The colors have the following meanings:
- Green: successfully harvested ("success")
- Blue: in progress ("in_progress")
- (or has partially harvested, "success_iteration" - means that the most recent harvest cycle completed but not all available documents were harvested because of document/cycle limitations)
- Red: harvested with errors ("error")
- Yellow: not yet seen by a harvester, or currently unapproved.
If the colored "Status" column contains numbers, e.g. "0/20", it refers to the (beta) distributed source function: the left number is the number of "in progress" threads, and the right number is the total number of threads.
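The color legend and the "in progress/total" notation can be summarized in code. The sketch below is only a reading aid under stated assumptions: the status strings are the ones quoted in the legend above, while `statusColor`, `parseThreadStatus` and the returned object shape are hypothetical and not part of the Source Monitor.

```javascript
// Status strings as quoted in the legend above, mapped to their display color.
var STATUS_COLORS = {
  success: "green",
  in_progress: "blue",
  success_iteration: "blue", // partially harvested counts as "in progress"
  error: "red"
};

// Hypothetical helper: anything not listed (not yet seen by a harvester,
// or currently unapproved) is treated as yellow.
function statusColor(status) {
  return Object.prototype.hasOwnProperty.call(STATUS_COLORS, status)
    ? STATUS_COLORS[status]
    : "yellow";
}

// Hypothetical helper: parse a distributed-source cell such as "0/20".
// Returns null if the cell does not use the "<in progress>/<total>" form.
function parseThreadStatus(cell) {
  var m = /^(\d+)\/(\d+)$/.exec(cell);
  if (!m) {
    return null;
  }
  return { inProgress: parseInt(m[1], 10), total: parseInt(m[2], 10) };
}
```

For instance, a cell showing "0/20" parses to 0 threads in progress out of 20 in total.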
...
Panel |
---|
Related Reference Documentation: |