Config - Source - Test

/config/source/test?numReturn=<documents-to-return>&returnFullText=<1|0|true|false> (POST)

Returns documents harvested and enriched according to the POSTed source object. This call can be used to test and debug a source configuration before the object is saved for harvesting. Once a source has been saved and harvested, problems become harder both to debug and to fix (e.g. fixing them may require deleting documents).

NOTE: the user's personal community is added to the source's community list when a source is tested. This is used by downstream processes to distinguish between a source being tested and a source being harvested (since sources are not permitted to be run against a personal community otherwise).

Authentication

Required, see Auth - Login

Arguments

numReturn (optional)
Number of processed documents to return (defaults to 10). Note that this does not affect how many documents are harvested, only how many are enriched. The call may therefore still take a while when run on large directories/databases/fileshares with slow IO.

returnFullText (optional)
If "false" (default) or "0", the full text of each document is not returned (which makes the returned documents easier to read). If "true" or "1", the populated "fullText" field is returned (which is sometimes necessary, e.g. to debug the text extraction or cleansing configuration).
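The two arguments above can be assembled into a request URL as in the following minimal Python sketch. The helper name build_test_url and its api_root parameter are illustrative assumptions, not part of the API; only the path and the numReturn/returnFullText parameter names come from this page.

```python
from urllib.parse import urlencode


def build_test_url(api_root, num_return=None, return_full_text=None):
    """Build a /config/source/test URL (hypothetical helper).

    Arguments left as None are omitted, so the server-side defaults
    described above (10 documents, no full text) apply.
    """
    params = {}
    if num_return is not None:
        params["numReturn"] = int(num_return)
    if return_full_text is not None:
        params["returnFullText"] = "true" if return_full_text else "false"
    url = api_root.rstrip("/") + "/config/source/test"
    return url + ("?" + urlencode(params) if params else "")


print(build_test_url("http://infinite.ikanow.com/api", num_return=1, return_full_text=True))
```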

Examples

Method.Post

Example using curl:

curl -XPOST 'http://infinite.ikanow.com/api/config/source/test?numReturn=1' -d '{ "json": {...} }'
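A roughly equivalent request can be prepared from Python's standard library, as sketched below. This is an illustration under assumptions: make_test_request is a hypothetical helper, the {"title": "example"} payload is a placeholder rather than a real source object, and authentication (see Auth - Login) is not handled here. No Content-Type header is set explicitly, mirroring the curl command above.

```python
import json
import urllib.request


def make_test_request(url, source):
    """Prepare (but do not send) a POST carrying the source object as JSON text."""
    body = json.dumps(source).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Placeholder payload: a real call would POST the full source object instead.
req = make_test_request(
    "http://infinite.ikanow.com/api/config/source/test?numReturn=1",
    {"title": "example"},
)
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```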

Actionscript

See this example.

Example Response
{
	"response": {
		"action": "Test Source",
		"success": true,
		"message": "successfully returned 4 docs: source=<url> extracted=4 updated=0 deleted=0 urlerrors=0. <Warnings/non-fatal errors>",
		"time": 10
	}
}
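The counters embedded in the "message" string can be pulled out with a regular expression. The sketch below assumes the message keeps the name=number layout shown in the example response; parse_counters is a hypothetical helper, not something the API provides.

```python
import re

# Matches the name=number pairs shown in the example message.
COUNTER_RE = re.compile(r"\b(extracted|updated|deleted|urlerrors)=(\d+)")


def parse_counters(message):
    """Return the harvest counters found in a response message as a dict of ints."""
    return {name: int(value) for name, value in COUNTER_RE.findall(message)}


msg = (
    "successfully returned 4 docs: source=<url> extracted=4 updated=0 "
    "deleted=0 urlerrors=0. <Warnings/non-fatal errors>"
)
print(parse_counters(msg))
```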
Error Response
{
	"response": {
		"action": "Test Source",
		"success": true,
		"message": "successfully returned 0 docs: 4 file error(s).\n\n<List of errors>",
		"time": 10
	}
}

{
	"response": {
		"action": "Test Source",
		"success": false,
		"message": "Source error: <error message>",
		"time": 10
	}
}
Common error messages:
  • Unable to serialize Source JSON: indicates that the JSON object POSTed (or passed via URL parameter) is invalid. Try using JSON Lint or a similar tool to debug.
  • Source error: a major problem occurred before enrichment started, e.g. authentication or path problems.
  • "successfully returned 0 docs <...>" (0, or fewer than the number requested): normally also indicates that an error occurred during harvesting, but on a per-document basis rather than for the entire source.
  • "[...] urlerrors=<more than 0>": these errors occurred during the enrichment process, e.g. the third-party text/entity extractor failed, or there were errors in the structured/unstructured analysis handlers. The lines following this notification are error messages that should give some idea of what went wrong.
    • (Note that errors can appear even if "urlerrors=0", e.g. non-fatal problems that caused an entity/association to be dropped.)
  • (curl/wget returns nothing: normally an indication that a POST has been used with no URL parameter, or a POST has been used with no content. Alternatively, if using a command-line JSON lint tool, it may indicate that there are characters not handled by the tool or that the tool believes the JSON is invalid.)
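The checks in the list above can be combined into a rough triage of the inner "response" object, as sketched below. The function name diagnose and its category labels are illustrative assumptions; the matched strings are taken from the messages shown on this page. Fewer documents than requested is only a hint, since the list says it "normally" indicates per-document errors.

```python
import re


def diagnose(response, num_requested=10):
    """Roughly classify the inner "response" object of a source test call."""
    message = response.get("message", "")
    if "Unable to serialize Source JSON" in message:
        return "invalid-json"
    if not response.get("success"):
        return "source-error"
    url_errors = re.search(r"urlerrors=(\d+)", message)
    if url_errors and int(url_errors.group(1)) > 0:
        return "enrichment-errors"
    returned = re.search(r"successfully returned (\d+) docs", message)
    if returned and int(returned.group(1)) < num_requested:
        return "fewer-docs-than-requested"
    return "ok"


print(diagnose({"success": False, "message": "Source error: <error message>"}))
```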