Documentation Progress Tracker

Page	Andrew Comments	Alex Comments	Status
File extractor		As per the database extractor comment, call out specifically how the URL is constructed in each case. Office types: the path of the file (i.e. file.url + path-relative-to-url). JSON/XML/CSV: if xmlsourcename and xmlprimarykey are both specified, xmlsourcename + object.get(xmlprimarykey); if not, path-of-file (as above) + <hash of object> + ".csv"/".json"/".xml".	REVIEW
Feed extractor			REVIEW
Web extractor			REVIEW
Database extractor	Updated and ready for review.	Another field is missing that changed between legacy and pipeline: the database object now has a "url" field (previously at the source top level). If no value is specified for 'primaryKeyValue' (this also seems to be missing from the documentation; it is in the code here: https://bitbucket.org/ikanow/ikanow_infinit.e_community/src/d4d92a4131ffc9706417b70077aec548178bcf58/core/infinit.e.data_model/src/com/ikanow/infinit/e/data_model/store/config/source/SourceDatabaseConfigPojo.java?at=master), then the document URL is database.url + record.get(primaryKey) (if no 'primaryKey' is specified, a random string is used); otherwise it is primaryKeyValue + record.get(primaryKey). It would be good for the file and database extractors to call out how the URL is constructed. (Re authentication: made a minor update to correct an error in the legacy documentation and to reflect the v0.3 functionality change.)	REVIEW
Follow Web links			REVIEW
Automated text extraction	Alex, will you convert to JSON for the TODO?		REVIEW
Manual text transformation			REVIEW
Document metadata	Alex, will you convert to JSON for the TODO?		REVIEW
Content metadata	Requires new examples in the source gallery for regex and XPath (see IN PROGRESS).		REVIEW
Manual entities			REVIEW
Manual association of entities			REVIEW
Document storage settings	Additional examples for onUpdateScript and metadataFieldStorage would be beneficial.		REVIEW
Feature extraction			REVIEW
Aliasing	Not supported		ON HOLD
Harvest control settings	Requires more examples for the following: duplicateExistingUrls, maxDocs_global, throttleDocs_perCycle, maxDocs_perCycle, distributionFactor.		REVIEW
Search index settings	More examples in the source for searchIndex parameters would be beneficial.		REVIEW
Lookup tables	I tried to edit an existing example from the old source, as I could not find any new examples. Please verify the changes I made to the example source and scripts.		REVIEW
Javascript globals			REVIEW
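
The URL construction rules noted in the File extractor and Database extractor comments could be illustrated along these lines. This is only a rough Python sketch of the rules as described in the comments, not the platform's actual Java implementation. The function names, parameter names, the use of MD5 over a JSON dump as the "hash of object", and the use of a UUID as the "random string" are all assumptions made for illustration. The sketch also assumes the random string applies whenever 'primaryKey' is missing, whether or not 'primaryKeyValue' is set.

```python
import hashlib
import json
import uuid


def file_document_url(file_url, relative_path, file_type, record=None,
                      xml_source_name=None, xml_primary_key=None):
    """Sketch of the file extractor URL rules (hypothetical helper)."""
    path = file_url + relative_path  # file.url + path-relative-to-url
    if file_type == "office":
        # Office types: the document URL is just the path of the file.
        return path
    if xml_source_name and xml_primary_key:
        # json/xml/csv with both fields set: prefix + primary key value.
        return xml_source_name + str(record[xml_primary_key])
    # Otherwise: path + <hash of object> + extension.
    # Assumption: the hash is MD5 of the record serialized as sorted JSON.
    serialized = json.dumps(record, sort_keys=True).encode("utf-8")
    return path + hashlib.md5(serialized).hexdigest() + "." + file_type


def database_document_url(database_url, record, primary_key=None,
                          primary_key_value=None):
    """Sketch of the database extractor URL rules (hypothetical helper)."""
    # primaryKeyValue, when given, replaces database.url as the prefix.
    prefix = primary_key_value if primary_key_value else database_url
    if primary_key:
        suffix = str(record[primary_key])
    else:
        # Assumption: a UUID stands in for the "random string".
        suffix = uuid.uuid4().hex
    return prefix + suffix
```

For example, under these assumptions, a record {"id": 42} with primaryKey "id" and database.url "jdbc:mysql://host/db/" would get the document URL "jdbc:mysql://host/db/42".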