Page | Reviewed by Alex | Andrew Comments | Alex Comments | Status
File extractor
  •   
 

As per the database extractor comment, call out specifically how the URL is constructed in each case:

  • office types: the path of the file (i.e. file.url + path-relative-to-url)
  • json/xml/csv:
    • if xmlsourcename and xmlprimarykey are specified: xmlsourcename + object.get(xmlprimarykey)
    • if not: path-of-file (as above) + <hash of object> + ".csv"/".json"/".xml"
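The cases above could be sketched roughly as follows. This is only an illustration of the rules listed in this comment, not the extractor's actual code: the function name, its parameters, and the choice of MD5 over a JSON dump as the "hash of object" are all assumptions.

```python
import hashlib
import json


def file_document_url(file_url, relative_path, record=None, fmt=None,
                      xml_source_name=None, xml_primary_key=None):
    """Hypothetical helper that mirrors the URL rules listed above."""
    # Path of the file: file.url + path-relative-to-url
    path = file_url + relative_path

    # Office types: the document URL is simply the path of the file.
    if record is None:
        return path

    # json/xml/csv with both xmlsourcename and xmlprimarykey specified.
    # (What happens when the record lacks the key is not described above.)
    if xml_source_name and xml_primary_key:
        return xml_source_name + str(record.get(xml_primary_key))

    # Otherwise: path + <hash of object> + extension.
    # ASSUMPTION: MD5 of the sorted JSON form stands in for the real hash.
    digest = hashlib.md5(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return path + digest + "." + fmt
```

For example, `file_document_url("smb://host/share/", "docs/report.docx")` returns the plain file path, while passing a record together with `xml_source_name` and `xml_primary_key` returns the source name followed by the record's key value.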
Status: Review (yellow)
Feed extractor
  •   
  
Status: Review (yellow)
Web extractor
  •   
  
Status: Review (yellow)
Database extractor
  •   
Updated now and ready for review

Another field that has changed between legacy and pipeline is missing: the database object now has a "url" field (previously at the top level of the source). 'primaryKeyValue' also seems to be missing from the documentation; it is in the code here: https://bitbucket.org/ikanow/ikanow_infinit.e_community/src/d4d92a4131ffc9706417b70077aec548178bcf58/core/infinit.e.data_model/src/com/ikanow/infinit/e/data_model/store/config/source/SourceDatabaseConfigPojo.java?at=master

If no value is specified for 'primaryKeyValue', the document URL is database.url + record.get(primaryKey) (if no 'primaryKey' is specified, a random string is used instead). Otherwise, it is primaryKeyValue + record.get(primaryKey). It would be good for both the file and database extractors to call out how the URL is constructed.
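A rough sketch of the rule described in this comment, for illustration only. The function and parameter names are hypothetical and the random-string generator is an assumption. The comment only mentions the random-string fallback for the database.url case, so applying it in both branches is also an assumption.

```python
import uuid


def database_document_url(database_url, record,
                          primary_key=None, primary_key_value=None):
    """Hypothetical helper mirroring the database URL rule described above."""
    if primary_key:
        suffix = str(record.get(primary_key))
    else:
        # ASSUMPTION: any random string will do; uuid4 is used for illustration.
        suffix = uuid.uuid4().hex

    # primaryKeyValue, when specified, replaces database.url as the prefix.
    prefix = primary_key_value if primary_key_value else database_url
    return prefix + suffix
```

With `primary_key="id"` and a record of `{"id": 42}`, the result is the prefix followed by `42`. The prefix is `primary_key_value` when one is given and `database_url` otherwise.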

(Re authentication: Made minor update to correct error in legacy documentation, and to reflect v0.3 functionality change)

Status: Review (yellow)
Follow Web links
  •   
  
Status: Review (yellow)
Automated text extraction
  •   
Alex, will you convert to JSON for the TODO?
Status: Review (yellow)
Manual text transformation
  •   
  
Status: Review (yellow)
Document metadata
  •   
Alex, will you convert to JSON for the TODO?
Status: Review (yellow)
Content metadata
  •   
Requires new examples in the source gallery for regex and XPath (see IN PROGRESS).
Status: Review (yellow)
Manual entities
  •   
  
Status: Review (yellow)
Manual association of entities
  •   
  
Status: Review (yellow)
Document storage settings
  •   
Additional examples for onUpdateScript and metadataFieldStorage would be beneficial.
Status: Review (yellow)
Feature extraction
  •   
  
Status: Review (yellow)
Aliasing
  •   
Not supported 
Status: On hold (red)
Harvest control settings
  •   

Requires more examples for the following:

  • duplicateExistingUrls
  • maxDocs_global
  • throttleDocs_perCycle
  • maxDocs_perCycle
  • distributionFactor
 
Status: Review (yellow)
Search index settings
More examples in the source for searchIndex parameters would be beneficial.
Status: Review (yellow)
Lookup tables
I tried to edit an existing example from the old source, as I could not find any new examples. Please verify the changes I made to the example source and scripts.
Status: Review (yellow)
Javascript globals   
Status: Review (yellow)
