
Page

Reviewed by Alex

Andrew Comments

Alex Comments

Status

File extractor

As per the database extractor comment, call out specifically how the URL is constructed in each case:

- Office types: the path of the file (i.e. file.url + path-relative-to-url)
- json/xml/csv, if xmlsourcename and xmlprimarykey are specified:
  - xmlsourcename + object.get(xmlprimarykey)
- json/xml/csv, if not:
  - path-of-file (as above) + <hash of object> + ".csv"/".json"/".xml"
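The URL rules above can be sketched as follows. This is a minimal illustration, not the harvester's code: the function names are hypothetical, and because the comment does not say which hash is applied to the object, MD5 over sorted-key JSON is used purely as a stand-in (assumption). For Office types, `file_path` alone gives the document URL.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def file_path(file_url: str, path_relative_to_url: str) -> str:
    """Path of a harvested file: file.url plus the file's path relative to it."""
    return file_url + path_relative_to_url


def record_url(
    file_url: str,
    path_relative_to_url: str,
    record: Mapping[str, Any],
    extension: str,
    xml_source_name: Optional[str] = None,
    xml_primary_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: document URL for one record split out of a json/xml/csv file."""
    if xml_source_name and xml_primary_key:
        # Both fields set: xmlsourcename + the record's primary-key value.
        return xml_source_name + str(record.get(xml_primary_key))
    # Otherwise: path of the file + a hash of the record + the file-type suffix.
    # ASSUMPTION: MD5 over sorted-key JSON stands in for the real (unspecified) hash.
    digest = hashlib.md5(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()
    return file_path(file_url, path_relative_to_url) + digest + extension
```

For example, `record_url("file:///data/", "in/a.csv", {"id": 7}, ".csv", "http://example.com/rec/", "id")` yields `http://example.com/rec/7`; the paths and URLs here are made up for illustration.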

Status: Review (yellow)

Feed extractor

Status: Review (yellow)

Web extractor

Status: Review (yellow)

Database extractor

Updated now and ready for review

There's another missing field that has changed between legacy and pipeline: the database object now has a "url" field (previously at the top level of the source). If no value is specified for 'primaryKeyValue' (this also seems to be missing from the documentation; it is in the code here: https://bitbucket.org/ikanow/ikanow_infinit.e_community/src/d4d92a4131ffc9706417b70077aec548178bcf58/core/infinit.e.data_model/src/com/ikanow/infinit/e/data_model/store/config/source/SourceDatabaseConfigPojo.java?at=master), the document URL is database.url + record.get(primaryKey); if no 'primaryKey' is specified, a random string is used instead. Otherwise, the document URL is primaryKeyValue + record.get(primaryKey). It would be good for both the file and database extractors to call out exactly how the URL is constructed.
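The database URL rules described above can be sketched like this. It is an illustrative sketch only: `database_doc_url` is a hypothetical name, `uuid4` stands in for whatever random string the harvester actually generates (assumption), and applying that random string in the primaryKeyValue case too is also an assumption, since the comment only mentions it alongside database.url.

```python
import uuid
from typing import Any, Mapping, Optional


def database_doc_url(
    database_url: str,
    record: Mapping[str, Any],
    primary_key: Optional[str] = None,
    primary_key_value: Optional[str] = None,
) -> str:
    """Hypothetical helper: document URL for one database record."""
    if primary_key:
        key_part = str(record.get(primary_key))
    else:
        # No primaryKey configured: a random string is used.
        # ASSUMPTION: uuid4 hex stands in for the harvester's random string,
        # and it is used whichever prefix is chosen below.
        key_part = uuid.uuid4().hex
    if primary_key_value:
        # primaryKeyValue set: it replaces database.url as the URL prefix.
        return primary_key_value + key_part
    return database_url + key_part
```

With a made-up connection string, `database_doc_url("jdbc:mysql://db/t/", {"id": 42}, "id")` gives `jdbc:mysql://db/t/42`, while passing `primary_key_value="http://example.com/row/"` gives `http://example.com/row/42`.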

(Re authentication: made a minor update to correct an error in the legacy documentation and to reflect the v0.3 functionality change.)

Status: Review (yellow)

Follow Web links

Status: Review (yellow)

Automated text extraction

Alex, will you convert this to JSON for the TODO?

Status: Review (yellow)

Manual text transformation

Status: Review (yellow)

Document metadata

Alex, will you convert this to JSON for the TODO?

Status: Review (yellow)

Content metadata

Requires new examples in the source gallery for regex and XPath (see IN PROGRESS).

Status: Review (yellow)

Manual entities

Status: Review (yellow)

Manual association of entities

Status: Review (yellow)

Document storage settings

Additional examples for onUpdateScript and metadataFieldStorage would be beneficial.

Status: Review (yellow)

Feature extraction

Status: Review (yellow)

Aliasing

Not supported

Status: On hold (red)

Harvest control settings

Requires more examples for the following:

- duplicateExistingUrls
- maxDocs_global
- throttleDocs_perCycle
- maxDocs_perCycle
- distributionFactor

Status: Review (yellow)

Search index settings

More source examples for the searchIndex parameters would be beneficial.

Status: Review (yellow)

Lookup tables

I tried to edit an existing example from the old source, as I could not find any new examples. Please verify the changes I made to the example source and scripts.

Status: Review (yellow)

Javascript globals

Status: Review (yellow)

Versions Compared

Old Version 13

New Version 14
