...
...
...
...
...
...
- Structured Analysis Handler, phase 1: fill in unstructured document-level fields (title, description, full text) from metadata, if needed.
- Unstructured Analysis Handler (UAH), phase 1: use regexes and JavaScript to pull out new metadata fields from the unstructured document-level fields.
- Unstructured Analysis Handler, phase 2: use regex replacements to transform the source text, if needed.
- Unstructured Analysis Handler, phase 3: use regexes and JavaScript to pull out new metadata fields from the cleansed unstructured document-level fields.
- Standard extraction, phase 1 (text extraction): use a "text extractor" to create the text that is submitted to the entity extraction service in the next phase (if needed; often the entity extraction service combines the two phases).
- Standard extraction, phase 2 (entity extraction): use an "entity extractor" (e.g. AlchemyAPI) to pull out entities and associations from the submitted text or URL.
- Structured Analysis Handler, phase 2: fill in the remaining document-level fields (URL, published date, document geo, ...), plus the title and description if these were still null after phase 1 (i.e. in case the UAH has since filled in the metadata fields they require).
- Structured Analysis Handler, phase 3: create new entities from the metadata, and combine the entities from all phases into associations.
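The ordering above can be sketched as a chain of stage functions. This is a minimal Python sketch under stated assumptions, not the actual Infinit.e implementation: the document is a plain dict, the field names (`headline`, `body`, `date`, `author`) and stage functions are hypothetical, and the entity extractor is a toy gazetteer lookup rather than a call to AlchemyAPI or any real service. The point it illustrates is why the Structured Analysis Handler runs twice: in phase 1 there is no `headline` metadata yet, so the title stays null until the UAH creates that field and phase 2 retries.

```python
import re
from itertools import combinations
from typing import Any, Callable

# Hypothetical document shape: plain dicts, not the real Infinit.e data model.
Doc = dict[str, Any]

# Hypothetical gazetteer standing in for an external entity extraction service.
KNOWN_ENTITIES = {"London", "Paris", "Reuters"}


def sah_phase1(doc: Doc) -> Doc:
    # SAH phase 1: fill unstructured fields from metadata, if needed.
    md = doc["metadata"]
    if not doc.get("title"):
        doc["title"] = md.get("headline")  # None: the UAH has not created it yet
    if not doc.get("fullText"):
        doc["fullText"] = md.get("body", "")
    return doc


def uah_phase1(doc: Doc) -> Doc:
    # UAH phase 1: regex extraction of new metadata from the raw text.
    m = re.search(r"^Headline:\s*(.+)$", doc["fullText"], re.MULTILINE)
    if m:
        doc["metadata"]["headline"] = m.group(1).strip()
    return doc


def uah_phase2(doc: Doc) -> Doc:
    # UAH phase 2: regex replacement to cleanse the text (here: strip HTML tags).
    doc["fullText"] = re.sub(r"<[^>]+>", "", doc["fullText"])
    return doc


def uah_phase3(doc: Doc) -> Doc:
    # UAH phase 3: regex extraction from the cleansed text.
    m = re.search(r"\d{4}-\d{2}-\d{2}", doc["fullText"])
    if m:
        doc["metadata"]["date"] = m.group(0)
    return doc


def text_extraction(doc: Doc) -> Doc:
    # Standard extraction phase 1: build the text handed to the entity extractor.
    doc["extractedText"] = " ".join(doc["fullText"].split())
    return doc


def entity_extraction(doc: Doc) -> Doc:
    # Standard extraction phase 2: toy gazetteer lookup, not a real service call.
    words = re.findall(r"[A-Za-z]+", doc["extractedText"])
    doc["entities"] = sorted({w for w in words if w in KNOWN_ENTITIES})
    return doc


def sah_phase2(doc: Doc) -> Doc:
    # SAH phase 2: remaining document-level fields, plus a retry of null ones.
    md = doc["metadata"]
    doc["publishedDate"] = md.get("date")
    if not doc.get("title"):
        doc["title"] = md.get("headline")  # may now exist, created by UAH phase 1
    return doc


def sah_phase3(doc: Doc) -> Doc:
    # SAH phase 3: entities from metadata, then associations across all entities.
    entities = list(doc["entities"])
    if "author" in doc["metadata"]:
        entities.append(doc["metadata"]["author"])
    doc["entities"] = entities
    doc["associations"] = list(combinations(entities, 2))  # naive: every pair
    return doc


# The order matters: each stage may rely on fields created by earlier ones.
PIPELINE: list[Callable[[Doc], Doc]] = [
    sah_phase1, uah_phase1, uah_phase2, uah_phase3,
    text_extraction, entity_extraction, sah_phase2, sah_phase3,
]


def run(doc: Doc) -> Doc:
    for stage in PIPELINE:
        doc = stage(doc)
    return doc
```

For example, `run({"metadata": {"body": "Headline: Talks resume\n<p>Reuters reports from <b>Paris</b> on 2013-05-01.</p>", "author": "J. Smith"}})` leaves `title` as `"Talks resume"` (null after SAH phase 1, filled in by SAH phase 2), `publishedDate` as `"2013-05-01"`, and three associations among `Paris`, `Reuters` and `J. Smith`. Pairing every entity is only a placeholder for whatever association rules a real source would configure.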
...
...
...
...
...
...
...
...