Source gallery
Note that the source format is being modified, so this gallery is not actively maintained. Once the new format is finalized, the gallery will be converted to a set of examples in the new format.
By category
Harvest Types
- Feed
  - RSS: Feed Source
  - HTML: Log File Source Gallery, Web-hosted XML
  - Following links: Log File Source Gallery, Web-hosted XML
- File
  - "office": Enron sample
  - Line-separated: Log File Source Gallery
  - XML: WITS sample
  - JSON: GNIP sample
- Database
  - SQL: DC crime sample
Search and update cycles
- Search cycles
- Update cycles: DC crime sample
Generating metadata
- Using regex
- Using javascript: GNIP sample, Log File Source Gallery, Enron sample
- Global functions
- Accessing external content
- Using XPath: Web-hosted XML (in a link-following context)
- Metadata pipelines
Generating entities and associations using NLP
- Text cleansing: Log File Source Gallery, Enron sample
- Specifying the text extraction engine: Feed Source
- Specifying the entity extraction engine: Enron sample, Feed Source
Generating entities from metadata
- With strings/replacement: GNIP sample, WITS sample, DC crime sample, Log File Source Gallery, Enron sample
- With javascript: GNIP sample, WITS sample, Log File Source Gallery, Enron sample
- Global functions: WITS sample
- From metadata arrays: GNIP sample
Generating associations from metadata
- With strings/replacement: GNIP sample, Log File Source Gallery, Enron sample
- With javascript: GNIP sample, Log File Source Gallery, Enron sample
- Global functions: WITS sample
- From metadata arrays: GNIP sample, Log File Source Gallery, Enron sample
Generating associations from entities
- With strings: DC crime sample, Log File Source Gallery, Enron sample
- With javascript: WITS sample, Log File Source Gallery, Enron sample
Retaining and discarding metadata for storage and/or indexing
- Retaining/discarding metadata for storage
- Retaining/discarding entities, associations, metadata for indexing: GNIP sample
By source
GNIP
Source, example documents and output
Categories:
- File (JSON)
- Unstructured Analysis
  - Javascript
- Structured Analysis
  - Entities
  - Associations
  - Entities and associations from arrays
  - Javascript
WITS
Source, example documents and output
Categories:
- File (XML)
- Structured Analysis
  - Entities
  - Associations
  - Entities from arrays
  - Javascript
DC crime data
Source, example documents and output
Categories:
- Database (MySQL)
- Structured Analysis
  - Entities
  - Associations
  - Entities from arrays
  - Javascript
Enron data
Source, example documents and output
Categories:
- File ("office")
- Entities from NLP
- Unstructured Analysis
  - regex
- Structured Analysis
  - Entities
  - Associations
  - Entities from arrays
  - Javascript
Web-hosted XML
Source, example documents and output
Categories:
- Feed (Web)
- Following Links
  - xpath
  - javascript
- Unstructured Analysis
  - javascript
Log file data
Source, example documents and output
Categories:
- Feed (Web), File (line-separated)
- Following Links
  - xpath
  - javascript
- Unstructured Analysis
  - javascript
- Structured Analysis
  - Entities
  - Associations
  - Entities from arrays
  - Javascript
Feed Source
Source, example documents and output
Categories:
- Feed (RSS)
- Entities from NLP