...
- Feed
- RSS
- HTML
- Following Links
- File
- "office"
- XML
- JSON
- Database
- SQL
Search and update cycles
...
- Search cycles
- Update cycles
Generating metadata
- Using regex
- Using JavaScript
- Global functions
- Accessing external content
- Using XPath
- Metadata pipelines
...
- Retaining/discarding metadata for storage
- Retaining/discarding entities, associations, metadata
...
By source
GNIP (Twitter)
Source, example documents and output
...