...
When used on web or feed sources, the element is specified as "links". When used on file or database sources, it can be specified as either "links" or "split".
Links
Follow Web Links can be used for API parsing (JSON, XML) or for more advanced HTML parsing.
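In API-style parsing, the element turns a JSON or XML response into a list of further links to follow. The Python sketch below illustrates that step outside the pipeline. The response layouts and the key/tag names (`results`, `link`, `item`, and so on) are hypothetical, and representing each link as a `{"url", "title"}` dict is an assumption made for illustration, not the pipeline's own script API.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List


def links_from_json(payload: str, list_key: str, url_key: str, title_key: str) -> List[Dict[str, str]]:
    """Extract {url, title} records from a JSON API response.

    Assumes (hypothetically) that the results sit in a top-level list under `list_key`.
    """
    data = json.loads(payload)
    links = []
    for item in data.get(list_key, []):
        url = item.get(url_key)
        if url:  # skip entries that have no URL to follow
            links.append({"url": url, "title": item.get(title_key, "")})
    return links


def links_from_xml(payload: str, item_tag: str, url_tag: str, title_tag: str) -> List[Dict[str, str]]:
    """Extract {url, title} records from an XML response, one per `item_tag` element."""
    root = ET.fromstring(payload)
    links = []
    for item in root.iter(item_tag):
        url = item.findtext(url_tag)
        if url:  # findtext returns None when the child element is missing
            links.append({"url": url.strip(), "title": (item.findtext(title_tag) or "").strip()})
    return links
```

For example, `links_from_json('{"results": [{"link": "http://example.com/a", "name": "A"}]}', "results", "link", "name")` yields a single record whose url is `http://example.com/a` and whose title is `A`.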
...
Split
Split uses the same pipeline elements as defined in Format above, but is designed to work on file or database sources.
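Conceptually, split takes one record from a file or database source and emits several new documents from it. The sketch below shows that idea under stated assumptions. The record layout, the `array_field` parameter, and the `url_prefix/id/index` URL scheme are hypothetical, chosen only to give each child document a unique, reproducible URL. In the real pipeline the split behavior is configured through the same elements as Format, not through this function.

```python
from typing import Any, Dict, List


def split_record(record: Dict[str, Any], array_field: str, url_prefix: str) -> List[Dict[str, Any]]:
    """Split one file/database record into one new document per array element.

    `array_field` and the URL scheme are hypothetical illustrations.
    """
    docs = []
    for i, entry in enumerate(record.get(array_field, [])):
        docs.append({
            # Hypothetical scheme: parent id plus position gives each child a unique URL
            "url": f"{url_prefix}/{record['id']}/{i}",
            "title": entry.get("title", ""),
            "metadata": entry,  # keep the raw entry so later elements can use it
        })
    return docs
```

Deriving the child URL from the parent id and the element's position means re-processing the same record produces the same URLs, which is one simple way to avoid duplicate documents.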
...
Examples
API-Style Parsing
...
it is likely that standard web-crawling measures, such as custom user agents and per-page wait times, are needed. Because these settings may differ between the search engine and the pages themselves, "Follow Web Links" has its own waitTimeBetweenPages_ms and userAgent fields (if not specified, these are inherited from the parent object).
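The inheritance rule for waitTimeBetweenPages_ms and userAgent can be modeled as a simple field-level fallback: a value set on the Follow Web Links element wins, and a missing value is taken from the parent object. The sketch below models only that rule. Representing the parent object and the links element as plain dicts is an assumption; it is not the pipeline's actual implementation.

```python
from typing import Any, Dict

# Fields that Follow Web Links can override; anything unset falls back to the parent
OVERRIDABLE_FIELDS = ("waitTimeBetweenPages_ms", "userAgent")


def effective_settings(parent: Dict[str, Any], links: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve the crawl settings used when following links.

    Both arguments are hypothetical dict stand-ins for the parent object
    and the Follow Web Links element.
    """
    resolved = {}
    for field in OVERRIDABLE_FIELDS:
        value = links.get(field)
        # An explicit value on the links element takes precedence over the parent's
        resolved[field] = value if value is not None else parent.get(field)
    return resolved
```

For instance, a parent with a 10000 ms wait and user agent `ParentBot/1.0`, combined with a links element that sets only a 2000 ms wait, resolves to a 2000 ms wait with the parent's user agent. This is useful when a search API tolerates frequent requests but the result pages need a slower, more polite crawl.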
...
Split
In the following example using "split", Follow Web Links is configured to act on JSON/XML endpoints. Metadata extracted from the endpoints is used to generate new documents, and deleteExisting is set to true so that the original documents are deleted.
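The effect of deleteExisting can be sketched as a choice about which documents survive the split step. In the sketch below, each original document is assumed to carry a hypothetical `metadata["links"]` list of `{"url", "title"}` records pulled from a JSON/XML endpoint; new documents are generated from those records, and the `delete_existing` flag (standing in for deleteExisting) decides whether the originals are kept alongside them.

```python
from typing import Any, Dict, List


def apply_split(originals: List[Dict[str, Any]], delete_existing: bool) -> List[Dict[str, Any]]:
    """Generate new documents from each original's extracted metadata.

    The "metadata" -> "links" layout is a hypothetical illustration.
    """
    children = []
    for doc in originals:
        for link in doc.get("metadata", {}).get("links", []):
            children.append({
                "url": link["url"],
                "title": link.get("title", ""),
                "parent": doc["url"],  # record where each new document came from
            })
    # delete_existing=True drops the originals; otherwise they are kept with the new documents
    return children if delete_existing else originals + children
```

Deleting the originals makes sense when they are only intermediate API responses whose value lies entirely in the documents generated from them.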
...