...

When used on web or feed sources, Follow Web Links is invoked as "links". When used on file or database sources, it can be invoked as either "links" or "split".
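As a rough illustration, the naming rule above can be modeled as a lookup from source type to the element names that invoke Follow Web Links. This is a hypothetical sketch. The source-type labels and the allowed_element_names helper are assumptions made for illustration and are not part of the platform's API.

```python
# Hypothetical model of the naming rule: which element names invoke
# Follow Web Links for each kind of source. The source-type labels are
# illustrative assumptions, not identifiers from the platform.
ALLOWED_NAMES = {
    "web": frozenset({"links"}),
    "feed": frozenset({"links"}),
    "file": frozenset({"links", "split"}),
    "database": frozenset({"links", "split"}),
}


def allowed_element_names(source_type: str) -> frozenset:
    """Return the element names usable for the given source type."""
    if source_type not in ALLOWED_NAMES:
        raise ValueError(f"unknown source type: {source_type!r}")
    return ALLOWED_NAMES[source_type]


# Usage
print(sorted(allowed_element_names("database")))  # ['links', 'split']
```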

Follow Web Links can be used to parse API responses (JSON, XML) or to perform more advanced HTML parsing.
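The following is a minimal sketch of the kind of API parsing described above. It assumes a hypothetical JSON response shape ({"results": [{"url": ..., "title": ...}]}) and an RSS-like XML layout made of <item> elements with <title> and <link> children. The function names are illustrative only. In the platform itself, this parsing is configured in the source pipeline, not through helpers like these.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List


def links_from_json(api_response: str) -> List[Dict[str, str]]:
    """Extract one {url, title} record per result from a JSON API response.

    Assumes a hypothetical shape: {"results": [{"url": ..., "title": ...}, ...]}.
    """
    payload = json.loads(api_response)
    records = []
    for item in payload.get("results", []):
        url = item.get("url")
        if not url:
            continue  # nothing to follow for this entry
        records.append({"url": url, "title": item.get("title", url)})
    return records


def links_from_xml(xml_text: str) -> List[Dict[str, str]]:
    """Extract one {url, title} record per <item> element (RSS-like layout assumed)."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        url = item.findtext("link")
        if not url:
            continue  # item has no <link> child, or it is empty
        records.append({"url": url, "title": item.findtext("title") or url})
    return records


# Usage
sample = '{"results": [{"url": "http://example.com/a", "title": "A"}, {"title": "no link"}]}'
print(links_from_json(sample))  # [{'url': 'http://example.com/a', 'title': 'A'}]
```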

...

[Diagram: Follow Web Links 2]

Split uses the same pipeline elements as defined in Format above, but is designed to work on file or database sources.

...

Examples

...

it is likely that standard web-crawling measures are needed, such as custom user agents and per-page wait times. Because these settings may differ between the search engine and the pages themselves, "Follow Web Links" has its own waitTimeBetweenPages_ms and userAgent fields (if not specified, these are inherited from the parent object).
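The inheritance rule can be sketched as follows. Only the userAgent and waitTimeBetweenPages_ms field names come from the text. CrawlSettings, effective_settings and crawl are hypothetical names. The fetch and sleep functions are passed in so that the sketch runs without network access. It illustrates the fallback behavior and is not the platform's implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CrawlSettings:
    # Field names follow the text; None means "not specified".
    userAgent: Optional[str] = None
    waitTimeBetweenPages_ms: Optional[int] = None


def effective_settings(links: CrawlSettings, parent: CrawlSettings) -> CrawlSettings:
    """Fields not specified on the Follow Web Links object are inherited from the parent."""
    return CrawlSettings(
        userAgent=links.userAgent if links.userAgent is not None else parent.userAgent,
        waitTimeBetweenPages_ms=(
            links.waitTimeBetweenPages_ms
            if links.waitTimeBetweenPages_ms is not None
            else parent.waitTimeBetweenPages_ms
        ),
    )


def crawl(urls: List[str], settings: CrawlSettings,
          fetch: Callable[[str, Optional[str]], str],
          sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL with the configured user agent, pausing between pages."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0 and settings.waitTimeBetweenPages_ms:
            sleep(settings.waitTimeBetweenPages_ms / 1000.0)  # ms -> seconds
        pages.append(fetch(url, settings.userAgent))
    return pages


# Usage: the linked pages override the wait time but inherit the user agent.
parent = CrawlSettings(userAgent="search-bot", waitTimeBetweenPages_ms=1000)
links = CrawlSettings(waitTimeBetweenPages_ms=250)
print(effective_settings(links, parent))
# CrawlSettings(userAgent='search-bot', waitTimeBetweenPages_ms=250)
```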

 

...

In the following example, which uses "split", Follow Web Links is configured to act on JSON/XML endpoints. Metadata extracted from the endpoints is then used to generate new documents, and deleteExisting is set to true so that the original documents are deleted.
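As a hedged illustration of this split behavior, the sketch below assumes that each original document carries a "url" and a "metadata" field holding a JSON string with an "items" list. The names split_documents, delete_existing and parentUrl are hypothetical. delete_existing mirrors deleteExisting, but the sketch does not reproduce the platform's actual configuration format.

```python
import json
from typing import Any, Dict, List


def split_documents(originals: List[Dict[str, Any]],
                    delete_existing: bool) -> List[Dict[str, Any]]:
    """Generate new documents from metadata extracted from each original.

    Hypothetical layout: each original has a "url" and a "metadata" field
    holding a JSON string of the form {"items": [{"url": ..., "title": ...}]}.
    """
    new_docs = []
    for doc in originals:
        for item in json.loads(doc["metadata"]).get("items", []):
            new_docs.append({
                "url": item["url"],
                "title": item.get("title", item["url"]),
                "parentUrl": doc["url"],  # remember which endpoint it came from
            })
    # Mirrors deleteExisting: when set, only the generated documents survive.
    return new_docs if delete_existing else originals + new_docs
```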

...