
...

The HTML needs to be parsed; this is discussed below ("using xpath to parse HTML").
It will often be the case (e.g. for Intranet search engines) that multiple pages of results must be traversed (e.g. at 10 results per page). The following sub-fields of "searchConfig" are intended to handle these cases:
- numPages: the total number of pages that will be checked in each search cycle.
- pageChangeRegex: a regex that must contain at least one capturing group and must match the entire part of the URL that controls the page number. See example below.
- pageChangeReplace: the replacement for the part of the URL matched by pageChangeRegex, with $1 standing in for the page number.
- numResultsPerPage (slightly misnamed): if the "page number" in the URL is actually a result offset rather than a page offset, then this field should be set to the number of results per page. It is multiplied by the page number to generate the "$1" string mentioned above. See example.
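
As a rough illustration of how these fields could combine, the following Python sketch builds the list of page URLs for one search cycle. It is not the platform's implementation: the function name `page_urls`, the `first_page` parameter (whether numbering starts at 0 or 1 is not stated here) and the example URLs are all assumptions made for the sketch.

```python
import re


def page_urls(url, num_pages, page_change_regex, page_change_replace,
              num_results_per_page=1, first_page=1):
    """Hypothetical sketch: generate one URL per page to be checked.

    url                  -- the starting search URL (example value is invented)
    num_pages            -- plays the role of "numPages"
    page_change_regex    -- plays the role of "pageChangeRegex"
    page_change_replace  -- plays the role of "pageChangeReplace" ($1 = page value)
    num_results_per_page -- plays the role of "numResultsPerPage"; leave at 1
                            when the URL holds a true page number
    first_page           -- assumption: index of the first page (0 or 1)
    """
    pattern = re.compile(page_change_regex)
    urls = []
    for page in range(first_page, first_page + num_pages):
        # An offset-style URL needs page * results-per-page instead of the page.
        value = page * num_results_per_page
        fragment = page_change_replace.replace("$1", str(value))
        # A function replacement keeps the fragment literal (no escape handling).
        urls.append(pattern.sub(lambda match: fragment, url, count=1))
    return urls


# Page-number style URL, pages counted from 1.
by_page = page_urls("http://intranet.example/search?q=test&page=1", 3,
                    r"page=(\d+)", "page=$1")

# Offset style URL, 10 results per page, pages counted from 0.
by_offset = page_urls("http://intranet.example/search?q=test&start=0", 3,
                      r"start=(\d+)", "start=$1",
                      num_results_per_page=10, first_page=0)
```

In the offset case the three generated URLs end in start=0, start=10 and start=20, which is why numResultsPerPage is needed even though the field name suggests otherwise.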

...

If a field called "_ONERROR_" is generated and no links are returned from the first page (most likely because of a formatting error), then the contents of _ONERROR_ (assumed to be a string) are dumped to the harvest message.
When running from the "Config - Source - Test" API call (including from the Source Editor GUI), and only then, all of the _ONDEBUG_ field values (each either a string or an object) are dumped to the harvest message for every page.
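
The two rules above can be summarised in a short Python sketch. This is only a model of the behaviour described, not the harvester's code: the function name `harvest_messages`, the `links` key, the `test_mode` flag and the use of JSON to render object values are all assumptions.

```python
import json


def harvest_messages(pages, test_mode):
    """Hypothetical sketch: collect what would be dumped to the harvest message.

    pages     -- list of dicts, one per traversed page, holding the generated
                 fields; the "links" key is an invented name for the results
    test_mode -- True when run via the "Config - Source - Test" API call
    """
    messages = []
    for index, page in enumerate(pages):
        # _ONERROR_ only applies when the first page returned no links.
        if index == 0 and not page.get("links") and "_ONERROR_" in page:
            messages.append(str(page["_ONERROR_"]))
        # _ONDEBUG_ is dumped for every page, but only in test mode.
        if test_mode and "_ONDEBUG_" in page:
            debug = page["_ONDEBUG_"]
            messages.append(debug if isinstance(debug, str) else json.dumps(debug))
    return messages


example_pages = [
    {"links": [], "_ONERROR_": "no results table found", "_ONDEBUG_": "page 1"},
    {"links": ["http://intranet.example/doc/1"], "_ONDEBUG_": {"count": 1}},
]
in_test = harvest_messages(example_pages, test_mode=True)
in_harvest = harvest_messages(example_pages, test_mode=False)
```

In this model a normal harvest reports only the _ONERROR_ text, while a test run additionally reports each page's _ONDEBUG_ value.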

...
