Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
#-------------------------------------------------------------------------------
# 2.11] Harvester Properties
#-------------------------------------------------------------------------------
# Comma-separated-list from File,Database,Feed (note Database and Feed need jars not bundled with the RPM)
harvester.types=File,Database,Feed
# Web crawling etiquette: the time to way between consecutive accesses to the same time (10s is standard)
harvest.feed.wait=10000
# The minimum time between consecutive harvests (avoids thrashing FS/DB/RSS when there's nothing to get)
harvest.mintime.ms=300000
# The minimum time between consecutive source harvests (set if needs to be longer than harvest.mintime.ms,
# eg if you want to pick up a source quickly the first time but then not update so frequently)
harvest.source.mintime.ms=
# Restricts the number of docs that can be harvested per cycle for memory reasons:
harvest.maxdocs_persource=5000
# Threading configuration type:num_threads (type from above):
# (eg for RSS heavy increase the "feed", for DB heavy increase the "file" etc. Beyond 20 there is limited benefit). 
harvest.threads=file:5,database:5,feed:20
# This controls the batch size of sources picked up by a thread, this does not normally need to be changed (its default is shown)
# (It can be reduced in cases where a small number of very long-running sources need to be harvested).
#harvest.distribution.batch.harvest=20
# This disables entity and association aggregation. For almost all applications you will not want to set this.
#harvest.disable_aggregation=false
# This parameter uses the Java Security Manager to prevent scripts accessing local network services (at the expense of some performance)
# It can be turned off for uses of the platform where sources must be approved before being added (etc)
harvest.security=true
2.12 Hadoop Properties

The Hadoop config path is a local folder where Infinit.e stores map reduce jobs if Hadoop is used.

...