...
Code Block |
---|
#-------------------------------------------------------------------------------
# 21.9] UI settings
#-------------------------------------------------------------------------------
# The passphrase for the SSL keystore (not needed unless HTTPS is being used)
ssl.passphrase=
# This is a regex that, if specified, will allow only access to REST commands matching
# the pattern - only applied to remote clients. Connections from localhost always have access
# to everything
# Eg the commented out example will allow only login/keepalive and querying.
#remote.access.allow=^/api/(knowledge/document/query|auth/login|auth/keepalive)
remote.access.allow=
# This parameter does the opposite, allows everything except specified commands
remote.access.deny= |
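To illustrate how the allow pattern behaves, the following Python sketch tests request paths against the commented-out example regex. It assumes the pattern is applied as a search anchored by its leading ^; how the Infinit.e API applies the regex internally is not stated here. The function name and the rejected path are hypothetical.

```python
import re

# Pattern copied from the commented-out remote.access.allow example.
ALLOW = re.compile(r"^/api/(knowledge/document/query|auth/login|auth/keepalive)")


def remote_allowed(path, allow=ALLOW):
    """Return True if a remote request path matches the allow pattern.

    Assumption: the pattern only needs to match the start of the path.
    """
    return allow.search(path) is not None


if __name__ == "__main__":
    # "/api/config/source/save" is a hypothetical path used only as a non-match.
    for path in ("/api/auth/login", "/api/config/source/save"):
        print(path, remote_allowed(path))
```

With this pattern, only login, keepalive, and document query paths pass for remote clients; localhost connections are described above as always having full access.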
1.12 Hadoop properties
Code Block |
---|
#-------------------------------------------------------------------------------
# 1.12] Hadoop properties
#-------------------------------------------------------------------------------
# This limits the number of jobs that can be concurrently submitted to the Hadoop cluster
# by the custom processing engine (other jobs remain at pending until a slot is available)
# There is no default, 10 is recommended as a sensible value until the size of your cluster is known.
hadoop.max_concurrent=10 |
1.13 Entity Extractor Properties
The following properties are required to configure the use of AlchemyAPI, Open Calais, or boilerpipe.
Code Block |
---|
#-------------------------------------------------------------------------------
# 1.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy and Open Calais Keys:
# (Obtain from alchemyapi.com or opencalais.com)
extractor.key.alchemyapi=
extractor.key.opencalais=
#----------------------------------------------
# Entity extraction type selection: opencalais or alchemyapi or none
# ("opencalais" has a much higher limit than "alchemyapi" (1000/day) so is recommended for free use
# "alchemyapi" extracts sentiment, "opencalais" extracts entity associations. Note this can be overridden per source)
extractor.entity.default=
# Text extraction type selection: boilerpipe or alchemyapi or none
# ("alchemyapi" is much better, but has the limit discussed above. Note this can be overridden per source)
extractor.text.default= |
1.14 Interface Related Properties
The ui.end.point.url property is used to tell the UI where to connect to the Infinit.e API.
Info |
---|
#------------------------------------------------------------------------------- |
...
2. Properties that can normally be left at their default
...
Properties that are only modified if Infinit.e is deployed in SAAS mode (which is uncommon).
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.2] Software as a service (SAAS) settings
#-------------------------------------------------------------------------------
# If true, allows admin requests that come from trusted sources to have admin privileges:
app.saas=false
# A list of trusted DNS/IP addresses (eg from CMS):
app.saas.trusted.dns= |
2.3 Amazon Services Properties
The use.aws property is used to configure whether or not the platform is deployed on Amazon EC2.
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.3] Amazon services properties
#-------------------------------------------------------------------------------
# Values: 0=false, 1=true
# If deployed on an EC2 cluster set this to 1:
use.aws=0 |
2.6 API Search Test
Default search test terms and expected results values used to monitor the Infinit.e service.
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.6] API Search Test Terms and Expected Results
#-------------------------------------------------------------------------------
# List of terms formatted like: "*" "something" "something":
# (The continuous testing randomly selects one of these for querying the API)
api.search.test.terms="*"
# The expected results (max 100), if a different number comes back, the system alerts:
api.search.expected.results=0 |
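The monitoring check described above can be sketched in Python as follows. This is an illustration only, not the platform's monitoring code: it assumes the terms are space-separated, double-quoted strings as the comment shows, and the function names are hypothetical.

```python
import random
import shlex


def parse_test_terms(raw):
    """Split a value like '"*" "something"' into a list of search terms."""
    return shlex.split(raw)


def pick_test_term(raw, rng=random):
    """Randomly select one configured term, as the continuous test is described to do."""
    return rng.choice(parse_test_terms(raw))


def should_alert(returned, expected):
    """Alert when the number of results differs from the expected count."""
    return returned != expected


if __name__ == "__main__":
    print(parse_test_terms('"*" "something" "something"'))
    print(should_alert(returned=42, expected=100))
```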
2.7 Amazon AWS Settings
Property used by s3cmd to connect to Amazon to move files around.
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.7] Amazon AWS Settings
#-------------------------------------------------------------------------------
# Used for s3cmd, see their web page for details:
s3.gpg.passphrase= |
2.8 MongoDB Properties
MongoDB database configuration properties.
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.8] MongoDB Properties
#-------------------------------------------------------------------------------
# (server/port should normally point to localhost:27017), where API nodes have a mongos
db.server=localhost
db.port=27017
# db.sharded - 0 = false and 1 = true
db.sharded=0
# The max number of documents to store (eg 10M). Docs will be dropped in order of age.
# (Not currently supported):
db.capacity=10000000
# MongoDB config server or servers (must be 1 or 3 comma separated IPs), non-EC2/AWS installations only
db.config.servers=
db.replica.sets=
#----------------------------------------------
# db.cluster.subnet - used for non-EC2/AWS only installations to help mongodb configurations
# identify proper host ip addresses, e.g. 127.0.0.
db.cluster.subnet=
#----------------------------------------------
# The location from which to fetch the geo.bson dump used for feature.geo
# can start s3://, http:// or https://, else is assumed to be a file, eg
#db.geo_archive=s3://config.saas.infinite.ikanow.com/geo.bson.tar.gz
# Can always be retrieved here
db.geo_archive=http://www.ikanow.com/infinit.e-preinstall/geo.bson.tar.gz |
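The "1 or 3 comma separated IPs" rule for db.config.servers is easy to get wrong, so a small pre-deployment check may help. The Python sketch below is an assumption about how such a check could look; the function name is hypothetical, the addresses in the usage lines are placeholders, and an empty value is treated as "not configured" (as in the default above).

```python
def parse_config_servers(value):
    """Parse db.config.servers and enforce the 1-or-3 rule from the comment above.

    Assumption: an empty value means the property is unset and is allowed.
    """
    servers = [s.strip() for s in value.split(",") if s.strip()]
    if servers and len(servers) not in (1, 3):
        raise ValueError(
            "db.config.servers must list 1 or 3 addresses, got %d" % len(servers)
        )
    return servers


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    print(parse_config_servers("10.0.0.1"))
    print(parse_config_servers("10.0.0.1, 10.0.0.2, 10.0.0.3"))
```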
2.9 Access controls
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.9] UI inactivity timeout (in seconds)
#-------------------------------------------------------------------------------
# After this many seconds of inactivity, users are logged out from their Infinit.e session
access.timeout=1800 |
2.10 Elasticsearch Properties
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.10] Elasticsearch Properties
#----------------------------------------------
# Discovery mode = ec2 (if running on AWS) or zen (specify a list of IPs below):
elastic.node.discovery=ec2
#----------------------------------------------
# ES nodes, e.g.: elastic.search.nodes='NODE1:9300','NODE2:9300','NODE3:9300':
# Needed if discovery.mode=zen (not EC2/AWS), a set of IPs to try (>= 1 must be running elasticsearch)
elastic.search.nodes=
#-------------------------------------------------------------------------------
# mlockall = should equal true except if running on a machine with < 4GB of RAM
bootstrap.mlockall=true
# (Should normally be localhost:9300, unless an API node is running with no index node)
elastic.url=localhost:9300 |
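The quoted, comma-separated format of elastic.search.nodes can be checked before deployment with a short script. The Python sketch below parses the example value from the comment into host and port pairs; the function name is hypothetical and the parsing rules (single quotes optional, port required) are assumptions rather than documented behavior.

```python
def parse_es_nodes(value):
    """Parse a value like "'NODE1:9300','NODE2:9300'" into (host, port) tuples."""
    nodes = []
    for item in value.split(","):
        item = item.strip().strip("'")
        if not item:
            continue
        # Split on the last colon so the port is always the final component.
        host, _, port = item.rpartition(":")
        nodes.append((host, int(port)))
    return nodes


if __name__ == "__main__":
    print(parse_es_nodes("'NODE1:9300','NODE2:9300','NODE3:9300'"))
```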
2.11 Harvester Properties
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.11] Harvester Properties
#-------------------------------------------------------------------------------
# Comma-separated-list from File,Database,Feed (note Database and Feed need jars not bundled with the RPM)
harvester.types=File,Database,Feed
# Web crawling etiquette: the time to wait between consecutive accesses to the same site (10s is standard)
harvest.feed.wait=10000
# The minimum time between consecutive harvests (avoids thrashing FS/DB/RSS when there's nothing to get)
harvest.mintime.ms=300000
# Restricts the number of docs that can be harvested per cycle for memory reasons:
harvest.maxdocs_persource=5000
# Threading configuration type:num_threads (type from above):
# (eg for RSS heavy increase the "feed", for DB heavy increase the "file" etc. Beyond 20 there is limited benefit).
harvest.threads=file:5,database:5,feed:20 |
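The type:num_threads format of harvest.threads can be read as a simple mapping from harvester type to thread count. The Python sketch below shows one way to parse it; the function name is hypothetical and lower-casing the type names is an assumption.

```python
def parse_harvest_threads(value):
    """Parse 'file:5,database:5,feed:20' into a dict of type -> thread count."""
    threads = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        kind, _, count = pair.partition(":")
        threads[kind.strip().lower()] = int(count)
    return threads


if __name__ == "__main__":
    config = parse_harvest_threads("file:5,database:5,feed:20")
    print(config)
    # The comment above notes limited benefit beyond 20 threads per type.
    print([kind for kind, n in config.items() if n > 20])
```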
2.12 Hadoop Properties
The Hadoop config path is a local folder where Infinit.e stores map reduce jobs if Hadoop is used.
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.12] Hadoop Properties
#-------------------------------------------------------------------------------
hadoop.configpath=/mnt/opt/hadoop-infinite/mapreduce/ |
2.13 Entity Extractor Properties
Code Block |
---|
#-------------------------------------------------------------------------------
# 2.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy extraction level
# 1==people postproc, 2==geo postproc, 3==both
# (This uses some hard-coded heuristics to work around known AlchemyAPI errors)
app.alchemy.postproc=3 |
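The values 1, 2, and 3 for app.alchemy.postproc are consistent with two bit flags combined, although the source does not say the platform implements them that way. Under that assumption, the Python sketch below decodes the level into the two post-processing steps; the constant and function names are hypothetical.

```python
PEOPLE_POSTPROC = 1  # 1 == people postproc
GEO_POSTPROC = 2     # 2 == geo postproc


def postproc_steps(level):
    """Decode the postproc level into enabled steps (bit-flag reading is an assumption)."""
    return {
        "people": bool(level & PEOPLE_POSTPROC),
        "geo": bool(level & GEO_POSTPROC),
    }


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, postproc_steps(level))
```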
...