...

Code Block
#-------------------------------------------------------------------------------
# 21.9] UI settings
#-------------------------------------------------------------------------------
# The passphrase for the SSL keystore (not needed unless HTTPS is being used)
ssl.passphrase=
# This is a regex that, if specified, allows access only to REST commands matching
# the pattern. It is applied only to remote clients; connections from localhost always have access
# to everything.
# E.g. the commented-out example allows only login/keepalive and querying.
#remote.access.allow=^/api/(knowledge/document/query|auth/login|auth/keepalive)
remote.access.allow=
# This parameter does the opposite: it allows everything except the specified commands
remote.access.deny=
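
To illustrate how the commented-out remote.access.allow pattern behaves, here is a minimal Python sketch. It assumes the server applies the regex as a search against the request path; the exact matching mode used by Infinit.e is not stated above, and the path "/api/some/other/command" is purely hypothetical.

```python
import re

# Pattern copied from the commented-out example in the properties file.
ALLOW = re.compile(r"^/api/(knowledge/document/query|auth/login|auth/keepalive)")


def remote_allowed(path, allow=ALLOW):
    """Return True if a remote request path matches the allow pattern.

    Assumption: the pattern is searched against the request path only,
    so anything after a matching prefix is also accepted.
    """
    return allow.search(path) is not None


for path in ("/api/auth/login",
             "/api/knowledge/document/query/some-query",
             "/api/some/other/command"):  # hypothetical command
    print(path, remote_allowed(path))
```

Because the pattern is anchored with ^ but not with $, it acts as a prefix filter in this sketch; add a trailing $ if exact matches are intended.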

...

1.12 Hadoop properties
Code Block
TODO

...


#-------------------------------------------------------------------------------
# 1.12] Hadoop properties
#-------------------------------------------------------------------------------
# This limits the number of jobs that can be concurrently submitted to the Hadoop cluster
# by the custom processing engine (other jobs remain pending until a slot is available)
# There is no default; 10 is recommended as a sensible value until the size of your cluster is known.
hadoop.max_concurrent=10
1.13 Entity Extractor Properties

The following properties are required to configure the use of AlchemyAPI, Open Calais, or boilerpipe.

Code Block
#-------------------------------------------------------------------------------
# 1.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy and Open Calais Keys:
# (Obtain from alchemyapi.com or opencalais.com)
extractor.key.alchemyapi=
extractor.key.opencalais=
#----------------------------------------------
# Entity extraction type selection: opencalais or alchemyapi or none
# ("opencalais" has a much higher limit than "alchemyapi" (1000/day) so is recommended for free use
#  "alchemyapi" extracts sentiment, "opencalais" extracts entity associations Note this can be overridden per source)
extractor.entity.default=
# Text extraction type selection: boilerplate or alchemyapi or none
# ("alchemyapi" is much better, but has the limit discussed above. Note this can be overridden per source)
extractor.text.default=
1.14 Interface Related Properties

The ui.end.point.url property tells the UI where to connect to the Infinit.e API.

Info
#-------------------------------------------------------------------------------
# 1.14] Interface Related Properties for the AppConstants.js file found in:
#       /mnt/opt/infinite-tomcat/interface-engine/webapps/ROOT/
#-------------------------------------------------------------------------------
# The REST end point of the server (or a DNS/AWS load balancer across multiple rest end points):
# (Will normally end with "/api/")
ui.end.point.url=http://MY_REST_ENDPOINT/api/

...

2. Properties that can normally be left at their default

...

These properties are modified only if Infinit.e is deployed in SaaS mode (which is uncommon).

Code Block
#-------------------------------------------------------------------------------
# 2.2] Software as a service (SAAS) settings
#-------------------------------------------------------------------------------
# If true, allows admin requests that come from trusted sources to have admin privileges:
app.saas=false
# A list of trusted DNS/IP addresses (eg from CMS):
app.saas.trusted.dns=
2.3 Amazon Services Properties

The use.aws property specifies whether the platform is deployed on Amazon EC2.

Code Block
#-------------------------------------------------------------------------------
# 2.3] Amazon services properties
#-------------------------------------------------------------------------------
# Values: 0=false, 1=true
# If deployed on an EC2 cluster set this to 1:
use.aws=0
2.6 API Search Test

Default search test terms and expected results values used to monitor the Infinit.e service.

Code Block
#-------------------------------------------------------------------------------
# 2.6] API Search Test Terms and Expected Results
#-------------------------------------------------------------------------------
# List of terms formatted like: "*" "something" "something":
# (The continuous testing randomly selects one of these for querying the API)
api.search.test.terms="*"
# The expected results (max 100), if a different number comes back, the system alerts:
api.search.expected.results=0
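
The monitoring behaviour described in the comments above (pick one term at random, query, alert on an unexpected count) can be sketched roughly as follows. This is only an illustration of the described logic, not the actual Infinit.e monitoring code; the function names are hypothetical, and the API query itself is left out.

```python
import random
import shlex


def parse_test_terms(value):
    """Split a value such as '"*" "something"' into ['*', 'something']."""
    return shlex.split(value)


def pick_term(terms, rng=random):
    """Randomly select one term, as the continuous testing is described to do."""
    return rng.choice(terms)


def should_alert(returned_count, expected_count):
    """Alert whenever the number of results differs from the expected value."""
    return returned_count != expected_count


terms = parse_test_terms('"*" "something" "something"')
term = pick_term(terms)
# The real check would query the API with `term`; here the count is supplied by hand.
print(term, should_alert(returned_count=0, expected_count=0))
```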
2.7 Amazon AWS Settings

Property used by s3cmd to connect to Amazon to move files around.

Code Block
#-------------------------------------------------------------------------------
# 2.7] Amazon AWS Settings
#-------------------------------------------------------------------------------
# Used for s3cmd, see their web page for details:
s3.gpg.passphrase=
2.8 MongoDB Properties

MongoDB database configuration properties.

Code Block
#-------------------------------------------------------------------------------
# 2.8] MongoDB Properties
#-------------------------------------------------------------------------------
# (server/port should normally point to localhost:27017), where API nodes have a mongos
db.server=localhost
db.port=27017
# db.sharded - 0 = false and 1 = true
db.sharded=0
# The max number of documents to store (eg 10M). Docs will be dropped in order of age.
# (Not currently supported):
db.capacity=10000000
# MongoDB config server or servers (must be 1 or 3 comma separated IPs), non-EC2/AWS installations only
db.config.servers=
db.replica.sets=
#----------------------------------------------
# db.cluster.subnet - used for non-EC2/AWS only installations to help mongodb configurations
# identify proper host ip addresses, e.g. 127.0.0.
db.cluster.subnet=
#----------------------------------------------
# The location from which to fetch the geo.bson dump used for feature.geo
# can start s3://, http:// or https://, else is assumed to be a file, eg
#db.geo_archive=s3://config.saas.infinite.ikanow.com/geo.bson.tar.gz
# Can always be retrieved here
db.geo_archive=http://www.ikanow.com/infinit.e-preinstall/geo.bson.tar.gz
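
Two of the rules in the comments above lend themselves to a quick pre-deployment check: db.config.servers must hold 1 or 3 comma-separated IPs, and db.geo_archive is treated as a file unless it starts with s3://, http:// or https://. The Python sketch below is a hypothetical validator written for this page, not part of Infinit.e, and it assumes an empty db.config.servers is acceptable (as in the default above).

```python
import ipaddress


def parse_config_servers(value):
    """Return the list of config server IPs, enforcing the 1-or-3 rule.

    Assumption: an empty value is allowed (e.g. on EC2/AWS installations).
    """
    servers = [s.strip() for s in value.split(",") if s.strip()]
    if servers and len(servers) not in (1, 3):
        raise ValueError(
            "db.config.servers must list 1 or 3 IPs, got %d" % len(servers))
    for server in servers:
        ipaddress.ip_address(server)  # raises ValueError on a malformed address
    return servers


def geo_archive_kind(location):
    """Classify db.geo_archive as 's3', 'http', 'https', or a plain 'file'."""
    for scheme in ("s3://", "http://", "https://"):
        if location.startswith(scheme):
            return scheme[:-3]
    return "file"


print(parse_config_servers("10.0.0.1,10.0.0.2,10.0.0.3"))  # example IPs only
print(geo_archive_kind("http://www.ikanow.com/infinit.e-preinstall/geo.bson.tar.gz"))
```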
2.9 Access controls
Code Block
#-------------------------------------------------------------------------------
# 2.9] UI inactivity timeout (in seconds)
#-------------------------------------------------------------------------------
# After this many seconds of inactivity, users are logged out from their Infinit.e session
access.timeout=1800
2.10 Elasticsearch Properties
Code Block
#-------------------------------------------------------------------------------
# 2.10] Elasticsearch Properties
#----------------------------------------------
# Discovery mode = ec2 (if running on AWS) or zen (specify a list of IPs below):
elastic.node.discovery=ec2
#----------------------------------------------
# ES nodes, e.g.: elastic.search.nodes='NODE1:9300','NODE2:9300','NODE3:9300':
# Needed if elastic.node.discovery=zen (not EC2/AWS), a set of IPs to try (>= 1 must be running elasticsearch)
elastic.search.nodes=
#-------------------------------------------------------------------------------
# mlockall = should equal true except if running on a machine with < 4GB of RAM
bootstrap.mlockall=true
# (Should normally be localhost:9300, unless an API node is running with no index node)
elastic.url=localhost:9300
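
As a rough illustration of the node list format and the zen requirement described above, the following Python sketch parses an elastic.search.nodes value and rejects a zen configuration with no nodes. The function names are hypothetical and the check is an assumption about sensible validation, not Infinit.e behaviour.

```python
def parse_es_nodes(value):
    """Parse "'NODE1:9300','NODE2:9300'" into [('NODE1', 9300), ('NODE2', 9300)].

    An entry without a ':port' suffix raises ValueError in this sketch.
    """
    nodes = []
    for item in value.split(","):
        item = item.strip().strip("'\"")  # drop surrounding quotes, if any
        if not item:
            continue
        host, _, port = item.rpartition(":")
        nodes.append((host, int(port)))
    return nodes


def check_discovery(mode, nodes_value):
    """Return the parsed nodes, requiring at least one when mode is 'zen'."""
    nodes = parse_es_nodes(nodes_value)
    if mode == "zen" and not nodes:
        raise ValueError("zen discovery needs at least one entry in elastic.search.nodes")
    return nodes


print(check_discovery("zen", "'NODE1:9300','NODE2:9300','NODE3:9300'"))
print(check_discovery("ec2", ""))
```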
2.11 Harvester Properties
#-------------------------------------------------------------------------------
# 2.11] Harvester Properties
#-------------------------------------------------------------------------------
# Comma-separated-list from File,Database,Feed (note Database and Feed need jars not bundled with the RPM)
harvester.types=File,Database,Feed
# Web crawling etiquette: the time (in ms) to wait between consecutive accesses to the same site (10s is standard)
harvest.feed.wait=10000
# The minimum time between consecutive harvests (avoids thrashing FS/DB/RSS when there's nothing to get)
harvest.mintime.ms=300000
# Restricts the number of docs that can be harvested per cycle for memory reasons:
harvest.maxdocs_persource=5000
# Threading configuration type:num_threads (type from above):
# (eg for RSS heavy increase the "feed", for DB heavy increase the "file" etc. Beyond 20 there is limited benefit). 
harvest.threads=file:5,database:5,feed:20
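
The type:num_threads format of harvest.threads can be read as a simple mapping from harvester type to thread count. A minimal Python sketch of that reading follows; the helper name is hypothetical and the real harvester may parse the value differently.

```python
from typing import Dict


def parse_thread_config(value: str) -> Dict[str, int]:
    """Turn 'file:5,database:5,feed:20' into {'file': 5, 'database': 5, 'feed': 20}."""
    threads = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        kind, _, count = item.partition(":")
        threads[kind.strip().lower()] = int(count)
    return threads


config = parse_thread_config("file:5,database:5,feed:20")
print(config)  # {'file': 5, 'database': 5, 'feed': 20}
print(sum(config.values()))  # total harvester threads: 30
```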
2.12 Hadoop Properties

The Hadoop config path is a local folder where Infinit.e stores map reduce jobs if Hadoop is used.

Code Block
#-------------------------------------------------------------------------------
# 2.12] Hadoop Properties
#-------------------------------------------------------------------------------
hadoop.configpath=/mnt/opt/hadoop-infinite/mapreduce/
2.13 Entity Extractor Properties
Code Block
#-------------------------------------------------------------------------------
# 2.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy extraction level
# 1==people postproc, 2==geo postproc, 3==both
# (This uses some hard-coded heuristics to work around known AlchemyAPI errors)
app.alchemy.postproc=3

...