Appendix B - infinite.configuration.properties.TEMPLATE (STANDALONE and AWS)

Introduction

The infinite.configuration.properties.TEMPLATE file is used to store temporary configuration properties that Infinit.e application installers use to get the platform up and running. The properties below should be set before proceeding with installing the Infinit.e platform on a server.

Note: This document covers both STANDALONE and AWS versions of the file (the 2 links point to the templates in github). "STANDALONE" is used whenever the platform isn't deployed to the Amazon cloud (in which case "AWS" applies). "STANDALONE" can apply to both non-internet connected and internet connected instances.

Click here for a very basic example that we use for our simplest AWS Marketplace servers.

Whenever you change any of the parameters in the "infinite.configuration.properties" file, you need to run "/opt/infinite-home/scripts/rewrite_property_files.sh" to propagate them to the applications.

1. Parameters that should (almost) always best set

Note that in the actual templates parameters that don't apply/need to be changed for a particular deployment type ("AWS" vs "STANDALONE") are moved into the "Properties that can normally be left at their default"

Important Note the text extractor is incorrectly set to "boilerplate" in section 1.13 of the configuration file in the August 2012 release of Infinit.e. The value should read "boilerpipe".

1.1 Basic Infinit.e Settings

Change the admin.password and test.user.password values from the default of infinit.e!2012.

#-------------------------------------------------------------------------------
# 1.1] Basic Infinit.e Settings 
#-------------------------------------------------------------------------------
# Default admin and test user passwords
# CHANGE THIS:
admin.email=infinite_default@ikanow.com
admin.password=infinit.e!2012
# Test User : This is a dummy account for testing only so it doesn't matter what it is set to (default below)
# (or it can be changed as below, but cannot be the same email as the admin)
#test.email=test_user@ikanow.com
# CHANGE THIS:
test.user.password=infinit.e!2012

Note: These password values are encrypted when the admin and test user accounts are created in the Infinit.e database.

1.2 Software as a service (SAAS) settings

1.3 Amazon Services Properties

"use.aws" should always be "true" for "AWS" templates and "false" for "STANDALONE" templates

The s3.url property is required for backups when Infinit.e is hosted on Amazon.

#-------------------------------------------------------------------------------
# 1.3] Amazon services properties
#-------------------------------------------------------------------------------
# This is the root s3 bucket name to be used for backups (use.aws=1 only):
# (The following names are used: mongo.<s3.url>, elasticsearch.<s3.url> .. daily backups in the same region
#  backup.mongo.<s3.url>, backup.elasticsearch.<s3.url> ... monthly backups in a different region
#  Note these dirs need to be set up manually)
s3.url=
#-------------------------------------------------------------------------------
# 2.3] Amazon services properties
#-------------------------------------------------------------------------------
# Values: 0=false, 1=true
# If deployed on an EC2 cluster set this to 1:
use.aws=0

1.4 EMail Server Settings

The following properties need to be filled out in order for Infinit.e to be able to send messages via email (system errors, communications to/from users, etc).

#-------------------------------------------------------------------------------
# 1.4] EMail Server Settings
#-------------------------------------------------------------------------------
# The server to be used for mail transactions (eg smtp.google.com if Internet-enabled, contact your sysadmin if not):
mail.server=
# Base-64 encoded Jasypt encrpyion of username (using salt/"password" "infinit.e"):
mail.username=
# Base-64 encoded Jasypt encrpyion of password (using salt/"password" "infinit.e"):
mail.password=
# This URL is used as the base for links included in the 
# So should point to an accessible REST endpoint (eg the same as ui.end.point.url below)
# If this is left commented out, it defaults to the browser domain (ie location of GUI)
#url.root=http://MY_REST_ENDPOINT/api/

1.5 Email Addresses for Log Files

Addresses to send log files from and to.

#-------------------------------------------------------------------------------
# 1.5] EMail Addresses for log files etc.
#-------------------------------------------------------------------------------
# All emails come from this user:
log.files.mail.from=
# System alert emails come from this user:
log.files.mail.to=

1.6 API Search Test Terms and Expected Results

1.7 Amazon AWS Settings

AWS access and secret keys required for the Infinit.e platform to access AWS.

#-------------------------------------------------------------------------------

# 1.7] Amazon AWS Settings
#-------------------------------------------------------------------------------
# AWS keys (only needed if use.aws=1)
aws.access.key=
aws.secret.key=

1.8 MongoDB Properties

MongoDB configuration properties that need to be set on any non-EC2/AWS installation.

#-------------------------------------------------------------------------------
# 1.8] MongoDB Properties
#-------------------------------------------------------------------------------
# MongoDB config server or servers (must be 1 or 3 comma separated IPs/hosts), non-EC2/AWS installations only
# (Can be used in AWS mode for legacy clusters only, to hardwire server order)
db.config.servers=
# Replica set configuration (non-EC2/AWS installations only): comma-separated list of either all IPs or all hostnames within a replica set, semi-colon-separated for multiple replica sets
# (eg "A,B;C,D" where A-D are IP addresses or hostnames would create 2 shards with 2 hosts each "A,B;A,B" would create 2 shards both on the same replica set, etc)
db.replica.sets=
#----------------------------------------------
# db.cluster.subnet - used for non-EC2/AWS only installations to help mongodb configurations
# identify proper host ip addresses, e.g. 127.0.0. - these are compared against the IPs in db.replica.sets
# Can also be one of the following: if blank then assumes that the db.replica.sets will be a list of short hostnames - if "FQDN" then assumes that the db.replica.sets will be a list of fully qualified hostnames
db.cluster.subnet=

1.9 UI settings

API access configuration.

#-------------------------------------------------------------------------------
# 1.9] UI settings
#-------------------------------------------------------------------------------
# The passphrase for the SSL keystore (not needed unless HTTPS is being used)
ssl.passphrase=

1.10 Elasticsearch Properties

The elastic.cluster property is required by all installations. The elastic.search.nodes property is only used in non-EC2/AWS installations.

#-------------------------------------------------------------------------------
# 1.10] Elasticsearch Properties
#-------------------------------------------------------------------------------
# Cluster name 
# Any unique name within the EC2 cluster/subnet: 
elastic.cluster=
#----------------------------------------------
# ES nodes, e.g.: elastic.search.nodes=['NODE1:9300','NODE2:9300','NODE3:9300']
# Needed if discovery.mode=zen (not EC2/AWS), a set of IPs to try (>= 1 must be running elasticsearch)
elastic.search.nodes=
# If any node sees less than this number of connections it will take itself down
# For a single node, should be 0 (default), for a 2-node system, should be 1, for large clusters
# ideally would be CLUSTER_SIZE - #REPLICAS (and >=2), but 2 is workable if the size is not fixed
# (if this is set too low then split brain situations may not be detected)
elastic.search.min_peers=0

1.11 Harvester properties

1.12 Hadoop properties

#-------------------------------------------------------------------------------
# 1.12] Hadoop properties
#-------------------------------------------------------------------------------
# This limits the number of jobs that can be concurrently submitted to the Hadoop cluster
# by the custom processing engine (other jobs remain at pending until a slot is available)
# There is no default, 10 is recommended as a sensible value until the size of your cluster is known.
hadoop.max_concurrent=10
# If this is set to true then custom (Hadoop) plugins will run locally on this node, without 
# a Hadoop cluster needing to be set up (which is complicated to do) - the downside of course is that
# jobs will not scale across many nodes. Set this to true for single-node clusters and before Hadoop
# has been installed.
hadoop.local_mode=false

1.13 Entity Extractor Properties

The following properties are required to configure the use of AlchemyAPI, Open Calais, TextRank, boilerpipe, tika, or other (custom) extractors.

#-------------------------------------------------------------------------------
# 1.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy and Open Calais Keys:
# (Obtain from alchemyapi.com or opencalais.com)
# (can't be blank so set to DUMMY if not in use)
extractor.key.alchemyapi=DUMMY
extractor.key.opencalais=DUMMY
#----------------------------------------------
# Entity extraction type selection: opencalais or alchemyapi or none
# ("opencalais" has a much higher limit than "alchemyapi" (1000/day) so is recommended for free use
#  "alchemyapi" extracts sentiment, "opencalais" extracts entity associations Note this can be overridden per source)
extractor.entity.default=
# Text extraction type selection: boilerpipe or alchemyapi or none
# ("alchemyapi" is much better, but has the limit discussed above. Note this can be overridden per source)
extractor.text.default=

1.14 Interface Related Properties

The ui.end.point.url property is used to tell the UI where to connect to the Infinit.e API.

#-------------------------------------------------------------------------------
# 1.14] Interface Related Properties for the AppConstants.js file found in:
#       /mnt/opt/infinite-tomcat/interface-engine/webapps/ROOT/
#-------------------------------------------------------------------------------
# The REST end point of the server (or a DNS/AWS load balancer across multiple rest end points):
# (Will normally end "/api/") 
ui.end.point.url=http://MY_REST_ENDPOINT/api/
#You can use this to set a different logo
#ui.logo.url=

1.15 Maps API key

2. Properties that can normally be left at their default

2.1 Basic Infinit.e Settings

2.2 Software as a Service Properties

Properties that are only modified if Infinit.e is deployed in SAAS mode (which is uncommon).

#-------------------------------------------------------------------------------
# 2.2] Software as a service (SAAS) settings
#-------------------------------------------------------------------------------
# If true, allows admin requests that come from trusted sources to have admin privileges: 
app.saas=false
# A list of trusted DNS/IP addresses (eg from CMS):
app.saas.trusted.dns=

2.3 Amazon Services Properties

2.4 EMail Server Settings

2.5 Email Addresses for Log Files

2.6 API Settings

Default search test terms and expected results values used to monitor the Infinit.e service.
The ability to reduce the memory consumption by manually calculating aggregations instead of using Lucene facets (slightly slower and less accurate).

#-------------------------------------------------------------------------------
# 2.6] API Settings
#-------------------------------------------------------------------------------
# API Search Test Terms and Expected Results
# List of terms formatted like: "*" "something" "something":
# (The continuous testing randomly selects one of these for querying the API)
api.search.test.terms="*"
# The expected results (max 100), if a different number comes back, the system alerts:
api.search.expected.results=0
# For smaller clusters with many documents, Lucene faceting may use too much memory, if so
# set this parameter to "low". ("none" disables all aggregations, "full" uses Lucene facets where possible)
api.aggregation.accuracy=full

2.7 Amazon AWS Settings

Property used by s3cmd to connect to Amazon to move files around.

#-------------------------------------------------------------------------------
# 2.7] Amazon AWS Settings
#-------------------------------------------------------------------------------
# Used for s3cmd, see their web page for details: 
s3.gpg.passphrase=

2.8 MongoDB Properties

MongoDB database configuration properties.

#-------------------------------------------------------------------------------
# 2.8] MongoDB Properties
#-------------------------------------------------------------------------------
# (server/port should normally point to localhost:27017), where API nodes have a mongos
db.server=localhost
db.port= 27017
# db.sharded - 0 = false and 1 = true
db.sharded=1
# The max number of documents to store (eg 10M). Docs will be dropped in order of age.
# (Not currently supported):
db.capacity=10000000
# MongoDB config server or servers (must be 1 or 3 comma separated IPs), non-EC2/AWS installations only
db.config.servers=
db.replica.sets=
#----------------------------------------------
# db.cluster.subnet - used for non-EC2/AWS only installations to help mongodb configurations
# identify proper host ip addresses, e.g. 127.0.0.
db.cluster.subnet=
#----------------------------------------------
# The location from which to fetch the geo.bson dump used for feature.geo
# can start s3://, http:// or https://, else is assumed to be a file, eg
#db.geo_archive=s3://config.saas.infinite.ikanow.com/geo.bson.tar.gz
# Can always be retrieved here
db.geo_archive=http://yum.ikanow.com/infinit.e-preinstall/artifacts/geo.bson.tar.gz

2.9 Access controls

#-------------------------------------------------------------------------------
# 2.9] UI inactivity timeout (in seconds)
#-------------------------------------------------------------------------------
# After this many seconds of inactivity, users are logged out from their Infinit.e session
access.timeout=1800
# This is a regex that, if specified, will allow only access to REST commands matching
# the pattern - only applied to remote clients. Connections from localhost always have access 
# to everything
# Eg the commented out example will allow only login/keepalive and querying.
#remote.access.allow=^/api/(knowledge/document/query|auth/login|auth/keepalive)
remote.access.allow=
# This parameter does the opposite, allows everything except specified commands
remote.access.deny=
# Uncomment ui.logging, set to true: in order to log any success:false or HTTP errors from the API, except:
# If you specify the ui.logging.api.regex, it should start with either '*' or '!' and then the regex:
# if it starts with '*' then it will log any api call with a substring matching the regex (in addition to all errors)
# if is starts with '!' (or nothing at all) then it will show errors only, and filter them according to the regex
#ui.logging=true
#ui.logging.api.regex=

2.10 Elasticsearch Properties

#-------------------------------------------------------------------------------
# 2.10] Elasticsearch Properties
#----------------------------------------------
# Discovery mode = ec2 (if running on AWS) or zen (specify a list of IPs below):
elastic.node.discovery=ec2
#----------------------------------------------
# ES nodes, e.g.: elastic.search.nodes='NODE1:9300','NODE2:9300','NODE3:9300':
# Needed if discovery.mode=zen (not EC2/AWS), a set of IPs to try (>= 1 must be running elasticsearch)
elastic.search.nodes=
#-------------------------------------------------------------------------------
# mlockall = should equal true except if running on a machine with < 4GB of RAM
bootstrap.mlockall=true
# (Should normally be localhost:9300, unless an API node is running with no index node) 
elastic.url=localhost:9300
# If you know your logstash and elasticsearch versions are matched up, you can improve the output performance by setting this to "binary"
elastic.logstash=http

2.11 Harvester Properties

#-------------------------------------------------------------------------------
# 2.11] Harvester Properties
#-------------------------------------------------------------------------------
# Comma-separated-list from File,Database,Feed (note Database and Feed need jars not bundled with the RPM)
harvester.types=File,Database,Feed
# Web crawling etiquette: the time to way between consecutive accesses to the same time (10s is standard)
harvest.feed.wait=10000
# The minimum time between consecutive harvests (avoids thrashing FS/DB/RSS when there's nothing to get)
harvest.mintime.ms=300000
# The minimum time between consecutive source harvests (set if needs to be longer than harvest.mintime.ms,
# eg if you want to pick up a source quickly the first time but then not update so frequently)
harvest.source.mintime.ms=
# Restricts the number of docs that can be harvested per cycle for memory reasons:
harvest.maxdocs_persource=5000
# Threading configuration type:num_threads (type from above):
# (eg for RSS heavy increase the "feed", for DB heavy increase the "file" etc. Beyond 20 there is limited benefit). 
harvest.threads=file:5,database:5,feed:20
# This controls the batch size of sources picked up by a thread, this does not normally need to be changed (its default is shown)
# (It can be reduced in cases where a small number of very long-running sources need to be harvested).
#harvest.distribution.batch.harvest=20
# This disables entity and association aggregation. For almost all applications you will not want to set this.
#harvest.disable_aggregation=false
# This controls what % of 1 CPU is used to update entity and association counts and synchronization, shouldn't need to change it
# (Reducing it will speed up raw harvest speeds, Increasing it will keep entity etc freqs more up-to-date)
#harvest.aggregation.duty_cycle=0.5
# This parameter uses the Java Security Manager to prevent scripts accessing local network services (at the expense of some performance)
# It can be turned off for uses of the platform where sources must be approved before being added (etc)
harvest.security=false
# This is a comma-separated list of hosts in the following format "http://<HOST>[:<PORT>]" or "socks://<HOST>:<PORT>"
# When specified, all requests for external content from the harvester are proxied (round-robin) through the specified hosts
harvest.proxy=
# Content controls:
#This is the maximum size of content (before gzip) that will be stored (truncated above this)
#store.maxcontent=16000000
#If true (default false), stores the raw content of a document (as well as the post-processed text)
#store.rawcontent=false
#If true (default false), then the doc_content.gzip_content also contains the JSON metadata, stored as a string 
#store.metadata_as_content=false

2.12 Hadoop Properties

The Hadoop config path is a local folder where Infinit.e stores map reduce jobs if Hadoop is used.

#-------------------------------------------------------------------------------
# 2.12] Hadoop Properties
#-------------------------------------------------------------------------------
hadoop.configpath=/mnt/opt/hadoop-infinite/mapreduce/

2.13 Entity Extractor Properties

 #-------------------------------------------------------------------------------
# 2.13] Entity Extractor Properties
#-------------------------------------------------------------------------------
# Alchemy extraction level
# 1==people postproc, 2==geo postproc, 3==both
# (This uses some hard-coded heuristics to work around known AlchemyAPI errors)
app.alchemy.postproc=3
#This should list any custom extractor class names
#extractor.entity.custom=

2.14 UI Related Properties

#-------------------------------------------------------------------------------
# 2.14] Interface Related Properties for the AppConstants.js file found in:
#       /mnt/opt/infinite-tomcat/interface-engine/webapps/ROOT/
#-------------------------------------------------------------------------------
# For SaaS applications, the URL of the web page (eg containing CMS links for forgot password/logout etc):
# (Can be left blank otherwise)
ui.domain.url=
# Forgot password URL: (SaaS only, ie integrated with a CMS)
# (relative to ui.domain.url):
ui.forgot.password=forgot-password/
# Logout URL: (SaaS only, ie integrated with a CMS)
# (relative to ui.domain.url):
ui.logout=?action=logout
# If using a different search engine than google (eg something internal), set this
#ui.externalsearch.name=Google
#This URL has the search terms appended at the end (URL encoded)
#ui.externalsearch.url=https://www.google.com/search?hl=en&q=

2.15 Map API Key

Obsolete: Google has ceased support for this API and is not generating any new keys. However the MapQuest map widget requires the key be set to a non-zero string (the commercial version which is not used in this tool but could be requires a key).

#-------------------------------------------------------------------------------
# 2.15] Maps API key:
#-------------------------------------------------------------------------------
# Can be any non-zero string for MapQuest open API
google.maps.api.key=NULL_KEY

2.16 Enterprise variables

These are parameters required for the enterprise version, and can be ignored (ie should be left commented out) in other cases

#-------------------------------------------------------------------------------
# 2.16] Enterprise variables - ignore for non-enterprise installs
#-------------------------------------------------------------------------------
 
#####
# Proxy settings for port 8090
# The location of the case manager GUI - Reverse Proxies requests to <domain>:8090/casemanager
# If the case manager server is located on a different domain name, change casemanager.master to be the domain name of that server.
casemanager.master=localhost.localdomain
# The location of the splunk monitor server - Reverse Proxies requests to <domain>:8090/splunk
# If the splunk server is located on a different domain, change monitor.master to be the domain name of that server.
monitor.master=localhost.localdomain
 
####
# Enterprise settings need by the GUI
# The location of the case manager app server - set to "local" to point to <domain>/caseserver
# NOTE THAT THE DOMAIN MUST MATCH THE URL OF THE MAIN GUI OR AUTHENTICATION WON'T WORK
#app.case.api.url=local
# The location of the case manager GUI - set to "local" to point to <domain>:8090/casemanager
# (redirects to 8090/casemanage)
# NOTE THAT THE DOMAIN MUST MATCH THE URL OF THE MAIN GUI OR AUTHENTICATION WON'T WORK
#app.case.url=local

#-------------------------------------------------------------------------------