Infinit.e - downloading, installing, and maintaining the platform

Overview

All of the install software was written for CentOS 5.5-5.6/RedHat 5.5, and the online installers also work with CentOS 6/RedHat 6/Amazon Linux. We plan to build install artifacts for Ubuntu in the future; this may be as simple as converting the RPMs to DPKGs, though we have not yet looked into it. The offline install for CentOS 6 is still in progress: how to get it working with one quick change is described below under "Quick install - offline". All that is necessary is bundling up all the RPM dependencies, and we have deployed to RedHat 6.3 via the offline install with a few tweaks that still need to find their way into the trunk.

We recommend installing Infinit.e on a 64-bit OS, but it will run on a 32-bit OS with one change (see the troubleshooting guide, under installation issues). We also recommend running with at least 4GB of RAM and 50GB of disk space. It is possible to run with less (again, see the troubleshooting guide for details); conversely, real deployments will normally require more, approximately 5x the raw storage requirements. For example, 5M documents of average size 10KB (50GB raw) can require up to 250GB of disk space between the database and full text index.
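The 5x rule of thumb above can be turned into a quick calculation. The following is only a sketch: the function name is illustrative, and 1GB is taken as 1,000,000KB so that the figures line up with the example in the text.

```shell
# Rough disk estimate: documents x average size (KB) x 5, reported in GB.
# The 5x multiplier is the guide's rule of thumb, not a guarantee.
# estimate_disk_gb is an illustrative name, not part of Infinit.e.
estimate_disk_gb() {
    docs=$1
    avg_kb=$2
    # 1 GB is treated as 1,000,000 KB (decimal units, integer division).
    echo $(( docs * avg_kb * 5 / 1000000 ))
}

# The guide's example: 5M documents of average size 10KB.
estimate_disk_gb 5000000 10   # prints 250
```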

If you are running with <=4GB of RAM then you will need to ensure swap is enabled.
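A pre-install check for this condition might look like the sketch below. It assumes a Linux-style "/proc/meminfo" file; the function name and messages are illustrative and not part of the Infinit.e installer.

```shell
# swap_warning [MEMINFO_FILE]
# Returns non-zero when the machine has <=4GB of RAM and no swap configured.
# Reads MemTotal/SwapTotal (in kB) from a meminfo-style file.
swap_warning() {
    meminfo=${1:-/proc/meminfo}
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' "$meminfo")
    # 4194304 kB = 4GB; missing values are treated as 0.
    if [ "${mem_kb:-0}" -le 4194304 ] && [ "${swap_kb:-0}" -eq 0 ]; then
        echo "WARNING: <=4GB of RAM and no swap configured"
        return 1
    fi
    echo "OK"
}
```

Calling "swap_warning" with no arguments checks the current machine; passing a file name allows the check to be tried against a saved copy of another machine's meminfo.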

Currently we have only tested Infinit.e with SELinux disabled. It is highly likely to work in permissive mode also. A very quick test with SELinux in enforcing mode showed minor problems with Tomcat, which we will address shortly.

Although the Infinit.e software has been designed to run in many different configurations, most of the testing has occurred in one of three configurations:

  • In the Amazon EC2 cloud, with 2 node types (API node and DB node, see here for a logical mapping of the components onto the node types). There is explicit support for Amazon EC2, described below.
  • On other Internet-connected machines, either with 2 node types as above, or single-machine instances running both API and DB functions.
  • On non-Internet-connected machines, with both 2-node and single-instance configurations.

The remainder of this section provides further details on downloading, installing, reconfiguring, and updating the platform in each of the above cases. Note that there is a completely generic installation guide (making no assumptions) here.

Important Note #1: Do not install Infinit.e on a system that already has Tomcat installed and running. Infinit.e has a platform specific Tomcat installation and configuration process. If an instance of Tomcat is already on the server it will almost certainly cause conflicts with the Infinit.e applications.

Important Note #2: This install guide assumes that either:

  • The entire filesystem is a single large partition mounted on "/".
  • The "/mnt" or "/mnt/opt" directory is a separate large partition.

For example, all of the logs, databases (MongoDB generates a 20GB set of pre-allocated files on startup), and indexes are stored off "/mnt/opt".

If one of the assumptions is not met, and a different directory, which we'll call "/DIR", is where the main data partition is, then before installing the software, the following additional step must be performed:

  • ln -s /DIR /mnt/opt

In addition, "/opt" must exist, and "/opt" and "/mnt/opt" must not have permissions that are too restrictive (777 works, eg "chmod 777 /mnt/opt /opt")
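The preparation steps above (symlink, "/opt" present, permissive permissions) could be scripted roughly as follows. This is a sketch under stated assumptions: the function name is illustrative, and the optional second argument is a hypothetical root prefix that exists only so the function can be tried in a scratch directory before touching the real filesystem.

```shell
# prepare_data_dirs DATA_DIR [ROOT]
# DATA_DIR is the main data partition (the guide's "/DIR").
# ROOT is a hypothetical prefix for trial runs; leave it empty to act on "/".
prepare_data_dirs() {
    data_dir=$1
    root=${2:-}
    mkdir -p "$root/mnt" "$root/opt" || return 1
    # Link the data partition in as /mnt/opt unless something is already there.
    if [ ! -e "$root/mnt/opt" ] && [ ! -L "$root/mnt/opt" ]; then
        ln -s "$data_dir" "$root/mnt/opt" || return 1
    fi
    # 777 is the setting the guide says is known to work.
    chmod 777 "$root/mnt/opt" "$root/opt"
}
```

On a real system this would be run as root, for example "prepare_data_dirs /DIR".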

There are some other top-level directories that are reserved: "/raidarray", "/dbarray", "/persistent". These are not necessary for normal installs but for higher performance:

  • "/raidarray" will be used for data/performance intensive directories by all RPMs, and can be set up with noatime and a larger block size (only use a larger block size if /dbarray is present or on API only nodes)
  • "/dbarray" is used to house the database data and can be set up with noatime and a smaller block size
  • "/persistent" is for Amazon installs and can usually be ignored.

A knowledge-based troubleshooting guide to common installation/configuration problems is here.

Quick install options

The Ikanow community downloads page includes links to quick install bundles for single-node installs.

Quick install - online

Install the online RPM (infinit.e-install.v0.1-online.noarch.rpm) and then run "sh /opt/infinite-install/infinite-install.sh". There are only two yes/no options during the install (plus needing to accept Oracle's ToS for Java): whether to use the sample configuration or build your own, and whether to pre-load some sample data.

To download the install RPM from the IKANOW website directly onto a Linux machine, perform the following steps:

  1. Click the "Download" (red) button for the online "enterprise linux" install package (from http://www.ikanow.com/downloads/), enter the required fields, and select "Download" again (blue button)
  2. Download the RPM file to the local machine
  3. From the Linux machine type "curl -L -o install.rpm 'https://ikanow.jira.com/builds/artifact/INF-PREOSS/JOB1/build-latestSuccessful/Infinit.e-Install-RPM/XXX'" where XXX is the name of the file you are prompted to download in step 2
    1. (of course you can also just transfer the file from Step 2)
  4. Then run "rpm -i install.rpm" (with sudo if non-root) and follow the instructions

If you use the sample configuration, the user/password is infinite_default@ikanow.com / infinit.e!2013. These can (and should!) be changed from the manager webapp.

Quick install - offline

This is for installations where no Internet connection is available. 

The offline install is only available for 64 bit architectures.

Download both tarballs from the community downloads pages and transfer them to the end computer(s). Untar them both and then run (sudo if non-root) "sh install-infinite-offline.sh" in the install directory. This provides a single option: whether to use the sample configuration or build your own. The sample user/password is the same as above. No sample data is available in the offline mode.

Note that there is one limitation to the RedHat/CentOS 6 support: the Hadoop installer is not supported. This is not a major issue since it was purely a wrapper for Cloudera's online installer. Currently, for offline RedHat/CentOS 6 installs, you must download and deploy the Cloudera or HortonWorks install packages yourself.

Note that Flash Player 11.x is not bundled with Infinit.e, and is required to use the GUI locally (if there are machines on the same network that already have Flash Player 11.x then this may not be necessary). Standalone Flash Player plugins may be downloaded from here.

Quick install - Amazon AWS cloud

There are 2 options:

Downloading the software

The binaries can be downloaded in one of two different ways:

[ikanow_prereqs]
name=Infinit.e Pre-Reqs Repository
baseurl=http://yum.ikanow.com/infinit.e-preinstall-repo
gpgcheck=0

(also hosted here)

[ikanow]
name=Infinit.e Repository
baseurl=http://yum.ikanow.com/infinit.e-install-repo
gpgcheck=0

(also hosted here)

Once these are copied to the "/etc/yum.repos.d/" directory as eg "ikanow.repo" and "ikanow-infinite.repo", then commands like:

yum install --nogpgcheck infinit.e-platform.prerequisites.online
yumdownloader infinit.e-platform.prerequisites.offline

will download/install the RPMs.
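Creating the two repository files could be scripted along the following lines. This is a sketch: the function name is illustrative, and which section goes into which of the two example file names is an assumption (yum reads every ".repo" file in the directory, so the split does not matter functionally).

```shell
# write_ikanow_repos [REPO_DIR]
# Writes the two repository definitions shown above.
# REPO_DIR defaults to /etc/yum.repos.d (requires root); pass another
# directory to inspect the generated files first.
write_ikanow_repos() {
    repo_dir=${1:-/etc/yum.repos.d}
    cat > "$repo_dir/ikanow.repo" <<'EOF' || return 1
[ikanow_prereqs]
name=Infinit.e Pre-Reqs Repository
baseurl=http://yum.ikanow.com/infinit.e-preinstall-repo
gpgcheck=0
EOF
    cat > "$repo_dir/ikanow-infinite.repo" <<'EOF'
[ikanow]
name=Infinit.e Repository
baseurl=http://yum.ikanow.com/infinit.e-install-repo
gpgcheck=0
EOF
}
```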

Note that some optional modules are not bundled with the software:

  • An optional package called Splunk is used for application monitoring. It is commercial but has a free license that is sufficient for most cases. Installation of Splunk is discussed in the infinit.e-config section.
  • Many JDBC libraries for commercial databases cannot be bundled and must be downloaded and installed by hand. This is discussed here.

Creating the configuration file

The most important step after downloading the software and installing the platform pre-requisites file, but before installing any other RPMs, is to create a configuration file. This is described here.

Installing the software - EC2

EC2 installation and configuration has been explicitly supported in 4 ways:

  1. The platform pre-requisite RPMs were specifically designed to run on top of Amazon Linux. We provide an AMI ("ami-69a62000") that is identical to Amazon Linux with 2 minor changes: extra transient storage has been attached (for some reason this is not configured by default), and the "user-data" is executed on startup (to support our CloudFormation templates). 

    • Note that in no way is our AMI needed to make Infinit.e function, we just provide it for convenience.
    • (There's also an old CentOS 5.x AMI: "415580616905/CentOS_5.5_X86_64_V6_GOLD", since CentOS AMI availability was limited when we first developed Infinit.e.)
  2. We provide CloudFormation templates to allow easy creation of API nodes and DB nodes (and also a load balancer)
  3. We use the EC2 instance metadata for various discovery and configuration tasks, making filling in the properties form much simpler.
  4. EC2-specific backup to S3 storage is supported.

The full EC2 installation guide is here.

Installing the software - Internet connected

Follow the generic installation guide, using the online RPMs where there is an option.

Installing the software - offline

Follow the generic installation guide, using the offline RPMs where there is an option.

Note that Flash Player 11.x is not bundled with Infinit.e, and is required to use the GUI locally (if there are machines on the same network that already have Flash Player 11.x then this may not be necessary). Standalone Flash Player plugins may be downloaded from here.

Installing the software - virtualization

By following the generic installation guide, the software can be deployed in standard virtual machines (this has been tested on VMWare and VirtualBox, though it should work on others also).

A sample VirtualBox VM (currently running the May 2014 build, updating is easy and described below) is available on the IKANOW downloads page.

Note that the harvester is turned off by default to enable the VM to run on machines with lower memory footprints. If you have 4GB+, the harvester can be started from the command line as root by "service infinite-px-engine start"

Note that the GUI of the downloadable VM is currently only accessible from the host machine; it will not run across a network.

Note that the OS of the VM is running an older version of Firefox, which is not compatible with the IKANOW UI. It is possible to upgrade to a later version, though the intention is more that the host PC's browser be used via port forwarding or network bridging.

Changing the configuration after installation (starting/stopping Infinit.e applications)

Simply edit the "infinite.configuration.properties" file (in "/opt/infinite-install/config") described here, and then run the following script: "sh /opt/infinite-home/scripts/rewrite_property_files.sh". Any changes to "gui." properties on nodes with an Interface Engine installed also require running "sh /opt/tomcat-infinite/interface-engine/scripts/create_appconstants.sh". Some additional steps may be required:

  • If any API or GUI configuration has been changed, it is necessary to restart the interface engine: "service tomcat6-interface-engine restart"
    • (this service command can also be called with "start" and "stop" - this is the case for all service commands below)
  • It is not normally necessary to restart the harvester, it will pick up configuration changes at the start of each cycle.
    • If necessary, it can be controlled with "service infinite-px-engine restart"
  • If any of the "elastic." configuration parameters have been changed (this is very unusual), it is necessary to restart the index engine: "service infinite-index-engine restart"
    • If running on a multi-node cluster you should stagger the restarts to ensure availability is maintained, eg restart on node 1, wait 5 minutes, restart on node 2, etc
  • If any of the "db." configuration parameters have been changed (this is very unusual), it is necessary to restart the database: "service mongo_infinite restart"
    • If running on a multi-node cluster you should stagger the restarts to ensure availability is maintained, eg restart on node 1, wait 5 minutes, restart on node 2, etc
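The restart rules in the list above can be summarized as a small lookup from a changed property name to the service command it calls for. This is only a sketch: the function name is illustrative, and the "api." prefix is an assumption (the guide names "gui.", "elastic." and "db." explicitly, and only refers to "API configuration" in general terms).

```shell
# restart_command_for PROPERTY_NAME
# Prints the restart command suggested by this guide for a changed property,
# or nothing when no restart is normally needed (e.g. harvester settings,
# which are picked up at the start of each cycle).
restart_command_for() {
    case $1 in
        gui.*|api.*) echo "service tomcat6-interface-engine restart" ;;  # "api." prefix is assumed
        elastic.*)   echo "service infinite-index-engine restart" ;;
        db.*)        echo "service mongo_infinite restart" ;;
        *)           : ;;  # no restart required
    esac
}
```

On a multi-node cluster the printed command would still need to be run on one node at a time, with a pause in between, as described above.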

There is an exception: some "beta" or "internal" configuration parameters are ignored by the distribution scripts (the configuration infrastructure is in need of an overhaul...). These need to be inserted into "/opt/infinite-home/config/infinite.api.properties.TEMPLATE" or "/opt/infinite-home/config/infinite.service.properties.TEMPLATE", in which case the changes are copied correctly when the service-specific configuration files are rebuilt.

Controlling the processing engine

The "processing engine" (which comprises the harvester and the custom processing engine) can be configured dynamically by creating (empty) files in "/opt/infinite-home/bin/":

  • STOPFILE: stops the harvester at the end of its current cycle, until it is manually restarted
    • (note a restart occurs Sunday at midnight as part of a batch process - ALLSTOPFILE is needed to prevent this from restarting the harvester)
  • ALLSTOPFILE: stops the harvester at the end of its current cycle (restart commands are ignored)
  • STOP_CUSTOM: prevents any custom jobs from running (saved queries and Hadoop jobs)
  • Less common:
    • RESET_FILE: Will reset any sources that have been disabled due to errors
    • SYNC_FILE: Initiates a comparison between the recent additions to the DB and the text indexes, deleting mismatches, unless STOP_SYNC_FILE is present. This comparison was also scheduled hourly, but it is now turned off by default (see next bullet).
    • STOP_SYNC_FILE: This disables DB/index synchronization - now off by default, since it has not proven very useful. 
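Since the control files are just empty files, creating them can be wrapped in a small helper. The sketch below assumes the directory given above; the function name, the action names and the PX_BIN_DIR variable are illustrative and not part of Infinit.e.

```shell
# Directory watched by the processing engine (from this guide); can be
# overridden through the hypothetical PX_BIN_DIR variable for trial runs.
PX_BIN_DIR=${PX_BIN_DIR:-/opt/infinite-home/bin}

# px_control ACTION
# Creates the (empty) control file that corresponds to ACTION.
px_control() {
    case $1 in
        stop)          touch "$PX_BIN_DIR/STOPFILE" ;;     # stop harvester after current cycle
        stop-all)      touch "$PX_BIN_DIR/ALLSTOPFILE" ;;  # as above, restart commands ignored
        stop-custom)   touch "$PX_BIN_DIR/STOP_CUSTOM" ;;  # block saved queries and Hadoop jobs
        reset-sources) touch "$PX_BIN_DIR/RESET_FILE" ;;   # re-enable sources disabled by errors
        *) echo "usage: px_control stop|stop-all|stop-custom|reset-sources" >&2
           return 2 ;;
    esac
}
```

For example, "px_control stop-all" before planned maintenance keeps the Sunday batch process from restarting the harvester.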

Upgrading the software - Internet connected

The RPMs are usually updated monthly. The GitHub wiki contains details of the latest release and what changes were made.

It is rarely necessary to update either the "infinit.e-db-instance" or "infinit.e-index-engine" RPMs - the wiki will indicate when this is the case. The wiki will also indicate when a change is non-backwards compatible and provide full details on the upgrade process in those cases.

So normally the following commands can be used to update the system:

  • yum update infinit.e-config infinit.e-processing-engine infinit.e-interface-engine infinit.e-record-engine
    • (Or "yum update infinit.e-config infinit.e-processing-engine infinit.e-interface-engine" if running pre-v0.3 or v0.3 with an older version of elasticsearch)

If the full upgrade is necessary (again: the GitHub wiki will indicate if this is necessary - normally it is not):

  • Stop the services on each machine in a cluster:
    • service tomcat6-interface-engine stop
    • service infinite-px-engine stop
    • service infinite-index-engine stop
    • service mongo_infinite stop
  • yum update infinit.e-config infinit.e-db-instance infinit.e-index-engine infinit.e-processing-engine infinit.e-interface-engine infinit.e-record-engine
    • (Or "yum update infinit.e-config infinit.e-db-instance infinit.e-index-engine infinit.e-processing-engine infinit.e-interface-engine" if running pre-v0.3 or v0.3 with an older version of elasticsearch)
    • The RPM updates will automatically start/restart all required processes.

Note that whenever the version of the elasticsearch RPM changes (again: this is always noted in the GitHub wiki), it is necessary to flush the index before restarting the index engine, eg:

  • service infinite-px-engine stop
  • curl 'localhost:9200/_flush'
  • service infinite-index-engine restart

Failure to do this will corrupt the cluster, which is a Bad Thing.
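Because the order matters here, it may be worth wrapping the sequence so that the restart is skipped if the flush fails. The following is a sketch only: it uses the service names that appear in the configuration section of this guide (infinite-px-engine, infinite-index-engine), and the function names and the DRY_RUN variable are illustrative.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually execute them (as root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the harvester, flush the index, then restart the index engine.
# Each step must succeed before the next one runs.
flush_and_restart_index() {
    run service infinite-px-engine stop || return 1
    # -s: quiet, -f: return non-zero on an HTTP error so the restart is skipped
    run curl -sf 'localhost:9200/_flush' || return 1
    run service infinite-index-engine restart
}
```

Running "flush_and_restart_index" with the default DRY_RUN=1 prints the three commands in order, which is a cheap way to review them before running with DRY_RUN=0.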

Upgrading the underlying technologies is described here: MongoDB, Elasticsearch, Java, Tomcat.

Upgrading the software - offline

As above, except use "yumdownloader" to obtain the RPMs (or obtain the second tar file from the downloads page) and then transfer them to the machine, then use "rpm -U" or "yum --disablerepo=* localupdate" to install.

 

Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons