Infinit.e EC2 Installation Guide

Overview

The following diagram shows the recommended configuration for running one or more clusters, each of multiple nodes (at least 2, though we recommend at least 4: ie 2+ API nodes and 2+ DB nodes).

Note that sharding is not fully supported (or at least not fully tested) as of the March 2012 release. Apart from one weekly maintenance script (which is awaiting a new MongoDB feature), we believe it should work. In any case, sharding has not been necessary as of 3M documents indexed.

Step 1: Configure AWS settings

Three things need to be done in the AWS management console to prepare for the Infinit.e install:

Set up a (free) CloudFormation account (simply navigate to the CloudFormation tab in the management console and follow the instructions)
Set up security groups
Set up S3 storage for index and DB backups

Security groups

The only port that needs to be open is port 80, though allowing SSH (port 22), at least from authorized IP addresses, is standard.

There is no functional need to separate the different clusters into different groups, but there are obvious safety/security reasons to do so, eg to stop someone logged in to cluster "X" from deliberately or inadvertently accessing the technology stack on cluster "Y".

So having one group per cluster that disallows internal traffic from other groups (eg from 10.*.*.*) is probably desirable. Note that nodes within a group have unrestricted access to each other, which is what is wanted here.

An even stricter configuration would be to have 2 groups per cluster, one for API nodes and one for DB nodes, only allowing port 27017 and 27016 access between them.
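The stricter two-group layout can be reasoned about with a small model before any rules are entered in the console. The following is only a sketch in Python, not AWS API code: the group naming scheme ("cluster1-api", "cluster1-db"), the authorized SSH range and the assumption that public port 80 traffic targets the API group are all illustrative and not part of Infinit.e.

```python
from ipaddress import ip_address, ip_network

# Assumption: example range standing in for "authorized IP addresses".
AUTHORIZED_SSH_NETWORKS = [ip_network("203.0.113.0/24")]
# Ports allowed between the API and DB groups, as described above.
API_DB_PORTS = {27016, 27017}


def is_allowed(src_group, dst_group, port, src_ip=None):
    """Return True if the traffic would be admitted under the sketched rules.

    src_group is None for traffic from outside the cluster; otherwise groups
    use the hypothetical naming "<cluster>-api" / "<cluster>-db".
    """
    if src_group is not None and src_group == dst_group:
        return True  # nodes within a group can reach each other freely
    if src_group is None:
        if port == 80 and dst_group.endswith("-api"):
            return True  # public traffic, assumed to target the API nodes
        if port == 22 and src_ip is not None:
            addr = ip_address(src_ip)
            return any(addr in net for net in AUTHORIZED_SSH_NETWORKS)
        return False
    # Different groups: only the API and DB groups of the same cluster may
    # talk to each other, and only on the database ports.
    src_cluster, _, src_role = src_group.rpartition("-")
    dst_cluster, _, dst_role = dst_group.rpartition("-")
    same_cluster = src_cluster == dst_cluster
    return same_cluster and {src_role, dst_role} == {"api", "db"} and port in API_DB_PORTS
```

For example, is_allowed("cluster1-api", "cluster1-db", 27017) is admitted, while the same request against "cluster2-db" is rejected, which is the cross-cluster isolation the separate groups are meant to provide.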

S3 storage

Given a root S3 path (call it S3ROOT), eg "infinit.e-saas.ikanow.com" (which is entered into the properties.configuration file, see below), the following buckets are required:

mongo.<S3ROOT>: daily database backups, put in the same region as the cluster.
elasticsearch.<S3ROOT>: daily index backups, put in the same region as the cluster.
backup.mongo.<S3ROOT>: weekly database backups, put in a different region (and ideally country) to the cluster.
backup.elasticsearch.<S3ROOT>: weekly index backups, put in a different region (and ideally country) to the cluster.
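As a quick aid when creating the buckets, the four names can be derived from S3ROOT mechanically. The following Python sketch only builds the names listed above; the function name and dictionary keys are illustrative and it does not create anything in S3.

```python
def backup_buckets(s3_root):
    """Return the four bucket names required for a given S3ROOT."""
    return {
        # Same region as the cluster:
        "daily_db": f"mongo.{s3_root}",
        "daily_index": f"elasticsearch.{s3_root}",
        # Different region (and ideally country) to the cluster:
        "weekly_db": f"backup.mongo.{s3_root}",
        "weekly_index": f"backup.elasticsearch.{s3_root}",
    }


if __name__ == "__main__":
    for purpose, bucket in backup_buckets("infinite.myorg.com").items():
        print(purpose, bucket)
```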

It is also recommended to set up a folder for holding configuration files (eg the "infinit.e.properties.configuration" file described below), eg "config.<S3ROOT>". Both default DB and API node templates (see steps 4, 5) require such an S3 location to be specified.

Step 2: Create a properties.configuration file

A single file is used to populate the configuration files for all the custom and standard technologies used in Infinit.e: "infinit.e.configuration.properties". A template for this file can be obtained here. (TODOLINK)

A full description of the fields within "infinit.e.properties.configuration" is provided here, but the EC2-specific automated configuration makes populating it considerably easier than in the general case. The remainder of this section describes the EC2-specific configuration.

Generic parameters

################################################################################
# Amazon services properties
# If deployed on an EC2 cluster set this to 1:
use.aws=1
# This is the root s3 bucket name to be used for backups:
# The "s3.url" parameter corresponds to the "S3ROOT" described in "Step 1" above
s3.url=infinite.myorg.com

AWS access information

################################################################################
# Amazon AWS Settings
################################################################################
# AWS keys (only needed if use.aws=1)
aws.access.key=ACCESS_KEY
aws.secret.key=SECRET_KEY
# Used for s3cmd, see their web page for details
s3.gpg.passphrase=none

Set "aws.access.key" and "aws.secret.key" to your own AWS access and secret keys.

Cluster information

################################################################################
# Cluster name and URL
# Any unique name within the EC2 cluster/subnet: 
# eg infinite-cluster1
elastic.cluster=CLUSTER_NAME
################################################################################
# Discovery mode = ec2 (if running on AWS) or zen (specify a list of IPs below):
elastic.node.discovery=ec2
# (once "elastic.node.discovery" has been set to "ec2", "elastic.search.nodes" can be ignored - the discovery will happen automatically)
#elastic.search.nodes=
# Also these DB configuration params can be ignored:
 
################################################################################
# MongoDB Properties
#db.cluster.subnet=
#db.config.servers=
#db.replica.sets=

In EC2 mode, the "elastic.cluster" string must be the same for all nodes (API and DB) in the cluster. It controls three things:

It enables the API nodes to discover each other
It enables the DB nodes to discover each other
It enables the API nodes to discover their DB
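Before uploading the file, it can be worth checking that the EC2-specific settings described in this step have actually been filled in. The following Python sketch is not part of Infinit.e: the function names are hypothetical, and the parser assumes simple "key=value" lines with "#" comments, as in the excerpts above, rather than the full Java properties syntax.

```python
# Placeholder values that appear in the template excerpts above.
PLACEHOLDERS = {"ACCESS_KEY", "SECRET_KEY", "CLUSTER_NAME"}


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_ec2_settings(props):
    """Return a list of problems with the EC2-specific settings."""
    problems = []
    if props.get("use.aws") != "1":
        problems.append("use.aws should be 1 on EC2")
    if props.get("elastic.node.discovery") != "ec2":
        problems.append("elastic.node.discovery should be ec2")
    for key in ("s3.url", "aws.access.key", "aws.secret.key", "elastic.cluster"):
        value = props.get(key, "")
        if not value or value in PLACEHOLDERS:
            problems.append(f"{key} is missing or still a placeholder")
    return problems
```

An empty list from check_ec2_settings means none of these particular checks failed; it does not validate the keys against AWS or confirm that every node uses the same "elastic.cluster" value.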

Step 3: Start a load balancer

We provide a template for this (TODO link), though the AWS management console interface is just as good. The only custom parameter is the health check target, which should be set to "HTTP:80/api/auth/login/ping/ping".

When using the template, the display name cannot be changed, which is irritating but not that important.

To start the load balancer using the template:

Navigate to the CloudFormation tab in the AWS management console.
Select "Create New Stack"
Either upload the template (if you've modified it) via "Upload a Template file" or specify TODOLINK in "Provide a Template URL".
Select a "Stack Name" and click Next/Finish where prompted.
The Load Balancer URL can be found either from the "Output" tab in CloudFormation or from the EC2 tab, then the navigation bar "NETWORK & SECURITY" > "Load Balancers".
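For readers who want to see where the health check target sits in a template, the following Python sketch builds a minimal CloudFormation template for a classic load balancer and prints it as JSON. It is not the template provided with Infinit.e: the resource name, the port 80 listener, and the threshold, interval and timeout values are illustrative assumptions. Only the health check target is taken from this guide.

```python
import json

# Health check target given in this guide.
HEALTH_CHECK_TARGET = "HTTP:80/api/auth/login/ping/ping"


def build_load_balancer_template(availability_zones):
    """Return a minimal CloudFormation template (as a dict) for a classic ELB."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative load balancer for an Infinit.e cluster",
        "Resources": {
            "ElasticLoadBalancer": {
                "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
                "Properties": {
                    "AvailabilityZones": list(availability_zones),
                    "Listeners": [
                        {"LoadBalancerPort": "80", "InstancePort": "80", "Protocol": "HTTP"}
                    ],
                    "HealthCheck": {
                        "Target": HEALTH_CHECK_TARGET,
                        # Assumed values; tune for your own deployment.
                        "HealthyThreshold": "3",
                        "UnhealthyThreshold": "5",
                        "Interval": "30",
                        "Timeout": "5",
                    },
                },
            }
        },
        "Outputs": {
            "URL": {
                "Description": "DNS name of the load balancer",
                "Value": {"Fn::GetAtt": ["ElasticLoadBalancer", "DNSName"]},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_load_balancer_template(["us-east-1a"]), indent=2))
```

The "URL" output in this sketch corresponds to the value shown on the CloudFormation "Output" tab mentioned above.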

Note that while it would have been nice to have API nodes automatically connect themselves to the Load Balancer on start, this is not currently possible with CloudFormation except via AWS "Auto Scaling", which does not have a manual override (and also does not map well onto resource provision in Infinit.e).

Step 4: Start database nodes

TODO

Step 5: Start API nodes

TODO

Step 6: Connect the API nodes to the load balancer

TODO

Miscellaneous notes

TODO What to do next

TODO note that we're not using the CloudFormation templates quite like they're supposed to be used