...

We provide a template for this (here), though the AWS management console interface is just as good. The only custom parameter is the health check target, which should be set to "HTTP:80/api/auth/login/ping/ping".
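As a rough sketch, the health check settings could be written down as a plain dictionary using the field names of the classic Elastic Load Balancing health check. Only the target string comes from this guide; the interval, timeout and threshold values, and the helper names `parse_target` and `build_health_check`, are illustrative assumptions.

```python
# Only HEALTH_CHECK_TARGET comes from this guide; all numeric values are assumed placeholders.
HEALTH_CHECK_TARGET = "HTTP:80/api/auth/login/ping/ping"


def parse_target(target):
    """Split a 'PROTOCOL:PORT/path' target into protocol, port and path."""
    protocol, rest = target.split(":", 1)
    port, _, path = rest.partition("/")
    return protocol, int(port), "/" + path


def build_health_check(target=HEALTH_CHECK_TARGET, interval=30, timeout=5,
                       unhealthy_threshold=2, healthy_threshold=10):
    """Return a health check description (numeric defaults are assumptions)."""
    protocol, _port, _path = parse_target(target)
    if protocol not in ("HTTP", "HTTPS", "TCP", "SSL"):
        raise ValueError("unsupported protocol: " + protocol)
    return {
        "Target": target,
        "Interval": interval,
        "Timeout": timeout,
        "UnhealthyThreshold": unhealthy_threshold,
        "HealthyThreshold": healthy_threshold,
    }
```

Parsing the target first is a cheap way to catch a mistyped port or protocol before the value is pasted into the console or a template.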

...

The following fields must be populated:

...

  • ClusterName: the cluster name, should match the "infinit.e.configuration.properties" file.
  • IsConfigSvr: should be set to "1" for the first node created, "0" after that (for combined config server/DB scenarios only).
  • ReplicaSetId: for unsharded deployments (as set in "infinit.e.configuration.properties"; almost certainly what you will be running), always leave as "1". For sharded deployments, use "1" for the first 2 nodes, "2" for the second 2 nodes, etc.
  • NodeName: The name displayed in the EC2 instances
  • ConfigFileS3Path: the location of the "infinit.e.configuration.properties" file in your S3 storage.
  • AwsAccessId: The AWS ID/Access Key for your Amazon account.
  • AwsAccessKey: The AWS Key/Secret Key for your Amazon account.
  • AvailabilityZone: Must be consistent with the availability zone from which the stack was launched (top left of CloudFormation tab)
  • SecurityGroups: Set the security group from Step 1.
  • KeyName: Set the key from Step 1.

...

  • InstanceType: Defaults to "m1.xlarge", which is what you want for any decent-sized deployment; use "m1.large" for test/demo clusters. Note that with "m1.xlarge", RAID is automatically installed on startup (which takes about 10 minutes).
  • IsStorageNode: (leave as 1).
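The rules for IsConfigSvr and ReplicaSetId in the combined config server/DB scenario can be sketched as a small helper. This is only an illustration of the numbering described above; the function name `db_node_settings` and the 0-based `node_index` are assumptions, not part of the provided templates.

```python
def db_node_settings(node_index, sharded=False):
    """Return IsConfigSvr and ReplicaSetId for the node_index-th DB node (0-based).

    Combined config server/DB scenario only: the first node created is the
    config server; sharded deployments put two nodes in each replica set.
    """
    if node_index < 0:
        raise ValueError("node_index must be non-negative")
    is_config_svr = "1" if node_index == 0 else "0"
    # Unsharded: always "1". Sharded: nodes 0-1 -> "1", nodes 2-3 -> "2", etc.
    replica_set_id = str(node_index // 2 + 1) if sharded else "1"
    return {"IsConfigSvr": is_config_svr, "ReplicaSetId": replica_set_id}


if __name__ == "__main__":
    for i in range(4):
        print(i, db_node_settings(i, sharded=True))
```

For a sharded deployment of four nodes this yields replica set IDs "1", "1", "2", "2", with only the first node flagged as config server.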

Note that in practice you will probably want to override the default templates, so that standard fields like ClusterName (unless you have multiple clusters in the same AWS account), ConfigFileS3Path, AwsAccessId, AwsAccessKey, AvailabilityZone, SecurityGroups and KeyName (i.e. basically everything!) are set to default parameters and can normally be ignored.

...

Step 4 - Scenario 2: Standalone config servers


...

First start the 1/3/5 config servers. This will require the same steps as above except:

  • IsConfigSvr should be "1", IsStorageNode "0",
  • ReplicaSetId can be ignored,
  • InstanceType should be "m1.large".

(Alternatively, use the "DB Config Server" template provided.)

Then start the main DB nodes, again just as above, except:

  • IsConfigSvr should be "0".

Step 5: Start API nodes

The API nodes can then be started. It is difficult to provision the number of nodes in advance because it depends heavily on usage patterns and the sort of documents being indexed. It is therefore recommended to start with 2 and add more if response times are too long.
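The "start with 2, add more if slow" recommendation could be captured in a simple rule of thumb like the one below. The 2-second threshold, the use of the median, and the function name are all assumptions chosen for illustration; this guide does not define what "too long" means.

```python
from statistics import median

INITIAL_API_NODES = 2  # starting point recommended in this guide


def recommended_api_nodes(current_nodes, response_times_s, max_acceptable_s=2.0):
    """Suggest an API node count from recent response times (in seconds).

    max_acceptable_s is an assumed threshold; tune it to your own usage.
    """
    nodes = max(current_nodes, INITIAL_API_NODES)
    if response_times_s and median(response_times_s) > max_acceptable_s:
        return nodes + 1  # add one node at a time, then measure again
    return nodes
```

Adding a single node per measurement round keeps the cluster from being over-provisioned on the basis of one slow period.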

To create a new API node, follow the usual steps: navigate to the "CloudFormation" tab, select "Create New Stack", upload/link to the API template, select a "Stack Name" and then "Next" to the configuration parameters.

The following fields must be populated:

  • ClusterName: the cluster name, should match the "infinit.e.configuration.properties" file.
  • NodeName: The name displayed in the EC2 instances
  • ConfigFileS3Path: the location of the "infinit.e.configuration.properties" file in your S3 storage.
  • AwsAccessId: The AWS ID/Access Key for your Amazon account.
  • AwsAccessKey: The AWS Key/Secret Key for your Amazon account.
  • AvailabilityZone: Must be consistent with the availability zone from which the stack was launched (top left of CloudFormation tab)
  • SecurityGroups: Set the security group from Step 1.
  • KeyName: Set the key from Step 1.

The following fields are populated sensibly by default, but can be changed:

  • InstanceType: Defaults to "m1.xlarge", which is what you want for any decent-sized deployment; use "m1.large" for test/demo clusters. Note that with "m1.xlarge", RAID is automatically installed on startup (which takes about 10 minutes).

As with the DB nodes, in practice you will probably want to override the default templates, so that standard fields like ClusterName (unless you have multiple clusters in the same AWS account), ConfigFileS3Path, AwsAccessId, AwsAccessKey, AvailabilityZone, SecurityGroups and KeyName (i.e. basically everything!) are set to default parameters and can normally be ignored.
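One way to picture the "site defaults plus per-node overrides" idea is the sketch below, which merges the two and checks that every required API node field ends up populated. The output uses the ParameterKey/ParameterValue pair shape that CloudFormation accepts for stack parameters; the function name and the split into `site_defaults` and `overrides` are assumptions made for this example, not features of the provided template.

```python
REQUIRED_FIELDS = (
    "ClusterName", "NodeName", "ConfigFileS3Path", "AwsAccessId",
    "AwsAccessKey", "AvailabilityZone", "SecurityGroups", "KeyName",
)
OPTIONAL_DEFAULTS = {"InstanceType": "m1.xlarge"}  # default stated in this guide


def api_node_parameters(site_defaults, overrides):
    """Merge defaults and per-node overrides into CloudFormation-style parameters."""
    merged = dict(OPTIONAL_DEFAULTS)
    merged.update(site_defaults)   # values you would bake into your own template
    merged.update(overrides)       # values that differ per node, e.g. NodeName
    missing = [name for name in REQUIRED_FIELDS if not merged.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return [{"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(merged.items())]
```

With everything except NodeName held in the site defaults, each new API node only needs a one-entry `overrides` dictionary.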

The same comments as for the DB node about using CloudFormation somewhat sub-optimally also hold. It is particularly noticeable for API nodes because it results in one final step, discussed in the next section.

Step 6: Connect the API nodes to the load balancer

This is performed in standard fashion:

  • Navigate to the EC2 tab in the AWS management console
  • From the navigation sidebar, "NETWORK & SECURITY" > "Load Balancers"
  • Select the desired load-balancer
  • Press the green "+" in the top right of the "Instances" tab
  • Select the node based on its "NodeName" (shown in brackets next to the instance ID).
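The console steps above amount to picking instances by name and registering them with the load balancer. The sketch below builds a request in the shape used by the classic Elastic Load Balancing "register instances" action (a load balancer name plus a list of instance IDs). The instance records, their "NodeName" key and both function names are hypothetical; fetching the instances and sending the request are left out.

```python
def find_instance_id(instances, node_name):
    """Return the ID of the single instance whose NodeName matches."""
    matches = [inst["InstanceId"] for inst in instances
               if inst.get("NodeName") == node_name]
    if len(matches) != 1:
        raise LookupError("expected exactly one instance named %r, found %d"
                          % (node_name, len(matches)))
    return matches[0]


def registration_request(load_balancer_name, instances, node_names):
    """Build the request body for attaching the named nodes to a load balancer."""
    return {
        "LoadBalancerName": load_balancer_name,
        "Instances": [{"InstanceId": find_instance_id(instances, name)}
                      for name in node_names],
    }
```

Refusing to proceed when a name matches zero or several instances avoids attaching a DB node to the load balancer by mistake.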

Miscellaneous notes

You now have a fully operational Infinit.e cluster. Start adding sources and you can begin analysis. This link provides a quick example of getting a source imported in order to test/demonstrate the GUI.

It takes about 20 minutes for a node to come online following start-up. Most of this time (10-15 minutes) is spent updating the packages (like Java and JPackage) from the defaults on the CentOS 5.5 AMI. The time-to-start could therefore be significantly improved by building a new custom AMI: start from the base AMI, install the infinit.e-prerequisites-online RPM, and then create the new AMI.
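A quick back-of-the-envelope calculation shows the potential saving. It assumes, optimistically, that a custom AMI removes the package update time entirely; the function name is illustrative.

```python
TOTAL_STARTUP_MIN = 20          # approximate start-up time from this guide
PACKAGE_UPDATE_MIN = (10, 15)   # range spent updating packages, from this guide


def startup_with_custom_ami(total=TOTAL_STARTUP_MIN, update_range=PACKAGE_UPDATE_MIN):
    """Return (best, worst) start-up minutes if package updates are skipped."""
    low, high = update_range
    return total - high, total - low


if __name__ == "__main__":
    best, worst = startup_with_custom_ami()
    print("Estimated start-up with custom AMI: %d-%d minutes" % (best, worst))
```

Under that assumption a node would come online in roughly 5 to 10 minutes instead of 20.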