...

Note that sharding is not fully supported (or at least not fully tested) as of the March 2012 release. Apart from one weekly maintenance script (which is awaiting a new MongoDB feature), we believe it should work. At 3M documents indexed, sharding is not necessary in any case. Assuming sharding is enabled, the top-level design page explains how the system scales.

As an alternative to load balancers, DNS round-robin load balancing (using Amazon's Route 53) has also been tested and works well.

The remaining sections describe the steps necessary to get up and running. Note that steps 3-5 can be performed in any order, and it is not necessary to finish one step before starting the next. Also, API nodes can be added to the load balancer before they are complete (they will appear as out of service until the system is working).

...

A single file is used to populate the configuration files for all the custom and standard technologies used in Infinit.e: "infinit.e.configuration.properties". A template for this file can be obtained here.

A full description of the fields within "infinit.e.configuration.properties" is provided here, but the EC2-specific automated configuration makes populating it considerably easier than in the general case. The remainder of this section describes the EC2-specific configuration.

...

Step 3: Start a load balancer

...

Info

Amazon Elastic Load Balancers have non-configurable timeouts (eg 60 seconds). This can cause problems for some of the Infinit.e operations, such as testing and deleting sources and documents.

You can ask Amazon to increase the timeout on their EC2 forums, and they will normally do it within a day or two. Example forum post I made.

An alternative is to use the load balancer only to provide automated health-checking of the API, and to use Amazon's DNS service, Route 53, for round-robin load balancing (eg delegating the "rr" subdomain of ikanow.com: useful link).

We provide a template for this (here), though the AWS management console interface is just as good: the only custom parameter is the health check target, which should be set to "HTTP:80/api/auth/login/ping/ping".
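The health check target uses the load balancer's "PROTOCOL:PORT/path" notation. As a rough illustration of what that string means for an individual API node, the sketch below converts it into the URL the load balancer would poll. The function name and the host "api-node.example.com" are hypothetical and not part of Infinit.e.

```python
from urllib.parse import urlunsplit


def health_check_url(target: str, host: str) -> str:
    """Convert an ELB-style target ("HTTP:80/some/path") into a URL for `host`."""
    protocol, _, rest = target.partition(":")
    port_text, slash, path = rest.partition("/")
    # Only HTTP(S) targets carry a path; TCP/SSL targets are out of scope here.
    if protocol.upper() not in ("HTTP", "HTTPS") or not port_text.isdigit() or not slash:
        raise ValueError(f"not an HTTP(S) health check target: {target!r}")
    netloc = f"{host}:{int(port_text)}"
    return urlunsplit((protocol.lower(), netloc, "/" + path, "", ""))


# The target string is taken from the text above; the host name is a placeholder.
TARGET = "HTTP:80/api/auth/login/ping/ping"
print(health_check_url(TARGET, "api-node.example.com"))
```

Fetching the printed URL from a machine that can reach the node is a quick way to see what the load balancer sees before the node is registered.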

Using the template, the display name cannot be changed, which is irritating but not that important.

To start using the template:

  1. Navigate to the CloudFormation tab in the AWS management console.
  2. Select "Create New Stack".
  3. Either upload the template (if you've modified it) via "Upload a Template file" or specify its URL in "Provide a Template URL".
  4. Select a "Stack Name" and click Next/Finish where prompted.
  5. The Load Balancer URL can be found either from the "Output" tab in CloudFormation or from the EC2 tab, then the navigation bar "NETWORK & SECURITY" > "Load Balancers".
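The console steps above can also be scripted. The following is a minimal sketch, assuming the present-day boto3 library rather than anything shipped with Infinit.e: it only builds the request for CloudFormation's create_stack call and flattens the "Outputs" of a describe_stacks response. The stack name and template URL are placeholders, and the helper names are hypothetical.

```python
def build_create_stack_request(stack_name, template_url=None, template_body=None,
                               parameters=None):
    """Build keyword arguments for CloudFormation's create_stack call."""
    # Mirrors step 3: either upload a template body or point at a template URL.
    if (template_url is None) == (template_body is None):
        raise ValueError("supply exactly one of template_url or template_body")
    request = {"StackName": stack_name}
    if template_url is not None:
        request["TemplateURL"] = template_url
    else:
        request["TemplateBody"] = template_body
    if parameters:
        request["Parameters"] = [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in parameters.items()
        ]
    return request


def stack_outputs(description):
    """Flatten the Outputs of a describe_stacks response into a plain dict."""
    stack = description["Stacks"][0]
    return {out["OutputKey"]: out["OutputValue"] for out in stack.get("Outputs", [])}


if __name__ == "__main__":
    # Placeholder values; substitute the real template location.
    request = build_create_stack_request(
        "infinite-load-balancer",
        template_url="https://example.com/load_balancer.template",
    )
    print(request)
    # With boto3 installed and AWS credentials configured, the calls would be:
    #   import boto3
    #   cfn = boto3.client("cloudformation")
    #   cfn.create_stack(**request)
    #   print(stack_outputs(cfn.describe_stacks(StackName=request["StackName"])))
```

The second helper corresponds to step 5: the Load Balancer URL appears among the stack outputs once creation has completed.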

Note that while it would have been nice to have API nodes automatically connect themselves to the Load Balancer on start, this is not currently possible with CloudFormation except via AWS "Auto Scaling", which does not have a manual override (and also does not map well onto resource provisioning in Infinit.e).

Enabling application-based stickiness for cookies is recommended (use "infinitecookie"); otherwise you can suffer "random" log-outs from some of the JSP-based utility GUIs.
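If the stickiness policy is set programmatically rather than in the console, the two requests involved look roughly like the sketch below. It assumes the present-day boto3 classic ELB client, a listener on port 80 (inferred from the health check target) and a hypothetical policy name; only the cookie name "infinitecookie" comes from this page.

```python
def stickiness_requests(load_balancer_name, cookie_name="infinitecookie",
                        listener_port=80, policy_name="infinite-app-stickiness"):
    """Build the two classic-ELB requests that enable application-cookie stickiness."""
    # Request 1: define a policy that follows the application's own cookie.
    create_policy = {
        "LoadBalancerName": load_balancer_name,
        "PolicyName": policy_name,
        "CookieName": cookie_name,
    }
    # Request 2: attach that policy to the listener (port 80 is an assumption).
    attach_policy = {
        "LoadBalancerName": load_balancer_name,
        "LoadBalancerPort": listener_port,
        "PolicyNames": [policy_name],
    }
    return create_policy, attach_policy


if __name__ == "__main__":
    create_policy, attach_policy = stickiness_requests("my-load-balancer")
    print(create_policy)
    print(attach_policy)
    # With boto3 installed and AWS credentials configured, the calls would be:
    #   import boto3
    #   elb = boto3.client("elb")
    #   elb.create_app_cookie_stickiness_policy(**create_policy)
    #   elb.set_load_balancer_policies_of_listener(**attach_policy)
```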

Step 4: Start database nodes

The precise steps vary depending on how the config server node is deployed:

...

As for the load balancer, navigate to the "CloudFormation" tab, select "Create New Stack", upload/link to the template (single node or replica pair TODO LINKS), select a "Stack Name" (for display only) and then "Next" to the configuration parameters.

...

First start the 1/3/5 config servers. There are specific templates for single-node (TODO LINK) and three-node configurations (the 5-node case is an easy tweak to the existing template, if needed). The config server parameters are the same as for the DB nodes, but without the unnecessary ReplicaSetIds, IsConfigServer, and IsStorageNode.

The config server CloudFormation template also creates a DNS entry in Route 53 for a user-specified Hosted Zone. This is necessary because of a bug in MongoDB where changing the hostname of a config server (eg because the EC2 instance becomes unstable, so a new node must be created) requires a complete cluster restart (in order: shut down API nodes, DB nodes, config nodes; start up config nodes, DB nodes, API nodes). The DNS entry is written into the EC2 metadata in the "DnsName" field.
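The restart order described above is a shutdown sequence followed by its mirror image. The sketch below just generates that sequence so it can be turned into a checklist; the tier labels and the function name are illustrative, and it does not stop or start anything itself.

```python
# Shutdown order as stated in the text: API nodes, then DB nodes, then config nodes.
SHUTDOWN_ORDER = ("api", "db", "config")


def restart_plan(shutdown_order=SHUTDOWN_ORDER):
    """Return (action, tier) pairs: stop in the given order, start in reverse."""
    stops = [("stop", tier) for tier in shutdown_order]
    starts = [("start", tier) for tier in reversed(shutdown_order)]
    return stops + starts


for action, tier in restart_plan():
    print(f"{action} {tier} nodes")
```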

The only other difference is that InstanceType is one of "t1.micro" or "m1.large". The micro instance should be fine in most cases (and is >10x cheaper).

...

  • IsConfigSvr should always be "0"; otherwise system-wide problems will occur.
  • DnsName should be present, unique, and point via CNAME to the actual hostname; otherwise system-wide issues may occur.
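These two constraints are easy to check before launching anything. The sketch below is a hypothetical pre-flight check over a list of per-node parameter dictionaries: it flags an IsConfigSvr value other than "0" and a missing or duplicated DnsName. It assumes both fields live in the same parameter set, which this section does not confirm, so a missing IsConfigSvr is not flagged. It does not verify the CNAME, which would require a DNS lookup.

```python
from collections import Counter


def check_node_parameters(nodes):
    """Return a list of human-readable problems found in the node parameters."""
    problems = []
    # Count each DnsName so duplicates across nodes can be reported.
    dns_counts = Counter(p.get("DnsName") for p in nodes if p.get("DnsName"))
    for index, params in enumerate(nodes):
        # Only checked when present: which templates carry this field is an assumption.
        if "IsConfigSvr" in params and str(params["IsConfigSvr"]) != "0":
            problems.append(f'node {index}: IsConfigSvr must be "0"')
        dns_name = params.get("DnsName")
        if not dns_name:
            problems.append(f"node {index}: DnsName is missing")
        elif dns_counts[dns_name] > 1:
            problems.append(f"node {index}: DnsName {dns_name!r} is not unique")
    return problems


# Placeholder host names for illustration only.
example = [
    {"IsConfigSvr": "0", "DnsName": "config1.example.com"},
    {"IsConfigSvr": "1", "DnsName": "config1.example.com"},
    {"IsConfigSvr": "0"},
]
for problem in check_node_parameters(example):
    print(problem)
```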

Step 5: Start API nodes

The API nodes can then be started. It is difficult to provision the number of nodes in advance because it depends heavily on usage patterns and on the sort of documents being indexed. It is therefore recommended to start with 2 and add new ones if response times are too long.

...

You now have a fully operational Infinit.e cluster. Start adding sources and you can begin analysis. This link TODOLINK provides a quick example of getting a source imported in order to test/demonstrate the GUI.

...