Infinit.e System Requirements

Introduction

This document describes the minimum recommended system requirements for installing Infinit.e in a production environment.

When provisioning a cluster to be used with a NoSQL system like Infinit.e, it is important to balance disk IO, CPU speed, and memory bandwidth. A "supercomputer" with tens of cores and 100GB of memory is of no use if it has only one 2x RAID array.

In our (empirical/anecdotal) investigations to date, we have found the sweet spot to be 4-8 cores per 16-64GB of memory per 4x RAID (eg 8x RAID-10 or 4x RAID-0).

(Note: we have not yet investigated how to use SSDs to improve performance - this is on our near-term roadmap.)

Server Operating System

Infinit.e is currently tested to run on the following operating systems:

  • CentOS 5.5, 5.6, 5.7, 5.8, 6.2, 6.3
  • Red Hat 5.5, 5.6, 5.7, 5.8, 6.2
  • Amazon Linux: 12.08+

SELinux is not supported.

The Infinit.e software and scripts will run on Debian-based Linux distributions such as Ubuntu, but there are currently no install packages for them, so the system must be set up by hand. The software will also run on Windows, but this is only suitable for testing because the scripts that control the overall platform are Linux-specific.

Server Hardware

The minimum required hardware depends on the volume of data that will be ingested. Note that new nodes can always be added to scale in either storage or performance.

Similarly, lower-spec configurations (or combining API and Database nodes) will usually work, but performance will degrade significantly (and of course no more data can be ingested once the storage space runs out).

The recommended minimum hardware for different scenarios is described below.

We use "document" as a catch-all for a database record, Web page, PDF/office document, XML document, etc. The figures below are for an "average" document across all those types (say 5KB in size). If most documents ingested are smaller (eg DB records) then the capacity/performance will be higher; conversely, if most documents are larger (eg complex PDF reports) then the capacity/performance will be lower.

In practice as you ingest data you should track disk usage against document size to get a more accurate picture of your own data (or just add lots more disk space than could possibly be needed and then monitor performance to decide when to scale).
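
As a rough sketch of that tracking, the Python snippet below derives an observed bytes-per-document figure from a partition's disk usage and projects how many more documents would fit. The function names, the baseline parameter, and the idea of supplying the document count by hand are assumptions made for illustration; none of this is part of the Infinit.e tooling.

```python
import shutil


def average_bytes_per_document(used_bytes, document_count, baseline_bytes=0):
    """Observed disk bytes per ingested document.

    baseline_bytes is the space already in use before any ingest
    (a hypothetical parameter, so OS files are not counted as data).
    """
    if document_count <= 0:
        raise ValueError("document_count must be positive")
    return (used_bytes - baseline_bytes) / document_count


def documents_remaining(free_bytes, bytes_per_document):
    """Rough count of further documents that fit in the free space."""
    if bytes_per_document <= 0:
        raise ValueError("bytes_per_document must be positive")
    return int(free_bytes // bytes_per_document)


def report(data_path, document_count, baseline_bytes=0):
    """Measure one partition; returns (bytes per document, documents remaining)."""
    usage = shutil.disk_usage(data_path)  # named tuple: total, used, free
    per_doc = average_bytes_per_document(usage.used, document_count, baseline_bytes)
    return per_doc, documents_remaining(usage.free, per_doc)
```

For example, report("/data", 1500000) would measure a data partition mounted at "/data" (a placeholder path) holding 1.5 million documents. The projection is only as good as the assumption that future documents resemble those already ingested.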

Separately, there is a different set of volumetrics associated with log records. The document and record sizings combine linearly. The "per hardware" scaling factors are described below.

Demo configuration

This configuration is for running in a VM on a laptop to demonstrate the tool. It may become slow with more than 100-1000 documents, or a few hundred thousand records.

            Infinit.e API + DB Node
Processor   1x 1.8+ GHz CPU
Memory      1 or 2 GB RAM (swap required to get up to ~8GB total)
Network     WAN connection/none
Storage     20 GB

Compact configuration

A small deployment servicing a few thousand documents, or about 10 million records.

The following table lists the minimum recommended hardware configuration for one Infinit.e API and Database node.

            Infinit.e API + DB Node
Processor   1 x Dual/Quad Core 1.8+ GHz CPU
Memory      4-8 GB RAM (swap required to get up to ~8GB total)
Network     1x GigE LAN connection
Storage     10 GB Root/OS partition +
            50 GB data partition

Small configuration

The following configuration works quite acceptably on 500K-1M documents, or about 50 million records. The higher the spec, the faster the performance for a given number/size of documents. However this topology does not provide redundancy.

            Infinit.e API Node                  Infinit.e Database Node
Processor   1-2 x Dual Core 1.8+ GHz CPUs       1-2 x Dual Core 1.8+ GHz CPUs
Memory      8-16 GB RAM (or more)               8-16 GB RAM (or more)
Network     2x GigE LAN connection              2x GigE LAN connection
Storage     15 GB Root/OS partition +           15 GB Root/OS partition +
            50 GB data partition, RAID-0        100 GB data partition, RAID-0
            (~10GB per 1 million "average"      (~60GB per 1 million "average"
            documents)                          documents)

Operational configuration

A 2x API node and 2x DB node deployment using the following hardware works very quickly on a 2M+ document deployment (eg 2M-5M is a good typical range), or about 100 million records. In general the system capacity scales fairly linearly with nodes (see below).

This is the minimum recommended operational configuration because it provides data redundancy across the nodes as well as separating the API and DB functions, which is important for performance.

            Infinit.e API Node                  Infinit.e Database Node
Processor   2 x Dual Core 1.8+ GHz CPUs         2 x Dual Core 1.8+ GHz CPUs
Memory      16 GB RAM or more (32GB is ideal)   16 GB RAM or more (32GB is ideal)
Network     2x GigE LAN connection              2x GigE LAN connection
Storage     20 GB Root/OS partition +           20 GB Root/OS partition +
            100+ GB data partition, RAID-0      600+ GB data partition, RAID-0
            (~10GB per 1 million "average"      (~60GB per 1 million "average"
            documents)                          documents)
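
The per-million-document storage figures quoted in the tables (~10GB on an API node, ~60GB on a Database node) can be turned into a quick data partition estimate. The sketch below does that arithmetic in Python. Scaling the figure in proportion to average document size is our assumption (the text only says larger documents lower the capacity), and the function and constant names are illustrative.

```python
# Figures taken from the hardware tables, for an "average" 5KB document.
AVERAGE_DOC_KB = 5
API_GB_PER_MILLION_DOCS = 10
DB_GB_PER_MILLION_DOCS = 60


def data_partition_gb(documents, gb_per_million, avg_doc_kb=AVERAGE_DOC_KB):
    """Estimated data partition size in GB for one node.

    Assumption: storage grows in proportion to average document size,
    so a 10KB average doubles the per-million figure.
    """
    size_factor = avg_doc_kb / AVERAGE_DOC_KB
    return documents / 1_000_000 * gb_per_million * size_factor


if __name__ == "__main__":
    docs = 2_000_000
    print("API node data partition (GB):", data_partition_gb(docs, API_GB_PER_MILLION_DOCS))
    print("DB node data partition (GB): ", data_partition_gb(docs, DB_GB_PER_MILLION_DOCS))
```

Treat the result as a lower bound and leave headroom; the estimate covers documents only, not log records.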

Note that the API and DB nodes scale per 2-node block, since the primary benefit of the second node is redundancy rather than performance. The second node does balance the reads somewhat (not the writes), so there is some (not 2x) performance gain within a replica set.

API nodes scale for records without requiring additional DB nodes.

So, for example, each pair of 16GB API nodes provides capacity for approximately 100M records (~3M records/day with 30-day retention).

Likewise, each 4-node combination of 2x 16GB API and 2x 16GB DB nodes provides for approximately 2M documents.
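
These scaling factors can be combined into a rough node count. The Python sketch below is one reading of the rules above, not an official sizing tool: it assumes document and record load add linearly on the API nodes (docs/2M + records/100M pairs), that DB nodes are driven by documents only, and that at least one pair of each is always deployed. All names are illustrative.

```python
import math

DOCS_PER_BLOCK = 2_000_000          # per 2x API + 2x DB block of 16GB nodes
RECORDS_PER_API_PAIR = 100_000_000  # per pair of 16GB API nodes


def records_retained(records_per_day, retention_days):
    """Total records held at steady state for a given retention period."""
    return records_per_day * retention_days


def estimate_nodes(documents, records):
    """Return (api_nodes, db_nodes) for the given volumes.

    Assumption: the API load is the linear sum of the document share and
    the record share; DB nodes scale with documents only.
    """
    api_pairs = math.ceil(documents / DOCS_PER_BLOCK + records / RECORDS_PER_API_PAIR)
    db_pairs = math.ceil(documents / DOCS_PER_BLOCK)
    # Nodes are added in pairs for redundancy; never go below one pair.
    return 2 * max(api_pairs, 1), 2 * max(db_pairs, 1)


if __name__ == "__main__":
    records = records_retained(3_000_000, 30)
    print("records retained:", records)
    print("(API nodes, DB nodes):", estimate_nodes(4_000_000, records))
```

With 2M documents and no records this gives back the 4-node operational configuration (2 API, 2 DB).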

Required Open Source Software

The following open source software packages are an integral part of the Infinit.e platform:

  • Java JRE/JDK 6u30+ (current version = 6u31)
  • Apache Tomcat 6.X (current version = 6.0.35)
  • MongoDB 2.1+
  • elasticsearch 0.19+
  • (Hadoop CDH 5.3+ is not required but provides additional functionality when installed)
  • (Logstash 1.4+ is not required but provides additional functionality when installed)

Note: These packages can be installed as part of Infinit.e's installation packages or be preinstalled on a server.
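
When packages are preinstalled, it can help to check their versions against the minimums listed above. The following Python sketch compares dotted version strings only; the dictionary keys and function names are hypothetical labels, how the installed version string is obtained is left to the reader, and Java's "6u30" style of numbering is deliberately not handled.

```python
import re

# Minimum versions from the list above, as (major, minor) tuples.
MINIMUM_VERSIONS = {
    "mongodb": (2, 1),
    "elasticsearch": (0, 19),
    "hadoop-cdh": (5, 3),  # optional package
    "logstash": (1, 4),    # optional package
}
REQUIRED_TOMCAT_MAJOR = 6  # Tomcat 6.X


def parse_version(text):
    """Turn the leading dotted number of '2.1.0-rc1' into (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(package, installed):
    """True if the installed version is at or above the listed minimum."""
    return parse_version(installed) >= MINIMUM_VERSIONS[package]


def tomcat_supported(installed):
    """True only for the 6.X series."""
    return parse_version(installed)[0] == REQUIRED_TOMCAT_MAJOR
```

For example, meets_minimum("mongodb", "2.0.9") is False, while tomcat_supported("6.0.35") is True.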

Optional Open Source/Free Software

The Infinit.e platform is designed to use Splunk 4.1 for monitoring and reporting of log files. Splunk is a completely optional part of the platform.

Hadoop can be used for batched custom analytics, but is not required.

Client requirements

The Infinit.e GUI (not required for headless configurations) can run on any Linux (Ubuntu, Red Hat 6+), Windows XP+, or Mac (10.6+) system capable of running the following software:

  • Any browser capable of running Flash 11+ (eg Firefox 10+, IE 8+, Chrome 17+)
  • Flash 11+

Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons