Infinit.e System Requirements
Introduction
The following document describes the minimum recommended system requirements for installing Infinit.e in a production environment.
When provisioning a cluster to be used with a NoSQL system like Infinit.e, it is important to balance disk IO, CPU speed, and memory bandwidth. A "supercomputer" with tens of cores and 100GB of memory is of no use if it only has a single 2x RAID array.
In our (empirical/anecdotal) investigations to date, we have found the sweet spot to be 4-8 cores per 16-64GB of memory per 4x RAID (eg 8x RAID-10 or 4x RAID-0).
(Note: we have not yet investigated how to use SSDs to improve performance - this is on our near-term roadmap.)
Server Operating System
Infinit.e is currently tested to run on the following operating systems:
- CentOS 5.5, 5.6, 5.7, 5.8, 6.2, 6.3
- Redhat 5.5, 5.6, 5.7, 5.8, 6.2
- Amazon Linux: 12.08+
SELinux is not supported.
The Infinit.e software and scripts will run on Debian-based Linux distributions such as Ubuntu, but there are currently no install packages for them, so the system must be set up by hand. The software will also run on Windows, but this is only suitable for testing because the scripts that control the overall platform are Linux-specific.
Server Hardware
The minimum required hardware depends on the volume of data that will be ingested. Note that new nodes can always be added to scale in either storage or performance.
Lower-spec configurations (or configurations that combine API and Database nodes) will usually work, but performance will degrade significantly as the data volume grows (and of course no more data can be ingested once the storage space runs out).
The recommended minimum hardware for different scenarios is described below.
We use "document" as a catch-all for database record, Web page, PDF/office document, XML document, etc. The figures below are for an "average" document across all those types (say 5KB in size). If most documents ingested are smaller (eg DB records) then the capacity/performance will be higher; conversely, if most documents are larger (eg complex PDF reports) then the capacity/performance will be lower.
In practice as you ingest data you should track disk usage against document size to get a more accurate picture of your own data (or just add lots more disk space than could possibly be needed and then monitor performance to decide when to scale).
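The tracking described above comes down to a small calculation, sketched here for illustration. The function names and the sample numbers are hypothetical and not part of Infinit.e; the sketch also assumes that disk usage grows linearly with document count, which you should verify against your own data.

```python
def gb_per_million_docs(used_bytes, doc_count):
    """Observed data-partition usage, expressed in GB per 1 million documents."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return (used_bytes / 1e9) * (1_000_000 / doc_count)


def projected_capacity_docs(total_bytes, used_bytes, doc_count):
    """Rough number of documents the partition can hold in total.

    Assumes (hypothetically) that usage keeps growing linearly per document.
    """
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    bytes_per_doc = used_bytes / doc_count
    return int(total_bytes // bytes_per_doc)


# Hypothetical sample: 15GB used after ingesting 250,000 documents
# on a 600GB data partition.
observed = gb_per_million_docs(15e9, 250_000)
capacity = projected_capacity_docs(600e9, 15e9, 250_000)
print(observed, capacity)
```

The byte counts would come from whatever monitoring you already have for the data partition (for example Python's `shutil.disk_usage`), measured on the data partition only so that OS files do not skew the per-document figure.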
Log records have their own, separate set of volumetrics. The document and record sizings combine linearly. The "per hardware" scaling factors are described below.
Demo configuration
Intended for running in a VM on a laptop to demonstrate the tool. It may become slow with more than 100-1000 documents, or a few hundred thousand records.
Component | Infinit.e API + DB Node |
---|---|
Processor | 1x 1.8+ GHz CPU |
Memory | 1 or 2 GB RAM (swap required to get up to ~8GB total) |
Network | WAN connection/none |
Storage | 20GB |
Compact configuration
A small deployment servicing a few thousand documents, or about 10 million records:
The following table lists the minimum recommended hardware configuration for one Infinit.e API and Database node.
Component | Infinit.e API + DB Node |
---|---|
Processor | 1x Dual/Quad Core 1.8+ GHz CPU |
Memory | 4-8 GB RAM (swap required to get up to ~8GB total) |
Network | 1x GigE LAN connection |
Storage | 10 GB Root/OS partition + |
Small configuration
The following configuration performs acceptably with 500K-1M documents, or about 50 million records. The higher the spec, the faster the performance for a given number/size of documents. However, this topology does not provide redundancy.
Component | Infinit.e API Node | Infinit.e Database Node |
---|---|---|
Processor | 1-2 X Dual Core 1.8+ GHz CPUs | 1-2 X Dual Core 1.8+ GHz CPUs |
Memory | 8-16 GB RAM (or more) | 8-16 GB RAM (or more) |
Network | 2x GigE LAN connection | 2x GigE LAN connection |
Storage | 15 GB Root/OS partition + (~10GB per 1 million "average" documents) | 15 GB Root/OS partition + (~60GB per 1 million "average" documents) |
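As a worked example of the storage row in the Small configuration table above, the following sketch turns a document count into per-node storage figures. The constants come from the table; the function name is hypothetical, and the result is only as good as the "average" 5KB document assumption.

```python
# Figures from the Small configuration table.
ROOT_PARTITION_GB = 15
API_GB_PER_MILLION_DOCS = 10
DB_GB_PER_MILLION_DOCS = 60


def small_config_storage_gb(doc_count):
    """Return (api_node_gb, db_node_gb) for a given number of "average" documents."""
    millions = doc_count / 1_000_000
    api_gb = ROOT_PARTITION_GB + API_GB_PER_MILLION_DOCS * millions
    db_gb = ROOT_PARTITION_GB + DB_GB_PER_MILLION_DOCS * millions
    return api_gb, db_gb


# 1 million documents: 25GB on the API node, 75GB on the Database node.
print(small_config_storage_gb(1_000_000))  # (25.0, 75.0)
```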
Operational configuration
A deployment of 2x API nodes and 2x DB nodes using the following hardware performs well with 2M+ documents (eg 2M-5M is a good typical range), or about 100 million records. In general the system capacity scales fairly linearly with nodes (see below).
This is the minimum recommended operational configuration because it provides data redundancy across the nodes as well as separating the API and DB functions, which is important for performance.
Component | Infinit.e API Node | Infinit.e Database Node |
---|---|---|
Processor | 2 X Dual Core 1.8+ GHz CPUs | 2 X Dual Core 1.8+ GHz CPUs |
Memory | 16 GB RAM or more (32GB is ideal) | 16 GB RAM or more (32GB is ideal) |
Network | 2x GigE LAN connection | 2x GigE LAN connection |
Storage | 20 GB Root/OS partition + | 20 GB Root/OS partition + 600+ GB data partition, RAID-0 (~60GB per 1 million "average" documents) |
Note that the API and DB tiers each scale per 2-node block, since the primary benefit of the second node is redundancy rather than performance. The second node does balance the reads somewhat (not the writes), so there is some (but not 2x) performance gain within a replica set.
API nodes scale for records without requiring additional DB nodes.
So for example, each pair of 16GB API nodes provides capacity for approximately 100M records (~3M records/day with 30-day retention).
Similarly, each 4-node combination of 2x 16GB API and 2x 16GB DB nodes provides capacity for approximately 2M documents.
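The scaling factors above can be combined into a rough node-count estimate. The sketch below is one possible reading, not an official sizing tool. It assumes that document and record load add linearly on the API tier, that only documents drive the DB tier, and that nodes are always added in pairs. All names are hypothetical.

```python
import math

RECORDS_PER_API_PAIR = 100_000_000    # per pair of 16GB API nodes
DOCS_PER_FOUR_NODE_BLOCK = 2_000_000  # per 2x API + 2x DB block


def steady_state_records(records_per_day, retention_days):
    """Records held at any one time for a given ingest rate and retention."""
    return records_per_day * retention_days


def estimate_nodes(doc_count, record_count):
    """Estimate API and DB node counts (assumption: loads combine linearly)."""
    doc_blocks = doc_count / DOCS_PER_FOUR_NODE_BLOCK
    record_blocks = record_count / RECORDS_PER_API_PAIR
    # DB pairs are driven by documents only; always keep at least one pair.
    db_pairs = max(1, math.ceil(doc_blocks))
    # API pairs carry both document and record load.
    api_pairs = max(1, math.ceil(doc_blocks + record_blocks))
    return {"api_nodes": 2 * api_pairs, "db_nodes": 2 * db_pairs}


records = steady_state_records(3_000_000, 30)  # 90,000,000
print(estimate_nodes(4_000_000, records))      # {'api_nodes': 6, 'db_nodes': 4}
```

Rounding up to whole pairs keeps the redundancy that makes the 2-node block the unit of scaling.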
Required Open Source Software
The following open source software packages are an integral part of the Infinit.e platform:
- Java JRE/JDK 6u30+ (current version = 6u31)
- Apache Tomcat 6.X (current version = 6.0.35)
- MongoDB 2.1+
- elasticsearch 0.19+
- (Hadoop CDH 5.3+ is not required but provides additional functionality when installed)
- (Logstash 1.4+ is not required but provides additional functionality when installed)
Optional Open Source/Free Software
The Infinit.e platform is designed to use Splunk 4.1 for monitoring and reporting of log files. Splunk is a completely optional part of the platform.
Hadoop can be used for batched custom analytics, but is not required.
Client requirements
The Infinit.e GUI (not required for headless configurations) can run on any Linux (Ubuntu, Red Hat 6+), Windows (XP+), or Mac (10.6+) machine capable of running the following software:
- Any browser capable of running Flash 11+ (eg Firefox 10+, IE 8+, Chrome 17+)
- Flash 11+
Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons