Introduction

The following document describes the minimum recommended system requirements for installing Infinit.e in a production environment.

Info

When provisioning a cluster for a NoSQL system like Infinit.e, it is important to balance disk I/O, CPU speed, and memory bandwidth. A "supercomputer" with tens of cores and 100GB of memory is of little use if it is backed by only one 2x RAID array.

In our (empirical/anecdotal) investigations to date, we have found the sweet spot to be 4-8 cores per 16-64GB of memory per 4x RAID (eg 8x RAID-10 or 4x RAID-0).

(Note: we have not yet investigated how to use SSDs to improve performance; this is on our near-term roadmap.)

Server Operating System

Infinit.e is currently tested to run on the following operating systems:

...

Info

We use "document" as a catch-all for a database record, web page, PDF/office document, XML document, etc. The figures below are for an "average" document across all those types (say 5KB in size). If most ingested documents are smaller (e.g. DB records), capacity/performance will be higher; conversely, if most are larger (e.g. complex PDF reports), capacity/performance will be lower.

In practice as you ingest data you should track disk usage against document size to get a more accurate picture of your own data (or just add lots more disk space than could possibly be needed and then monitor performance to decide when to scale).
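
As a rough illustration of the tracking suggested above, the following Python sketch normalises observed disk usage to "GB per 1 million documents" and projects it linearly to a target document count. The function names and the sample numbers are hypothetical; the linear projection assumes the mix of document sizes stays roughly the same as ingestion continues.

```python
def gb_per_million_docs(data_bytes_used: int, doc_count: int) -> float:
    """Observed data-partition usage, normalised to GB per 1 million documents."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    bytes_per_doc = data_bytes_used / doc_count
    return bytes_per_doc * 1_000_000 / 1e9  # decimal GB


def projected_gb(data_bytes_used: int, doc_count: int, target_docs: int) -> float:
    """Linear projection of disk usage to target_docs.

    Assumption: future documents look like the ones already ingested.
    """
    return gb_per_million_docs(data_bytes_used, doc_count) * target_docs / 1_000_000


if __name__ == "__main__":
    # Hypothetical measurement: 12 GB used after ingesting 200,000 documents.
    used, docs = 12_000_000_000, 200_000
    print(f"{gb_per_million_docs(used, docs):.1f} GB per 1M documents")
    print(f"{projected_gb(used, docs, 3_000_000):.1f} GB projected for 3M documents")
```

Comparing the measured figure against the per-million estimates in the tables below gives an early indication of whether your data is lighter or heavier than the "average" document.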

Separately, there is a different set of volumetrics associated with log records. Document and record sizing combine linearly. The "per hardware" scaling factors are described below.

Demo configuration

Intended for running in a VM on a laptop to demonstrate the tool. It may become slow beyond 100-1000 documents, or a few hundred thousand records.

Infinit.e API + DB Node
Processor: 1x 1.8+ GHz CPU
Memory: 1 or 2 GB RAM (swap required to get up to ~8GB total)
Network: WAN connection/none
Storage: 20 GB

Compact configuration

A small deployment servicing a few thousand documents, or about 10 million records:

The following table lists the minimum recommended hardware configuration for one Infinit.e API and Database node.

Infinit.e API + DB Node
Processor: 1 X Dual/Quad Core 1.8+ GHz CPUs
Memory: 4-8 GB RAM (swap required to get up to ~8GB total)
Network: 1x GigE LAN connection
Storage: 10 GB Root/OS partition + 50 GB data partition

...

The following configuration performs acceptably with 500K-1M documents, or about 50 million records. The higher the spec, the faster the performance for a given number/size of documents. However, this topology does not provide redundancy.

Infinit.e API Node
Processor: 1-2 X Dual Core 1.8+ GHz CPUs
Memory: 8-16 GB RAM (or more)
Network: 2x GigE LAN connection
Storage: 15 GB Root/OS partition + 50 GB data partition, RAID-0 (~10GB per 1 million "average" documents)

Infinit.e Database Node
Processor: 1-2 X Dual Core 1.8+ GHz CPUs
Memory: 8-16 GB RAM (or more)
Network: 2x GigE LAN connection
Storage: 15 GB Root/OS partition + 100 GB data partition, RAID-0 (~60GB per 1 million "average" documents)
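
The per-million figures can be turned into a quick data-partition check. The Python sketch below takes ~10GB per 1 million "average" documents on the API node and ~60GB on the DB node as working figures; the function names and the optional `headroom` parameter are hypothetical, and real usage should be confirmed by measurement.

```python
# Working figures: GB of data partition per 1 million "average" documents.
GB_PER_MILLION_DOCS = {"api": 10.0, "db": 60.0}


def data_partition_gb(node_type: str, doc_count: int, headroom: float = 0.0) -> float:
    """Estimated data-partition size in GB for doc_count documents.

    headroom is a hypothetical safety margin (0.25 means 25% extra space).
    """
    per_million = GB_PER_MILLION_DOCS[node_type]
    return per_million * doc_count / 1_000_000 * (1.0 + headroom)


def fits(node_type: str, doc_count: int, partition_gb: float, headroom: float = 0.0) -> bool:
    """True if the estimated usage fits inside a partition of partition_gb."""
    return data_partition_gb(node_type, doc_count, headroom) <= partition_gb


if __name__ == "__main__":
    for node, size in (("api", 50), ("db", 100)):
        need = data_partition_gb(node, 1_000_000)
        print(f"{node}: ~{need:.0f} GB needed for 1M documents, partition is {size} GB")
```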

Operational configuration

A deployment of 2x API nodes and 2x DB nodes using the following hardware performs very well on a 2M+ document deployment (2M-5M is a typical range), or about 100 million records. In general, system capacity scales fairly linearly with the number of nodes (see below).

...

Infinit.e API Node
Processor: 2 X Dual Core 1.8+ GHz CPUs
Memory: 16 GB RAM or more (32GB is ideal)
Network: 2x GigE LAN connection
Storage: 20 GB Root/OS partition + 100+ GB data partition, RAID-0 (~10GB per 1 million "average" documents)

Infinit.e Database Node
Processor: 2 X Dual Core 1.8+ GHz CPUs
Memory: 16 GB RAM or more (32GB is ideal)
Network: 2x GigE LAN connection
Storage: 20 GB Root/OS partition + 600+ GB data partition, RAID-0 (~60GB per 1 million "average" documents)

Info

Note that API and DB capacity scales per 2-node block, since the primary benefit of the second node is redundancy rather than performance. The second node does balance reads somewhat (but not writes), so there is some performance gain within a replica set, though less than 2x.

API nodes scale for records without requiring additional DB nodes.

So, for example, each pair of 16GB API nodes provides capacity for approximately 100M records (~3M records/day with 30-day retention).

Similarly, each 4-node combination of 2x 16GB API and 2x 16GB DB nodes provides capacity for approximately 2M documents.
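
The scaling rules above can be sketched as a small node-count estimate in Python. This is one possible reading, not an official sizing tool: it assumes that document and record load add linearly on the API tier, that DB pairs are driven by documents only, and that nodes are always added in pairs. All names (`plan_nodes`, `retained_records`, the constants) are hypothetical.

```python
DOCS_PER_BLOCK = 2_000_000          # ~2M documents per 2x API + 2x DB block
RECORDS_PER_API_PAIR = 100_000_000  # ~100M records per pair of 16GB API nodes


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division (avoids floating-point rounding)."""
    return -(-numerator // denominator)


def retained_records(records_per_day: int, retention_days: int) -> int:
    """Records held at steady state, e.g. 3M/day * 30 days = 90M."""
    return records_per_day * retention_days


def plan_nodes(doc_count: int, records_per_day: int, retention_days: int) -> dict:
    """Estimate API/DB node counts under the assumptions stated above."""
    records = retained_records(records_per_day, retention_days)
    # DB pairs: documents only (assumption: records need no extra DB nodes).
    db_pairs = max(1, ceil_div(doc_count, DOCS_PER_BLOCK))
    # API pairs: document load plus record load, combined linearly.
    api_pairs = max(1, ceil_div(
        doc_count * RECORDS_PER_API_PAIR + records * DOCS_PER_BLOCK,
        DOCS_PER_BLOCK * RECORDS_PER_API_PAIR,
    ))
    return {
        "retained_records": records,
        "api_nodes": 2 * api_pairs,
        "db_nodes": 2 * db_pairs,
    }


if __name__ == "__main__":
    # Hypothetical workload: 4M documents, 3M records/day, 30-day retention.
    print(plan_nodes(4_000_000, 3_000_000, 30))
```

Treating the result as a starting point and then monitoring performance, as recommended earlier, is safer than relying on the estimate alone.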

Required Open Source Software

The following open source software packages are an integral part of the Infinit.e platform:

  • Java JRE/JDK 6u30+ (current version = 6u31)
  • Apache Tomcat 6.X (current version = 6.0.35)
  • MongoDB 2.1+
  • elasticsearch 0.19+
  • (Hadoop CDH 5.3+ is not required but provides additional functionality when installed)
  • (Logstash 1.4+ is not required but provides additional functionality when installed)
Note: These packages can be installed as part of Infinit.e's installation packages or be preinstalled on a server.
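
When packages are preinstalled, it can help to compare their versions against the minimums listed above. The Python sketch below does this for version strings you supply yourself; it does not query any installed software. The dictionary keys and function names are hypothetical, and Tomcat is left out because the list pins it to the 6.X series rather than giving a minimum.

```python
import re

# Minimum versions taken from the list above (required packages).
MINIMUM_VERSIONS = {
    "java": (6, 30),          # "6u30" is treated as (6, 30)
    "mongodb": (2, 1),
    "elasticsearch": (0, 19),
}

# Optional packages: only checked if present.
OPTIONAL_MINIMUM_VERSIONS = {
    "hadoop-cdh": (5, 3),
    "logstash": (1, 4),
}


def parse_version(text: str) -> tuple:
    """Extract the numeric parts of a version string, e.g. '6u31' -> (6, 31)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed: str, minimum: tuple) -> bool:
    """Compare only as many components as the minimum specifies.

    Note: this is a plain "at least" comparison; whether much newer major
    versions are supported is not stated in the list and is not checked here.
    """
    return parse_version(installed)[:len(minimum)] >= minimum


if __name__ == "__main__":
    # Hypothetical installed versions, entered by hand.
    installed = {"java": "6u31", "mongodb": "2.0.9", "elasticsearch": "0.19.11"}
    for name, version in installed.items():
        status = "ok" if meets_minimum(version, MINIMUM_VERSIONS[name]) else "too old"
        print(f"{name} {version}: {status}")
```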

...