2 - infinit.e-hadoop-installer.offline or .online

Introduction

The infinit.e-hadoop-installer RPM is used to install Hadoop, Hue, and optionally the Cloudera Manager software on Infinit.e API nodes. It depends on the platform prerequisites being present.

Note that there is one limitation to the offline RedHat/CentOS 6 support: the Hadoop installer is not supported. This is not a major issue, since the installer was purely a wrapper for Cloudera's online installer. Currently, for offline RedHat/CentOS 6 installs, you must download and deploy the Cloudera or Hortonworks install packages yourself.

The Cloudera install is likely to fail if the hostnames are in any way complicated, e.g. multiple instances of the same root hostname with different FQDNs, or the same hostname allocated to different IP addresses. The ideal setup seems to be a single domain with DNS, with only localhost in the /etc/hosts files. Otherwise this web page (for CDH4 but the advice also holds for CDH3) explains how to set up and verify the hostnames. One thing that guide does not say, but which does appear to be the case when DNS is not set up, is that the short version of the hostname should always be present in /etc/hosts after the FQDN one.
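
The /etc/hosts ordering described above can be checked before starting the install. The following is only a sketch: check_hosts_entry is a hypothetical helper (not part of the installer), and it reads "after the FQDN" as "later on the same line", which is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with infinit.e-hadoop-installer.
# Succeeds if some non-comment line of the hosts file lists the FQDN
# and, later on the same line, its short name, e.g.
#   192.168.1.1 myhost1.mydomain myhost1
check_hosts_entry() {
    hosts_file=$1
    fqdn=$2
    short=${fqdn%%.*}   # text before the first dot
    awk -v fqdn="$fqdn" -v short="$short" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            seen_fqdn = 0
            # Field 1 is the IP address; names start at field 2.
            for (i = 2; i <= NF; i++) {
                if ($i == fqdn) seen_fqdn = 1
                else if (seen_fqdn && $i == short) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
```

For example, check_hosts_entry /etc/hosts myhost1.mydomain returns success only when the short name myhost1 follows the FQDN on the same line.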

1. Installation of the RPM

To install from the Ikanow repo, simply run:

yum install infinit.e-hadoop-installer.online

For offline installs:

Once the infinit.e-hadoop-installer RPM has been copied to the target machine, execute the following command (assuming that you are in the directory in which the RPM is located):

yum localinstall infinit.e-hadoop-installer.online-*.rpm --nogpgcheck

or:

yum --disablerepo=* localinstall infinit.e-hadoop-installer.offline-*.rpm --nogpgcheck

depending on the type of install.

For single node clusters, or where the scalability of a single job across a cluster is not important (e.g. each individual job is small), steps 2 onwards are not necessary - instead, set the parameter "hadoop.local_mode" to true in the configuration parameters; no further configuration is needed.
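
As a rough illustration, the check below tests whether local mode is switched on. It assumes the configuration parameters are stored in a Java-style properties file (one key=value pair per line); that format, the function name local_mode_enabled, and any path passed to it are assumptions rather than something this page specifies.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given properties file contains
# a line of the form "hadoop.local_mode=true" (optional whitespace allowed).
# The properties-file format is an assumption.
local_mode_enabled() {
    grep -Eq '^[[:space:]]*hadoop\.local_mode[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}
```

If the check succeeds for your configuration file, steps 2 onwards can be skipped.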

2. Running install.sh

The RPM unpacks all of the files required to complete the installation into the following directory:

/mnt/opt/hadoop-infinite

The infinit.e-hadoop-installer RPM copies the online_install.sh or offline_install.sh script from the following directory on the target machine:

/mnt/opt/hadoop-infinite/scripts

to:

/mnt/opt/hadoop-infinite/install.sh

If you are not already in the installer's root directory, run the following command in a terminal:

cd /mnt/opt/hadoop-infinite

Hadoop, Hue, and Cloudera Manager

At least one node in the cluster should have the Cloudera Manager software installed. To install the Cloudera Manager software use the following command:

sh install.sh full

The install will sometimes fail (it appears that the Cloudera install sometimes starts before all the dependencies are loaded). If so, just rerun the install script (with "full" again).
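
Rerunning on failure can be wrapped in a small retry loop. This is only a sketch: retry is a hypothetical helper, and it assumes install.sh exits with a non-zero status when the install fails, which this page does not confirm. The install is interactive towards the end, so the loop should still be run from a terminal.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to max_attempts times,
# stopping at the first success.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" && return 0
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage, from /mnt/opt/hadoop-infinite (assumes a failed install
# returns a non-zero exit status):
#   retry 3 sh install.sh full
```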

Note that the Cloudera install process can be picky about the name of the distribution contained in "/etc/redhat-release". The following strings are supported:

  • CentOS release 5.[0-9] <any string>
  • CentOS Linux release 6 <any string>

It may be necessary to manually change the "redhat-release" file to make the Cloudera install work. Note that the 5/6 number is used to determine which RPMs to install, so it must be correct (the rest does not matter provided Cloudera accepts it).
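
The two supported patterns can be tested before running the install. The sketch below is not part of the installer: supported_release_major is a hypothetical helper that matches the first line of a release file against the two strings listed above and prints the major version it found.

```shell
#!/bin/sh
# Hypothetical helper: print 5 or 6 if the first line of the release
# file matches one of the two supported strings; return 1 if it matches
# neither, or 2 if the file cannot be read.
supported_release_major() {
    release_line=$(head -n 1 "$1") || return 2
    case $release_line in
        "CentOS release 5."[0-9]*) echo 5 ;;
        "CentOS Linux release 6"*) echo 6 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   supported_release_major /etc/redhat-release || echo "redhat-release needs editing"
```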

Towards the end of the installation process, the install.sh script will launch a Cloudera-provided application that installs the Cloudera Manager software. This application requires the user to interact with it to accept the software licenses presented. Follow the instructions displayed by the application to complete the installation and configuration of the management software via the web-based administration tool.

Note that if you install the Cloudera Manager after installing the rest of the system, and you have placed JDBC files in "JAVA_HOME/lib/ext", JAVA_HOME may now have moved (!) and the files may need to be placed in the new location. See under Installing JARs for more details.

Cloudera can be very sensitive about hostnames in non-EC2 installations. Using capital letters or special characters can cause NameNode startup failure during cloud services startup. It is recommended that you use only lowercase letters and numbers for hostnames and domains in your /etc/hosts file.

For example: 192.168.1.1 myhost1.mydomain
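
A quick way to apply this recommendation is to test each name before adding it to /etc/hosts. The helper below is a sketch with a hypothetical name (hostname_is_safe); it accepts only lowercase letters, digits, and the dots that separate host and domain, so it is deliberately stricter than general DNS rules and rejects hyphens as well.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the name is non-empty and
# consists solely of lowercase ASCII letters, digits, and dots.
hostname_is_safe() {
    name=$1
    [ -n "$name" ] || return 1
    # Delete every allowed character; anything left over is disallowed.
    leftover=$(printf '%s' "$name" | LC_ALL=C tr -d 'a-z0-9.')
    [ -z "$leftover" ]
}

# Usage:
#   hostname_is_safe myhost1.mydomain && echo "ok"
```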

Hadoop and Hue Applications Only

By default, install.sh installs only the Hadoop and Hue applications. Use the following command:

sh install.sh

All Infinit.e API nodes should have Hadoop and Hue installed on them.

3. Creating and configuring the cluster

The steps required to create and configure the Hadoop cluster are described here.

4. Adding or Removing nodes in the cluster

The steps required to add or remove nodes are described here.

5. (Uninstalling Hadoop, if needed)

This web page provides some details: https://ccp.cloudera.com/display/express37/Uninstalling+Cloudera+Manager+Free+Edition

Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons