...

User interface phase - installation

Log in using admin/admin

On the first page, select "Cloudera Express" and click "Continue" twice (until you reach the "Specify hosts for your CDH cluster installation" page).

Enter the hostnames of the nodes you want to add to the cluster, click "Search", and then click "Continue" (assuming the right hostnames appear).

...

  • "Cloudera recommends setting /proc/sys/vm/swappiness to 0"
  • "There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components"
    • (this just refers to Java)
  • "Cloudera supports versions 1.6.0_31 and 1.7.0_55 of Oracle's JVM and later. OpenJDK is not supported, and gcj is known to not work. Check the component version table below to identify hosts with unsupported versions of Java."
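
The swappiness warning can be checked directly on a host before re-running the inspection. The following is a minimal sketch, assuming a Linux host that exposes the value at /proc/sys/vm/swappiness; the function names and the "path" and "recommended" parameters are illustrative and are not part of Cloudera Manager.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def swappiness_ok(value, recommended=0):
    """Return True if the value is at or below the recommended setting.

    The default of 0 is taken from the warning text quoted above.
    """
    return value <= recommended
```

On a cluster node, swappiness_ok(read_swappiness()) returns False while the warning still applies. Changing the value until the next reboot is usually done as root with "sysctl -w vm.swappiness=0"; making it persistent typically means adding "vm.swappiness = 0" to /etc/sysctl.conf.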

...

Warning

In my experience, if you have to go "Back" at any point or refresh the browser at any stage, the install as a whole should be considered compromised. If this occurs, run "/opt/hadoop-infinite/scripts/uninstall_cd5.sh" on all nodes and then restart the install from the beginning.
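
Running the uninstall script on every node by hand is easy to get wrong on larger clusters. The sketch below only builds the commands; it assumes SSH access to each node as a user allowed to run the script, and the hostnames in the usage note are hypothetical placeholders.

```python
import shlex

# Script path as given in the warning above.
UNINSTALL_SCRIPT = "/opt/hadoop-infinite/scripts/uninstall_cd5.sh"


def build_uninstall_commands(hosts, script=UNINSTALL_SCRIPT):
    """Return one ssh argument list per host that runs the uninstall script."""
    return [["ssh", host, script] for host in hosts]


def format_commands(commands):
    """Render the argument lists as shell-safe strings for review."""
    return [shlex.join(cmd) for cmd in commands]
```

For example, build_uninstall_commands(["node1.example.com", "node2.example.com"]) yields two argument lists, each of which can be reviewed with format_commands and then passed to subprocess.run(cmd, check=True).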

User interface phase - setup

On the "Cluster Setup" page, select "Custom Services" (the last option):

[Screenshot: "Cluster Setup" page]

Select the following services and "Continue":

  • HDFS
  • MapReduce
  • ZooKeeper

The next page lets you control role assignments:

[Screenshot: role assignments page]

We recommend assigning the "master" roles (NameNode, SecondaryNameNode, Balancer, HttpFS, JobTracker, and all the "Cloudera Management Service" roles) to DB nodes, which have more flexible memory handling, and spreading them across the available DB nodes as evenly as possible to minimize the load on any one machine. (By default, all the "master" roles are placed on the same server.)

This page also lets you decide which nodes run the TaskTracker and DataNode roles (TaskTracker is needed to run MapReduce jobs, and DataNode serves the HDFS distributed file system), e.g. DB nodes only, or both API and DB nodes. We recommend installing on both API and DB nodes: if the API nodes prove to be overloaded, or you are not using Hadoop for heavy-duty batch processing, you can always stop the services on those nodes after installation.

For example, in the above screenshot it would be better to specify "ip-10-60-18-179.ec2.internal" as the SecondaryNameNode, the HttpFS, and 3 of the Management Services. This will balance the processing across the two nodes, as shown by this screenshot:

[Screenshot: balanced role assignments]
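
The balancing idea described above can be planned on paper before clicking through the role assignment page. The following is a minimal sketch of a round-robin split, assuming all listed roles are equally heavy; the function name and the node names in the usage note are hypothetical, and Cloudera Manager does not run this for you (the assignments are still made by hand in the UI).

```python
from itertools import cycle


def balance_roles(roles, db_nodes):
    """Assign roles to DB nodes in round-robin order.

    Returns a dict mapping each node to the list of roles planned for it,
    so that no single node ends up with every "master" role.
    """
    assignment = {node: [] for node in db_nodes}
    for role, node in zip(roles, cycle(db_nodes)):
        assignment[node].append(role)
    return assignment
```

Called with the roles NameNode, SecondaryNameNode, Balancer, HttpFS and JobTracker and two nodes, this places three roles on the first node and two on the second. In practice you may want to weight the split, since some roles use considerably more memory than others.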

Once you have balanced the role assignments, press "Continue".

On the next page, keep the "Embedded Database" option (the default), click "Test Connection", and then click "Continue" once the test completes:

[Screenshot: database setup page]

User interface phase - configuration

The next set of pages configures the various services and roles.

TODO

Installing CDH5 (YARN)

...