...

Enter the hostnames of the nodes you want to add to the cluster, click "Search", and then "Continue" (assuming the correct hostnames appear).

Warning

For Amazon installs on nodes that use Amazon's built-in DNS but are configured with a Route 53 hostname, it is not currently possible to install Cloudera. The following workarounds are available:

  • Set the hostname back to its internal one, e.g. "ip-<ip-address-with-dashes-not-dots>.ec2.internal"
  • Set up a local named/bind server (and update /etc/resolv.conf etc.) to "proxy" reverse lookups
    • (this is the preferred option)
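As a rough illustration of the first option, the internal name can be derived from a node's private IP address. The sketch below is not part of the Cloudera tooling: the function names are hypothetical, the "ec2.internal" suffix is simply the one quoted above (other regions may use a different internal domain), and the round-trip check is only one possible way to see whether forward and reverse lookups agree after applying either workaround.

```python
import ipaddress
import socket


def internal_hostname(private_ip, domain="ec2.internal"):
    """Build the EC2-style internal hostname for an IPv4 address.

    The "ip-<dashed-address>.<domain>" pattern is taken from the text above;
    the default domain is an assumption and may differ per region.
    """
    # Validate the input; raises ValueError for anything that is not IPv4.
    address = ipaddress.IPv4Address(private_ip)
    return "ip-" + str(address).replace(".", "-") + "." + domain


def lookups_agree(hostname):
    """Return True if hostname -> IP -> hostname round-trips via DNS."""
    try:
        ip = socket.gethostbyname(hostname)
        reverse_name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    # Compare case-insensitively and ignore a trailing dot, if any.
    return reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
```

For example, `internal_hostname("10.0.1.23")` gives `ip-10-0-1-23.ec2.internal`, and calling `lookups_agree(socket.getfqdn())` on a node gives a quick indication of whether its current name resolves consistently in both directions.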

 


On the "Cluster Installation" page, accept the defaults and "Continue". 


Info: Installing in a VPC

When installing in a VPC, both the parcels repository and the cloudera-manager repository are local. They need to be configured as follows.

First, set the parcels repository. To get to this screen, click the "More Options" button next to "Use Parcels". Remove all the remote parcel repositories listed there and add ours.


The other setting that needs to change is the cloudera-manager repository, which should also point at the local repository.


At the time of writing, the GPG key is not configured.

On the page after that, select the "Install Oracle Java SE Development Kit" option and "Continue".

On the page after that, ignore the "Single User Mode" option and "Continue".

The next page asks for the SSH login credentials. For most systems, this involves uploading an SSH key. Remember to enter the key passphrase if the key has one. The number of simultaneous installations only needs to be 1.

Info: VPC and EL6

In VPCs that run CentOS 6, use root as the login user here; ec2-user is specific to Amazon Linux.

 


Click "Continue" to start the installation. Once it has finished, click "Continue" again to move to another automatic installation page ("Installing Selected Parcels"), and "Continue" once that has finished as well.

Warning

For Amazon installs on nodes that use Amazon's built-in DNS but are configured with a Route 53 (or similar) hostname, Cloudera reports the installs as failing at the heartbeat step. Ignore this and carry on.

When you reach the role assignment page, it will only let you assign roles to the "Cloudera Manager" node. At that point, in a new browser tab:

  • Navigate to "http://<MANAGER_SERVER:PORT>/cmf/hardware/hosts"
  • Select "Add New Hosts to Cluster", select the managed hosts, select all the nodes, and accept the defaults on each page until you are back at the hosts page
  • Then refresh the role assignment page (http://<MANAGER_SERVER:PORT>/cmf/clusters/1/express-add-services/index#step=roleAssignmentsStep) and carry on.
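The two addresses used in this workaround differ only in their path, so they can be assembled from the manager's host and port. The helper below is a hypothetical convenience, not a Cloudera API: the paths and the cluster id of 1 are copied from the URLs above, while the host name and port in the usage note are placeholders for your own values.

```python
def manager_urls(manager_server, port, cluster_id=1):
    """Return the hosts page and role assignment page URLs for a manager.

    The paths are taken verbatim from the steps above; cluster_id defaults
    to 1 because that is the value shown there.
    """
    base = "http://{}:{}".format(manager_server, port)
    return {
        "hosts": base + "/cmf/hardware/hosts",
        "role_assignment": (
            base
            + "/cmf/clusters/{}/express-add-services/index".format(cluster_id)
            + "#step=roleAssignmentsStep"
        ),
    }
```

For instance, `manager_urls("cm.example.com", 7180)["hosts"]` yields `http://cm.example.com:7180/cmf/hardware/hosts`, assuming the manager is reachable under that (made-up) name and port.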

The next page is the "Host Inspector", which reports warnings and errors. The following warnings can be ignored:

  • "Cloudera recommends setting /proc/sys/vm/swappiness to 0"
  • "There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components"
    • (this just refers to Java)
  • "Cloudera supports versions 1.6.0_31 and 1.7.0_55 of Oracle's JVM and later. OpenJDK is not supported, and gcj is known to not work. Check the component version table below to identify hosts with unsupported versions of Java."

...

Use the "Search" bar to find the following configuration settings and modify them:

  • Change "Number of Tasks to Run per JVM" to -1
  • Set "MapReduce Service Environment Advanced Configuration Snippet (Safety Valve)" to:
    • JAVA_HOME="/usr/java/default/jre/"
  • Find "MapReduce Child Java Opts Base" and append "-Djava.security.policy=/opt/infinite-home/config/security.policy" after the already present "-Djava.net.preferIPv4Stack=true" (with a space between them)
  • Search for "Simultaneous" and set (e.g.) "Maximum Number of Simultaneous Map Tasks" to 2 and "Maximum Number of Simultaneous Reduce Tasks" to 1
    • (on instances larger than the typical 15GB ones, these values can be increased for heavy batch analytics use)
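The "MapReduce Child Java Opts Base" change is a plain string append, which is easy to get wrong by dropping the separating space or adding the option twice. The following sketch only illustrates the intended result; the function name is hypothetical and nothing here talks to Cloudera Manager.

```python
POLICY_OPT = "-Djava.security.policy=/opt/infinite-home/config/security.policy"


def append_java_opt(existing_opts, new_opt=POLICY_OPT):
    """Append new_opt to a space-separated options string, if not present.

    Note: runs of whitespace in existing_opts are collapsed to single
    spaces, so this is only suitable for options without embedded spaces.
    """
    tokens = existing_opts.split()
    if new_opt not in tokens:
        tokens.append(new_opt)
    return " ".join(tokens)


print(append_java_opt("-Djava.net.preferIPv4Stack=true"))
```

Running it on the default value prints the two options separated by a single space, which is the value to paste into the field. Applying the function a second time leaves the string unchanged.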

Then select the "Save Changes" button. This brings up two "Stale Configuration" notifications in the top left:

...