Hadoop configuration files

Hadoop configuration is taken from a number of places. This page is intended to provide a quick guide.

  • The default configuration is taken from the hadoop-core JAR, i.e. it cannot be changed.
  • When a job is created from the custom API/GUI, the following parameters are overridden:
    • Before November 2014:
      • Where to look for the jobtracker and FS ("mapred.job.tracker", "fs.default.name")
        • These are taken from the "*-site.xml" files found in "hadoop.configpath" (in the "/hadoop" subdir)
      • All the per-job parameters (Infinit.e configuration, MongoDB configuration, mapper classes etc)
    • November 2014 onwards:
      • All settings from the "*-site.xml" files found in "hadoop.configpath" (in the "/hadoop" subdir) override the defaults
  • Many of these configuration parameters are overridden by the settings maintained in Cloudera Manager
    • It isn't currently clear which ones. As a rule of thumb, assume that if a configuration parameter controls the environment in which the job runs, rather than the job itself, then it will be overridden by Cloudera Manager
    • (The Cloudera Manager configuration for a given service on each node lives in a subdirectory of "/var/run/cloudera-scm-agent/process/". To find which subdirectory, get the process ID of the service ("ps -ef"), then the working directory of that process ("pwdx $HADOOP_PID"). The current configuration can also be viewed from the Cloudera Manager GUI.)
  • As a result of the above, it is not necessary to redistribute the client configuration when an "environmental" setting is changed.
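The precedence rules above can be sketched as a simple layering of dictionaries, where each later layer overrides the earlier ones. This is a hypothetical Python illustration, not the Infinit.e implementation: `parse_site_xml`, `effective_job_config` and its `legacy` flag are invented names, the order in which multiple site files are applied is assumed, and per-job parameters are assumed to win in both modes. Cloudera Manager overrides, which take effect at run time on the nodes, are not modelled.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Mapping

# Before November 2014, only these keys were taken from the site files
# (the jobtracker and filesystem locations, per the list above).
LEGACY_SITE_KEYS = {"mapred.job.tracker", "fs.default.name"}


def parse_site_xml(path: Path) -> Dict[str, str]:
    """Read a Hadoop-style *-site.xml file into a {name: value} dict."""
    props: Dict[str, str] = {}
    for prop in ET.parse(str(path)).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_job_config(
    defaults: Mapping[str, str],
    config_path: Path,
    per_job: Mapping[str, str],
    legacy: bool = False,
) -> Dict[str, str]:
    """Layer defaults < site files < per-job parameters (later layers win).

    Hypothetical sketch; Cloudera Manager run-time overrides are NOT modelled.
    """
    merged = dict(defaults)
    # Site files live in the "hadoop" subdirectory of hadoop.configpath;
    # applying them in filename order is an assumption.
    for site_file in sorted((config_path / "hadoop").glob("*-site.xml")):
        site = parse_site_xml(site_file)
        if legacy:
            # Pre-November 2014 behaviour: keep only the tracker/FS locations.
            site = {k: v for k, v in site.items() if k in LEGACY_SITE_KEYS}
        merged.update(site)
    # Assumption: per-job parameters (Infinit.e, MongoDB, mapper classes, ...)
    # are applied last in both modes.
    merged.update(per_job)
    return merged
```

With `legacy=True`, a site file that sets both "fs.default.name" and some unrelated key only changes "fs.default.name"; with the default `legacy=False`, every key in the site files replaces the corresponding default.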
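The "ps -ef" / "pwdx" lookup described above can also be scripted. The sketch below is a hypothetical helper assuming a Linux host: it reads "/proc/<pid>/cwd", the same symlink `pwdx` reports, checks whether that directory lies under the Cloudera Manager agent's process root, and lists the "*-site.xml" files found there. `process_cwd`, `cm_config_dir` and `site_files` are invented names. Reading another user's "/proc/<pid>/cwd" generally needs the same privileges as `pwdx` (typically root).

```python
import os
from pathlib import Path
from typing import List, Optional

# Per-process configuration root used by the Cloudera Manager agent (from the page above).
CM_PROCESS_ROOT = Path("/var/run/cloudera-scm-agent/process")


def process_cwd(pid: int) -> Path:
    """Return the working directory of `pid`, as `pwdx` would (Linux /proc only)."""
    return Path(os.readlink(f"/proc/{pid}/cwd"))


def cm_config_dir(pid: int, root: Path = CM_PROCESS_ROOT) -> Optional[Path]:
    """Return the process's working directory if it lies under `root`, else None."""
    cwd = process_cwd(pid)
    try:
        cwd.relative_to(root)
    except ValueError:
        return None  # not a Cloudera Manager-managed process directory
    return cwd


def site_files(config_dir: Path) -> List[Path]:
    """List the *-site.xml files in a configuration directory, sorted by name."""
    return sorted(config_dir.glob("*-site.xml"))
```

For example, `site_files(cm_config_dir(pid))` with the PID of a Hadoop daemon taken from "ps -ef" lists the configuration files that daemon is actually running with (check for `None` first if the PID might not belong to a managed service).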

The configuration files in "/usr/lib/hadoop/conf" are not used at all by Cloudera or Infinit.e; however, they are used by the command-line tools (for example "hadoop fs -ls"). Before using any command-line utilities, it is therefore recommended to copy the client configuration into "/usr/lib/hadoop/conf".
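Copying the client configuration into place can be scripted along these lines. This is a hypothetical Python sketch: `install_client_config` is an invented name, the source directory is wherever you have unpacked the client configuration (an assumption; this page does not specify it), and the ".bak" backups are an extra precaution rather than part of the documented procedure. Writing to "/usr/lib/hadoop/conf" normally requires root.

```python
import shutil
from pathlib import Path
from typing import List

# Directory read by command-line tools such as "hadoop fs -ls" (from the page above).
HADOOP_CLI_CONF = Path("/usr/lib/hadoop/conf")


def install_client_config(src_dir: Path, dest_dir: Path = HADOOP_CLI_CONF) -> List[Path]:
    """Copy every regular file in src_dir into dest_dir.

    Any file about to be overwritten is first saved next to it with a ".bak"
    suffix. Returns the destination paths, sorted by file name.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        dest = dest_dir / src.name
        if dest.exists():
            # Keep the previous version so the change can be rolled back.
            shutil.copy2(dest, dest.with_name(dest.name + ".bak"))
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

For example, `install_client_config(Path("/tmp/hadoop-client-conf"))`, where the source path is purely illustrative, installs the files into "/usr/lib/hadoop/conf" so that "hadoop fs -ls" picks them up.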


Copyright © 2012 IKANOW, All Rights Reserved | Licensed under Creative Commons