Hadoop configuration is taken from a number of places. This page is intended to provide a quick guide.

  • The default configuration is taken from the hadoop-core JAR, i.e. it cannot be changed
  • When a job is created from the custom API/GUI, the following parameters are overridden:
    • Pre-November 2014:
      • Where to look for the jobtracker and FS ("mapred.job.tracker", "fs.default.name")
        • These are taken from the "*-site.xml" files found in "hadoop.configpath" (in the "/hadoop" subdir)
      • All the per-job parameters (Infinit.e configuration, MongoDB configuration, mapper classes, etc.)
    • November 2014 onwards:
      • All settings from the "*-site.xml" files found in "hadoop.configpath" (in the "/hadoop" subdir) override the defaults
  • Many of these configuration parameters are overridden by the settings maintained in Cloudera Manager
    • It isn't currently clear which ones. It should probably be assumed that any configuration parameter that controls the environment in which the job runs, rather than the job itself, will be overridden by Cloudera Manager
    • (Note that the Cloudera Manager configuration for a given service on each node lives in a subdirectory of "/var/run/cloudera-scm-agent/process/". To find which subdirectory, get the process ID ("ps -ef"), then the working directory of that process ("pwdx $HADOOP_PID"). The current configurations can also be viewed from the Cloudera Manager GUI.)
  • Note that, as a result of the above, it is not necessary to redistribute the client configuration when an "environmental" setting is changed.
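
The client-side layering described above (November 2014 onwards) can be sketched roughly as follows. This is an illustrative model only, not the platform's actual code: the function names, the "defaults" dictionary standing in for the values baked into the hadoop-core JAR, the alphabetical order in which the "*-site.xml" files are read, and the assumption that per-job parameters are applied last are all hypothetical. Cluster-side overrides applied by Cloudera Manager are not modelled.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_site_xml(path):
    """Return the name/value pairs from one Hadoop-style *-site.xml file."""
    props = {}
    for prop in ET.parse(str(path)).getroot().iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_config(defaults, config_path, job_params):
    """Merge the client-side layers: defaults, then site files, then job parameters."""
    # Layer 1: defaults (hypothetical stand-in for the hadoop-core JAR defaults).
    merged = dict(defaults)
    # Layer 2: every *-site.xml in the "/hadoop" subdir of the configured path.
    # Reading the files in sorted order is an assumption made for this sketch.
    hadoop_dir = Path(config_path) / "hadoop"
    for site_file in sorted(hadoop_dir.glob("*-site.xml")):
        merged.update(read_site_xml(site_file))
    # Layer 3: per-job parameters (assumed here to be applied last).
    merged.update(job_params)
    return merged
```

Calling effective_config with the directory that "hadoop.configpath" points to returns a single dictionary in which, for example, an "fs.default.name" value from "core-site.xml" replaces the default one.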

 
