Hadoop Administration - salmanbaig8/imp GitHub Wiki

Adding and removing nodes from a cluster: can be performed from the Ambari web console. You need the IP address or hostname of the node to add, and the node must be reachable from the cluster. `/etc/hosts` on both the master and child nodes should be updated prior to adding child nodes.
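As a sketch of that last step, assuming a master at 192.168.1.10 and a new child node at 192.168.1.20 (the addresses and hostnames here are hypothetical), the `/etc/hosts` entries on both machines might look like:

```
192.168.1.10   master.example.com   master
192.168.1.20   node1.example.com    node1
```

Both the master and the new child node need matching entries so that forward and reverse name resolution work before the node is added.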

How to verify the health of a cluster: run a DFS disk check by generating a DFS report with `hdfs dfsadmin -report`.
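As a quick command-line health check (assuming the `hdfs` client is on the PATH and the user can reach the NameNode):

```shell
# Summarize configured capacity, remaining space, and live/dead DataNodes
hdfs dfsadmin -report

# Scan the filesystem for missing, corrupt, or under-replicated blocks
hdfs fsck /
```

`hdfs fsck` complements the report: `dfsadmin -report` shows node and capacity status, while `fsck` checks block-level integrity.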

How to start and stop a cluster's components: components can be started and stopped individually (for example from the Ambari web console), and stopping components that are not in use frees cluster resources.
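As a sketch from the command line (assuming a Hadoop installation with the standard `sbin` scripts on the PATH; an Ambari-managed cluster would normally do this from the web console instead):

```shell
# Stop and restart a single DataNode daemon on the current host
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode

# Stop and restart all HDFS daemons across the cluster
stop-dfs.sh
start-dfs.sh
```

Stopping daemons by hand on an Ambari-managed cluster can leave the console's view of component state stale, so the console is usually the safer route there.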

Modifying Hadoop configuration parameters: Hadoop is configured using XML files, shell scripts, and properties files:

- `hadoop-env.sh`: environment variables used in the scripts that run Hadoop
- `core-site.xml`: configuration settings for Hadoop Core, such as I/O settings common to HDFS and MapReduce
- `hdfs-site.xml`: configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes
- `mapred-site.xml`: configuration settings for the MapReduce daemons: the JobTracker and the TaskTrackers
- `masters`: a list of machines (one per line) that each run a Secondary NameNode
- `slaves`: a list of machines (one per line) that each run a DataNode and a TaskTracker
- `hadoop-metrics.properties`: properties controlling how metrics are published in Hadoop
- `log4j.properties`: properties for system log files, the NameNode audit log, and the task log for the TaskTracker child process
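All of the `*-site.xml` files share the same property format. As an illustration, a minimal `core-site.xml` entry setting the default filesystem (the hostname and port here are assumptions, not values from this cluster) might look like:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```

Values set in these site files override the shipped defaults; daemons generally need a restart to pick up changes.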

BigInsights Config DIR: /usr/iop/current/hadoop-client/conf

Setting up a rack topology: rack awareness is defined by a script that specifies which node is on which rack. The script is referenced by the `topology.script.file.name` property in `core-site.xml`, e.g. `topology.script.file.name` = `/opt/ibm/biginsights/hadoop-conf/rack-aware.sh`.
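Hadoop invokes the topology script with one or more node names or IP addresses as arguments and expects one rack path per line on stdout. A minimal sketch of such a script (the IP-to-rack mapping below is an illustrative assumption, not this cluster's layout):

```shell
#!/bin/bash
# rack_for <node>: print the rack path for a single node.
# Unknown nodes fall back to /default-rack, which Hadoop treats
# as a single shared rack.
rack_for() {
  case "$1" in
    192.168.1.1?) echo "/rack1" ;;   # nodes 192.168.1.10-19 on rack 1
    192.168.1.2?) echo "/rack2" ;;   # nodes 192.168.1.20-29 on rack 2
    *)            echo "/default-rack" ;;
  esac
}

# Hadoop may pass several nodes in one invocation; answer each in order.
for node in "$@"; do
  rack_for "$node"
done
```

The script must be executable by the user running the NameNode, and a wrong or silent script degrades block placement, so it is worth testing by hand with a few known node addresses.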
