Test Cluster and Run MapReduce
On this page, we start the cluster, put files into HDFS, and run a MapReduce job on YARN.
1. Format HDFS and first run
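# Run on ALL nodes (strictly, only the NameNode host needs this step).
# Hadoop 2.x prints a deprecation notice for this form; `hdfs namenode -format` is the current equivalent.
hadoop namenode -format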
Start the HDFS daemons by running start-dfs.sh on the master.
You should get output like:
ubuntu@master:~$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-namenode-master.out
slave1: starting datanode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-datanode-slave1.out
slave2: starting datanode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-datanode-slave2.out
Starting secondary namenodes [secondarymaster]
secondarymaster: starting secondarynamenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-secondarynamenode-secondarymaster.out
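The start-up messages only show that the daemons were launched, not that the DataNodes registered with the NameNode. As a rough check, the sketch below counts live DataNodes from the output of `hdfs dfsadmin -report`. It assumes the Hadoop 2.x report format, which contains a line such as "Live datanodes (2):"; the function name `live_datanodes` is made up for this example.

```shell
#!/usr/bin/env bash
# Print the number of live DataNodes found in `hdfs dfsadmin -report` output.
# Assumption: the report contains a line like "Live datanodes (2):" (Hadoop 2.x format).
# Reads the report from stdin and prints nothing if no such line is found.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}

# Usage on the master (with slave1 and slave2 running, the expected count is 2):
#   hdfs dfsadmin -report | live_datanodes
```

If the count is lower than the number of slaves, the DataNode logs named in the start-up output above are the first place to look.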
Start the YARN daemons by running start-yarn.sh on the master.
You should again get output like:
ubuntu@master:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-resourcemanager-master.out
slave1: starting nodemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-nodemanager-slave1.out
slave2: starting nodemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-nodemanager-slave2.out
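To confirm that the daemons are still alive after start-up, `jps` (shipped with the JDK) lists the running Java processes on a node. The sketch below is one possible way to compare that list against the daemons expected on each node; the helper name `missing_daemons` is hypothetical, and the daemon lists in the usage comments are inferred from the start-up logs above.

```shell
#!/usr/bin/env bash
# Print each expected daemon that is missing from `jps` output (read from stdin).
# jps prints one "<pid> <ClassName>" pair per line.
missing_daemons() {
    local jps_output daemon
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare the second column exactly, so that "SecondaryNameNode"
        # is not mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Usage (prints nothing when everything expected is running):
#   on master:          jps | missing_daemons NameNode ResourceManager
#   on slave1, slave2:  jps | missing_daemons DataNode NodeManager
#   on secondarymaster: jps | missing_daemons SecondaryNameNode
```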
2. Testing HDFS by putting files
# create a directory
hadoop fs -mkdir /test_directory
# add a file to HDFS (any file will do)
hadoop fs -put /etc/hosts /test_directory/
# read the file back
hadoop fs -cat /test_directory/hosts
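Reading the file back by eye works for a small file. For a stricter check, the sketch below compares the HDFS copy with the local original byte for byte. It uses only `hadoop fs -cat` from the step above plus the standard `cmp` tool; the function name `hdfs_matches_local` is made up for this example.

```shell
#!/usr/bin/env bash
# Return 0 if the HDFS file's contents are byte-identical to the local file.
# Arguments: <local file> <HDFS path>
hdfs_matches_local() {
    local local_file=$1 hdfs_path=$2
    # cmp -s is silent; "-" makes it read the HDFS copy from stdin.
    hadoop fs -cat "$hdfs_path" | cmp -s - "$local_file"
}

# Usage:
#   hdfs_matches_local /etc/hosts /test_directory/hosts && echo "HDFS round trip OK"
```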
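3. Running a Hadoop sample job to check that MapReduce works
# pi <number of map tasks> <samples per map task>
yarn jar /home/ubuntu/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 1000000
If you get output similar to the following, your cluster is set up and ready to run jobs. The exact lines will differ (this sample log comes from a different job, run on a host named hadoopmaster); the line to look for is "completed successfully":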
16/06/02 14:36:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /networkgeekstuff_output
16/06/02 14:36:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/
16/06/02 14:36:33 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/06/02 14:36:34 INFO input.FileInputFormat: Total input paths to process : 1
16/06/02 14:36:34 INFO mapreduce.JobSubmitter: number of splits:1
16/06/02 14:36:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464859507107_0002
16/06/02 14:36:34 INFO impl.YarnClientImpl: Submitted application application_1464859507107_0002
16/06/02 14:36:34 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1464859507107_0002/
16/06/02 14:36:34 INFO mapreduce.Job: Running job: job_1464859507107_0002
16/06/02 14:36:43 INFO mapreduce.Job: Job job_1464859507107_0002 running in uber mode : false
16/06/02 14:36:43 INFO mapreduce.Job: map 0% reduce 0%
16/06/02 14:36:52 INFO mapreduce.Job: map 100% reduce 0%
16/06/02 14:37:02 INFO mapreduce.Job: map 100% reduce 100%
16/06/02 14:37:03 INFO mapreduce.Job: Job job_1464859507107_0002 completed successfully
16/06/02 14:37:03 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=10349090
        FILE: Number of bytes written=20928299
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=5458326
        HDFS: Number of bytes written=717768
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6794
        Total time spent by all reduces in occupied slots (ms)=6654
        Total time spent by all map tasks (ms)=6794
        Total time spent by all reduce tasks (ms)=6654
        Total vcore-seconds taken by all map tasks=6794
        Total vcore-seconds taken by all reduce tasks=6654
        Total megabyte-seconds taken by all map tasks=6957056
        Total megabyte-seconds taken by all reduce tasks=6813696
    Map-Reduce Framework
        Map input records=124456
        Map output records=901325
        Map output bytes=8546434
        Map output materialized bytes=10349090
        Input split bytes=127
        Combine input records=0
        Combine output records=0
        Reduce input groups=67505
        Reduce shuffle bytes=10349090
        Reduce input records=901325
        Reduce output records=67505
        Spilled Records=1802650
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=220
        CPU time spent (ms)=5600
        Physical memory (bytes) snapshot=327389184
        Virtual memory (bytes) snapshot=1329098752
        Total committed heap usage (bytes)=146931712
    Shuffle Errors
    File Input Format Counters
        Bytes Read=5458199
    File Output Format Counters
        Bytes Written=717768
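Scanning a log like the one above by eye gets tedious once jobs are run repeatedly. The sketch below shows one way to check a saved job log for the "completed successfully" line and to pull out a single counter. The function names `job_succeeded` and `counter_value` and the file name job.log are hypothetical. The usage comment also assumes that the job client writes its log to stderr, which is the usual default console logging setup but may differ on your cluster.

```shell
#!/usr/bin/env bash
# Return 0 if the job log on stdin reports a successfully completed job.
job_succeeded() {
    grep -q 'completed successfully'
}

# Print the value of a named counter (e.g. "Reduce output records")
# from a job log on stdin. Leading and trailing whitespace around the
# counter name is ignored, so indented lines and "Shuffled Maps =1" both match.
counter_value() {
    local name=$1
    awk -F= -v n="$name" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == n { print $2; exit }
    '
}

# Usage (assumes the job log goes to stderr, hence 2>&1):
#   yarn jar /home/ubuntu/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 1000000 2>&1 | tee job.log
#   job_succeeded < job.log && echo "job OK"
#   counter_value 'Launched map tasks' < job.log
```

For the pi job above, `Launched map tasks` is a convenient counter to check, since the first argument (10) requests that many map tasks.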