Test Cluster and Run MapReduce - Vishwajeetsinh98/random-forest-using-hadoop-mapreduce GitHub Wiki

Test Cluster and Run MapReduce

On this page, we will start the cluster, put files into HDFS, and run a MapReduce job using YARN.

1. Format HDFS and first run

@ on MASTER (the NameNode) only

hdfs namenode -format

Note: format the filesystem once, on the NameNode only. Re-running the format on an existing cluster wipes all HDFS metadata.

Start HDFS server

@ on MASTER

start-dfs.sh

You should get output like:

ubuntu@master:~$ start-dfs.sh 
    Starting namenodes on [master]
    master: starting namenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-namenode-master.out
    slave1: starting datanode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-datanode-slave1.out
    slave2: starting datanode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-datanode-slave2.out
    Starting secondary namenodes [secondarymaster]
    secondarymaster: starting secondarynamenode, logging to /home/ubuntu/hadoop/logs/hadoop-ubuntu-secondarynamenode-secondarymaster.out

Start YARN server

@ on MASTER

start-yarn.sh

You should again get output like:

ubuntu@master:~$ start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-resourcemanager-master.out
    slave1: starting nodemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-nodemanager-slave1.out
    slave2: starting nodemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-nodemanager-slave2.out
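Once both scripts have run, you can sanity-check which daemons came up on each node with `jps` (shipped with the JDK). The expected daemons below follow from the startup logs above; a missing one usually means a configuration problem on that node, and the `.out` log files listed above are the place to look.

```shell
# Run on each node. Hostnames (master, secondarymaster, slave1, slave2)
# are the ones used throughout this tutorial.
jps
# Expected: NameNode and ResourceManager on master,
# SecondaryNameNode on secondarymaster,
# DataNode and NodeManager on slave1 and slave2.
```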

2. Testing HDFS by putting files

@ on MASTER

# create a directory
hadoop fs -mkdir /test_directory

# put a local file into HDFS (any file will do; /etc/hosts is used here)
hadoop fs -put /etc/hosts /test_directory/

# read the file back
hadoop fs -cat /test_directory/hosts
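To confirm the round trip worked, a quick sketch (paths match the commands above; requires the running cluster) is to list the directory and diff the HDFS copy against the local original:

```shell
# List the directory to confirm the file landed in HDFS
hadoop fs -ls /test_directory

# Compare the HDFS copy with the local original;
# no diff output means the contents match byte-for-byte
hadoop fs -cat /test_directory/hosts | diff /etc/hosts -
```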

3. Running a Hadoop example to check that MapReduce works

@ on MASTER

yarn jar /home/ubuntu/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 1000000
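The `pi` example takes two arguments: the number of map tasks (10) and the number of sample points per map (1,000,000). Each mapper counts how many of its points fall inside a quarter circle inscribed in the unit square; the reducer combines the counts, and since that fraction tends to π/4, multiplying by 4 gives the estimate. A minimal single-process sketch of the same idea in Python (the Hadoop example actually uses a quasi-random Halton sequence; plain pseudo-random sampling is used here for brevity):

```python
import random

def estimate_pi(num_maps, samples_per_map, seed=42):
    """Monte Carlo estimate of pi, mirroring the Hadoop pi example's
    two parameters: number of maps and samples per map."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = num_maps * samples_per_map
    inside = 0
    for _ in range(total):
        x, y = rng.random(), rng.random()
        # point falls inside the quarter circle of radius 1?
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / total

print(estimate_pi(10, 100_000))  # ~3.14
```

The cluster version simply distributes the sampling loop: each of the 10 maps handles its own batch of points, so more maps or more samples per map tighten the estimate at the cost of more work.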

If you get output similar to this, your cluster is set up and ready to run jobs:

16/06/02 14:36:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/172.31.27.101:8032
16/06/02 14:36:33 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/06/02 14:36:34 INFO input.FileInputFormat: Total input paths to process : 1
16/06/02 14:36:34 INFO mapreduce.JobSubmitter: number of splits:1
16/06/02 14:36:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464859507107_0002
16/06/02 14:36:34 INFO impl.YarnClientImpl: Submitted application application_1464859507107_0002
16/06/02 14:36:34 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1464859507107_0002/
16/06/02 14:36:34 INFO mapreduce.Job: Running job: job_1464859507107_0002
16/06/02 14:36:43 INFO mapreduce.Job: Job job_1464859507107_0002 running in uber mode : false
16/06/02 14:36:43 INFO mapreduce.Job:  map 0% reduce 0%
16/06/02 14:36:52 INFO mapreduce.Job:  map 100% reduce 0%
16/06/02 14:37:02 INFO mapreduce.Job:  map 100% reduce 100%
16/06/02 14:37:03 INFO mapreduce.Job: Job job_1464859507107_0002 completed successfully
16/06/02 14:37:03 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=10349090
                FILE: Number of bytes written=20928299
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=5458326
                HDFS: Number of bytes written=717768
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6794
                Total time spent by all reduces in occupied slots (ms)=6654
                Total time spent by all map tasks (ms)=6794
                Total time spent by all reduce tasks (ms)=6654
                Total vcore-seconds taken by all map tasks=6794
                Total vcore-seconds taken by all reduce tasks=6654
                Total megabyte-seconds taken by all map tasks=6957056
                Total megabyte-seconds taken by all reduce tasks=6813696
        Map-Reduce Framework
                Map input records=124456
                Map output records=901325
                Map output bytes=8546434
                Map output materialized bytes=10349090
                Input split bytes=127
                Combine input records=0
                Combine output records=0
                Reduce input groups=67505
                Reduce shuffle bytes=10349090
                Reduce input records=901325
                Reduce output records=67505
                Spilled Records=1802650
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=220
                CPU time spent (ms)=5600
                Physical memory (bytes) snapshot=327389184
                Virtual memory (bytes) snapshot=1329098752
                Total committed heap usage (bytes)=146931712
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=5458199
        File Output Format Counters 
                Bytes Written=717768