
#Running Hadoop on a single node#

Edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

# set to the root of your Java installation
export JAVA_HOME=/yourfile
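
If the Java installation root is not known, it can often be derived from the location of the java binary. The following is a minimal sketch, assuming the conventional `<JAVA_HOME>/bin/java` layout and a `readlink` that supports `-f` (GNU coreutils); the function name `java_home_from_binary` is hypothetical.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the path of a java binary.
# Assumes the conventional layout <JAVA_HOME>/bin/java and GNU readlink (-f).
java_home_from_binary() {
    # Resolve symlinks such as /usr/bin/java -> /etc/alternatives/java -> real path
    real_java=$(readlink -f "$1") || return 1
    # Strip the trailing /bin/java to get the installation root
    dirname "$(dirname "$real_java")"
}

# Print an export line that could be pasted into etc/hadoop/hadoop-env.sh
if command -v java >/dev/null 2>&1; then
    echo "export JAVA_HOME=$(java_home_from_binary "$(command -v java)")"
fi
```

Check the printed path before using it; some installations place the binary under `jre/bin`, in which case the result points at the JRE rather than the JDK root.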

Then run the following command:

$ bin/hadoop

This displays the usage documentation for the hadoop script.

##Official example##

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp etc/hadoop/*.xml input
# Regular expression: matches "dfs" followed by one or more lowercase letters or dots
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
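
To see what the regular expression `dfs[a-z.]+` matches without starting a Hadoop job, the same pattern can be tried with the ordinary `grep` utility. This is only a local approximation of the example job, not how Hadoop runs it; the directory `regex_demo` and the sample XML content are made up for illustration.

```shell
# Build a tiny stand-in for the copied config files (sample content, made up for illustration)
mkdir -p regex_demo
cat > regex_demo/sample.xml <<'EOF'
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.namenode.name.dir</name></property>
<!-- see dfsadmin and dfs.replication -->
EOF

# -o prints only the matched text, -h hides file names, -E enables the + operator.
# Then count duplicates and sort by count, highest first.
grep -ohE 'dfs[a-z.]+' regex_demo/*.xml | sort | uniq -c | sort -rn
```

The Hadoop example writes a similar count-and-match listing to the output directory.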

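##Creating a local sample data file##

Test with a local file.

Create a folder named mytest to hold the local raw data. It can live inside the Hadoop directory or anywhere else that suits your setup.

Contents of mytest/mytest.txt: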

this is my hadoop test,and spark
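
The sample file can be created from the shell. A minimal sketch, assuming the current directory is where mytest should live:

```shell
# Create the directory for local raw data (the name mytest follows the text above)
mkdir -p mytest
# Write the single-line sample file; printf adds exactly one trailing newline
printf '%s\n' 'this is my hadoop test,and spark' > mytest/mytest.txt
# Show the size in bytes and the contents
wc -c mytest/mytest.txt
cat mytest/mytest.txt
```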

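Before running the job, check whether the output directory (hdfsOutput in the command below) already exists, and delete it if it does; otherwise the job fails because the output directory already exists.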
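
The check-and-delete step can be wrapped in a small function. This is a sketch rather than part of the original walkthrough; it relies on `hadoop fs -test -d` (exit status 0 when the path is a directory) and `hadoop fs -rm -r`, and the function name `remove_output_dir` is hypothetical.

```shell
# Hypothetical helper: remove a job output directory if it already exists.
remove_output_dir() {
    # -test -d succeeds only when the path exists and is a directory
    if hadoop fs -test -d "$1"; then
        # -rm -r deletes the directory and everything inside it
        hadoop fs -rm -r "$1"
    fi
}

# Usage (requires hadoop on PATH):
#   remove_output_dir hdfsOutput
```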

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount mytest/mytest.txt hdfsOutput/

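The hadoop command launches a JVM to run the MapReduce program. It picks up the Hadoop configuration automatically and adds the Hadoop libraries (and their dependencies) to the classpath.

List the contents of the hdfsOutput directory on HDFS: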

$ hadoop fs -ls hdfsOutput

Found 2 items
-rw-r--r--   1 juedaiyuer juedaiyuer          0 2017-01-01 10:30 hdfsOutput/_SUCCESS
-rw-r--r--   1 juedaiyuer juedaiyuer         45 2017-01-01 10:30 hdfsOutput/part-r-00000

Use the following command to view the contents of the result file:

$ hadoop fs -cat hdfsOutput/part-r-00000 

hadoop	1
is	1
my	1
spark	1
test,and	1
this	1
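
Note that test,and is counted as a single word: the wordcount example splits its input on whitespace only, so punctuation stays attached to the neighbouring text. The result can be approximated locally with `awk` and `sort`; this is an illustration of the counting logic, not the MapReduce implementation, and the names `local_wordcount` and `wordcount_demo.txt` are hypothetical.

```shell
# Hypothetical helper: count whitespace-separated tokens in a file,
# print "word<TAB>count", and sort the lines in byte order.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$1" |
        LC_ALL=C sort
}

# Recreate the sample input so the snippet runs on its own
printf '%s\n' 'this is my hadoop test,and spark' > wordcount_demo.txt
local_wordcount wordcount_demo.txt
```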

##Creating a directory on HDFS##

$ bin/hadoop fs -mkdir hdfsInput

Copy the file from the local disk into the HDFS directory:

$ bin/hadoop fs -put mytest/mytest.txt hdfsInput

Verify that the file was uploaded to the HDFS directory correctly:

$ bin/hadoop fs -ls hdfsInput

Found 1 items
-rw-rw-r--   1 juedaiyuer juedaiyuer         33 2016-12-28 22:03 hdfsInput/mytest.txt 

$ bin/hadoop fs -cat hdfsInput/*

this is my hadoop test,and spark
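
The create, upload, and verify steps above can be combined into one function. A minimal sketch: the function name `upload_to_hdfs` is hypothetical, and the `-p` and `-f` flags are additions that make the steps safe to repeat.

```shell
# Hypothetical helper: create an HDFS directory, upload one local file, list the result.
upload_to_hdfs() {
    local_file=$1
    target_dir=$2
    # -p: do not fail if the directory already exists
    hadoop fs -mkdir -p "$target_dir" || return 1
    # -f: overwrite a copy left over from an earlier run
    hadoop fs -put -f "$local_file" "$target_dir" || return 1
    # List the directory to confirm the upload
    hadoop fs -ls "$target_dir"
}

# Usage (requires hadoop on PATH):
#   upload_to_hdfs mytest/mytest.txt hdfsInput
```

On a real HDFS, a relative path such as hdfsInput resolves under the current user's home directory, /user/<username>.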

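As before, check whether the output directory (hdfsOutput in the command below) already exists before running the job, and delete it if it does.

Run the example:

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount hdfsInput hdfsOutput

List the contents of the hdfsOutput directory on HDFS: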

$ hadoop fs -ls hdfsOutput
Found 2 items
-rw-r--r--   1 juedaiyuer juedaiyuer          0 2017-01-01 11:16 hdfsOutput/_SUCCESS
-rw-r--r--   1 juedaiyuer juedaiyuer         45 2017-01-01 11:16 hdfsOutput/part-r-00000

Use the following command to view the contents of the result file:

$ hadoop fs -cat hdfsOutput/part-r-00000

##Path issues when executing the jar##

Take the command above as an example:

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount hdfsInput hdfsOutput

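The jar path in this command is relative, so it only works when run from the Hadoop installation directory. From any other location, give the absolute path to the jar, or set the classpath.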
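
One way to avoid depending on the current directory is to build the absolute jar path from an environment variable. The sketch below assumes `HADOOP_HOME` points at the unpacked Hadoop directory; that variable and the function name `examples_jar` are assumptions, not something the steps above set up.

```shell
# Hypothetical helper: print the absolute path of the MapReduce examples jar.
# Assumes HADOOP_HOME points at the unpacked Hadoop directory.
examples_jar() {
    for jar in "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        # The glob stays literal when nothing matches, so check that the file exists
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "examples jar not found under $HADOOP_HOME" >&2
    return 1
}

# Usage (requires hadoop on PATH):
#   hadoop jar "$(examples_jar)" wordcount hdfsInput hdfsOutput
```

Matching the version with a wildcard also means the command does not need editing when the jar name differs between releases, such as 2.7.2 and 2.7.3.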
