#Hadoop Pseudo-Distributed Operation#
##Configuration##
Use the following:
etc/hadoop/core-site.xml:
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
etc/hadoop/hdfs-site.xml:
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
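Before starting anything, you can sanity-check that Hadoop actually picks these values up; a minimal check, assuming the Hadoop binaries are on your PATH:

$ hdfs getconf -confKey fs.defaultFS
$ hdfs getconf -confKey dfs.replication

The first should print hdfs://localhost:9000 and the second 1, matching the files above.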
##Execution##
Format the filesystem:
$ hdfs namenode -format
This prints a long wall of startup parameters; you can ignore them for now (they will come up again later). Just make sure nothing containing Error appears.
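If you would rather not scan all of that by eye, a rough filter (a sketch, assuming a POSIX shell) is to pipe the format output through grep, keeping only the success message and any errors:

$ hdfs namenode -format 2>&1 | grep -Ei 'successfully formatted|error'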
Start the NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The terminal should show starting namenode and starting datanode.
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
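A quick way to confirm both daemons came up is jps (part of the JDK, not Hadoop):

$ jps

Expect NameNode, DataNode, and SecondaryNameNode among the listed JVM processes.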
Browse the web interface for the NameNode; by default it is available at:
http://localhost:50070/
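On a headless machine you can probe the web interface from the shell instead; a minimal check, assuming curl is installed:

# A 200 status code means the NameNode web UI is responding
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/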
Make the HDFS directories required to execute MapReduce jobs (note the prefix is /user, HDFS's home-directory convention, not /usr):
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/juedaiyuer
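Equivalently, the 2.x HDFS shell accepts -p to create both levels in one command:

$ hdfs dfs -mkdir -p /user/juedaiyuer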
Copy the input files into the distributed filesystem (with /user/juedaiyuer as the HDFS home directory, the relative path input resolves to /user/juedaiyuer/input):
$ hdfs dfs -put etc/hadoop input
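To confirm the upload before running any job, list the directory:

$ hdfs dfs -ls input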
Run some of the examples provided:
$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
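Note that MapReduce refuses to write to an existing output directory, so before rerunning the example you must delete output first; in 2.x the recursive delete is:

$ hdfs dfs -rm -r output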
Examine the output files. Copy them from the distributed filesystem to the local filesystem and examine them:
$ hdfs dfs -get output output
$ cat output/*
Or view the output files directly on the distributed filesystem:
$ hdfs dfs -cat output/*
Or list the files in the output directory:
$ hdfs dfs -ls output/*
-rw-r--r-- 1 juedaiyuer supergroup 0 2017-01-01 21:57 output/part-r-00000
-rw-r--r-- 1 juedaiyuer supergroup 0 2017-01-01 21:57 output/_SUCCESS
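_SUCCESS is just an empty job-completion marker; the grep results live in the part file, which you can read directly (the exact part file name may vary between runs):

$ hdfs dfs -cat output/part-r-00000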
When you’re done, stop the daemons with:
$ sbin/stop-dfs.sh
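Afterwards jps should no longer list the HDFS daemons; a quick check, assuming a POSIX shell:

$ jps | grep -E 'NameNode|DataNode' || echo 'HDFS daemons stopped'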
##Source##
Apache Hadoop 2.7.3 documentation, "Hadoop: Setting up a Single Node Cluster": https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html