#Hadoop Pseudo-Distributed Operation#

##Configuration##

Use the following:

etc/hadoop/core-site.xml:

<configuration>
	<property>
	    <name>fs.defaultFS</name>
	    <value>hdfs://localhost:9000</value>
	</property>
</configuration>
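A quick sanity check that Hadoop is actually picking this file up (assuming $HADOOP_HOME/bin is on your PATH, as all the commands here do) is to ask for the resolved value:

$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000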

etc/hadoop/hdfs-site.xml:

<configuration>
	<property>
	    <name>dfs.replication</name>
	    <value>1</value>
	</property>
</configuration>
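One more prerequisite from the single-node guide: the start scripts use ssh to launch the daemons, so pseudo-distributed mode needs passphraseless ssh to localhost. If ssh localhost asks for a password, set it up with:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys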

##Execution##

Format the filesystem:

$ hdfs namenode -format

This prints a long wall of startup parameters; you can ignore them for now (later notes will dig into them). Just make sure no Error messages appear.
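If you would rather not scan the dump by eye, capture it and grep for the success line; the exact wording ("has been successfully formatted") is recalled from typical 2.x output, so treat it as an assumption:

$ hdfs namenode -format 2>&1 | tee /tmp/format.log
$ grep -i 'successfully formatted' /tmp/format.log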

Start the NameNode daemon and DataNode daemon:

$ sbin/start-dfs.sh 

The terminal should print starting namenode and starting datanode messages.
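To double-check from the shell, jps (bundled with the JDK) should now list the HDFS daemons; the PIDs below are purely illustrative:

$ jps
12001 NameNode
12233 DataNode
12460 SecondaryNameNode
12688 Jps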

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
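So if a daemon is missing from the jps output above, its log is the first place to look; the files are named after the user, daemon, and hostname:

$ tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log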

Browse the web interface for the NameNode; by default it is available at:

http://localhost:50070/
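On a headless machine you can poke the same interface with curl; an HTTP 200 status is all you need to see:

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/
200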

Make the HDFS directories required to execute MapReduce jobs:

$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/juedaiyuer
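Equivalently, the -p flag creates the parent directory in one step, just like POSIX mkdir:

$ hdfs dfs -mkdir -p /user/juedaiyuer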

Copy the input files into the distributed filesystem:

$ hdfs dfs -put etc/hadoop /user/juedaiyuer
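Since the destination directory already exists, the files land under /user/juedaiyuer/hadoop; a quick listing confirms the upload:

$ hdfs dfs -ls /user/juedaiyuer/hadoop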

Run some of the examples provided:

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /user/juedaiyuer/hadoop output 'dfs[a-z.]+'
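Note that the input path above points at /user/juedaiyuer/hadoop, the directory that actually contains the uploaded files; pointing grep at the parent /user/juedaiyuer would fail, because MapReduce does not descend into subdirectories of the input path by default. The example itself runs as two chained jobs (a search, then a sort of the match counts). A MapReduce job also refuses to start if its output directory already exists, so remove output (which, being a relative path, resolves to /user/juedaiyuer/output) before rerunning:

$ hdfs dfs -rm -r output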

Examine the output files. You can copy them from the distributed filesystem to the local filesystem and inspect them there:

$ hdfs dfs -get output output
$ cat output/*

Or view the output files directly on the distributed filesystem:

$ hdfs dfs -cat output/*

Or list the files in the output directory:

$ hdfs dfs -ls output/*
-rw-r--r--   1 juedaiyuer supergroup          0 2017-01-01 21:57 output/part-r-00000
-rw-r--r--   1 juedaiyuer supergroup          0 2017-01-01 21:57 output/_SUCCESS

When you’re done, stop the daemons with:

$ sbin/stop-dfs.sh

##source##

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html
