Hadoop
Modules

The project includes these modules:

- Hadoop Common: The common utilities that support the other Hadoop modules. (management commands)
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. (distributed file storage)
- Hadoop YARN: A framework for job scheduling and cluster resource management. (distributed computing system; schedules and runs programs)
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (parallel processing engine)
- Hadoop Ozone: An object store for Hadoop. (distributed object store; named after the ozone layer)

Hadoop prerequisites:

- A cluster (multiple computers).
- Operating system: Linux, for stability (e.g. Ubuntu or CentOS).
- The Java language: JDK and JRE (a quick install check follows this list).
  - JRE (Java Runtime Environment): the environment Java programs run in.
  - JDK (Java Development Kit): the toolkit for developing Java programs.
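
The dt sysinfo output further down shows every node running OpenJDK 1.8.0_265. A minimal sketch of installing and verifying it, assuming Ubuntu/Debian nodes with the stock OpenJDK 8 packages:

$ sudo apt-get update
$ sudo apt-get install -y openjdk-8-jdk-headless   # the JDK package also provides the JRE
$ java -version                                    # runtime check: should print openjdk version "1.8.0_..."
$ javac -version                                   # compiler check: confirms the JDK itself is installed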
Army: dtc (cluster), physical machines
Navy: dtk (installs k8s/k3s), physical machines
Air Force: dkc (docker compose), virtual machines (cloud machines), containers
Sysinternals
118.163.47.126
$ nano ~/.bashrc
export PATH=/home/bigred/wk/cnt/dtc/bin:/home/bigred/wk/dtc/bin:$PATH
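
After saving ~/.bashrc, reload it in the current shell and confirm the dtc bin directories are on the PATH:

$ source ~/.bashrc
$ echo $PATH | tr ':' '\n' | grep dtc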
$ nano sysinfo
# default-route line from the routing table; its last field is the outbound interface
gw=$(route -n | grep -e "^0.0.0.0 ")
export GWIF=${gw##* }
# IP address bound to that interface
ips=$(ifconfig $GWIF | grep 'inet ')
export IP=$(echo $ips | cut -d' ' -f2)
export NETID=${IP%.*}
# default gateway address (second field of the default-route line)
export GW=$(route -n | grep -e '^0.0.0.0' | tr -s ' ' | cut -d ' ' -f2)
echo "[`hostname`]"
echo "--------------------------------------------------------"
m=$(free -mh | grep Mem:)
echo -n "Memory : "
echo $m | cut -d' ' -f2
cn=$(cat /proc/cpuinfo | grep 'model name' | head -n 1 | cut -d ':' -f2 | tr -s ' ')
echo -n "CPU : $cn (core: "
cn=$(cat /proc/cpuinfo | grep 'model name' | wc -l)
echo "$cn)"
echo "IP Address : $IP"
echo "Default Gateway : $GW"
echo ""
java -version &> /tmp/java
cat /tmp/java | head -n 1
echo ""
echo "/etc/hosts"
cat /etc/hosts | grep -E "^[0-9]{3}"
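
sysinfo is not run by hand on each node; the dt script below serves it over HTTP with busybox httpd and every node pipes it into bash. It relies on route and ifconfig from the net-tools package, and can be smoke-tested locally:

$ bash sysinfo                                            # report on the local machine
$ wget -qO - http://192.168.66.253:8888/sysinfo | bash    # what dt makes each cluster node do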
$ nano dt
#!/bin/bash
echo -e "CDT 20.08\n"
[ "$#" != 1 ] && echo 'dt [sysinfo | sysprep | build | list | restart]' && exit 1
c="sysinfo sysprep build list restart"
[ ! "${c}" =~ "$1" ](/xuan103/class-2020-07/wiki/-!-"${c}"-=~-"$1"-) && echo "Oops, wrong command" && exit 1
i=$(cat /etc/hosts | grep -E "mas|wka" | tr '\t' ' ' | cut -d' ' -f1)
c=$1
#echo $i
ps aux | grep -v grep | grep -o 'busybox httpd' &>/dev/null
[ "$?" != "0" ] && busybox httpd -p 8888 -h ~/wk/cdt/bin/cluster
case $c in
sysinfo)
for x in $i
do
nc -w 2 -z $x 22
[ "$?" != "0" ] && continue
ssh $x 'wget -qO - http://192.168.66.253:8888/sysinfo | bash'
echo ""
done
;;
sysprep)
for x in $i
do
nc -w 2 -z $x 22 &>/dev/null
[ "$?" != "0" ] && continue
ssh $x 'wget -qO - http://192.168.66.253:8888/sysprep | bash'
echo ""
done
;;
build)
for x in $i
do
nc -w 2 -z $x 22
[ "$?" != "0" ] && continue
ssh $x 'wget -qO - http://192.168.66.253:8888/hdp330 | bash'
ssh $x 'wget -qO - http://192.168.66.253:8888/spk300 | bash'
ssh $x 'wget -qO - http://192.168.66.253:8888/dt.bash | sudo tee /opt/bin/dt.bash &>/dev/null'
[ "$?" == "0" ] && echo "dt.bash copied"
ssh $x 'wget -qO - http://192.168.66.253:8888/environment | sudo tee /home/kuan/.ssh/environment &>/dev/null'
[ "$?" == "0" ] && echo "environment copied"
for y in core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml yarn-site.xml
do
u="wget -qO - http://192.168.66.253:8888/$y | sudo tee /opt/hadoop-3.3.0/etc/hadoop/$y &>/dev/null"
ssh $x $u
[ "$?" == "0" ] && echo "$y copied"
done
echo ""
done
cp ~/wk/cdt/bin/cluster/*.xml ~/wk/cdt/mnt/hdp330
cp ~/wk/cdt/bin/cluster/*.sh ~/wk/cdt/mnt/hdp330
;;
list)
;;
restart)
read -p "Are you sure ? (YES/NO) " ans
[ "$ans" != "YES" ] && exit 1
for x in $i
do
nc -w 2 -z $x 22 &>/dev/null
[ "$?" != "0" ] && continue
ssh $x 'sudo reboot'
done
;;
esac
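
The first time dt runs it starts a busybox httpd on port 8888 serving ~/wk/cdt/bin/cluster, so every node pulls the same scripts and config files from the gateway. A quick check that the file server is up and serving:

$ ps aux | grep -v grep | grep 'busybox httpd'
$ wget -qO - http://192.168.66.253:8888/sysinfo | head -n 3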
bigred@gw:~/wk/cnt/dtc/bin$ dt sysinfo
CDT 20.08
[mas01]
Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.10
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka01]
Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.20
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka02]
Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.21
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka03]
Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.22
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka04]
Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.23
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01
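
The build case of dt pushes core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml and yarn-site.xml into /opt/hadoop-3.3.0/etc/hadoop on every node; the real files are served from the gateway and are not reproduced on this page. A rough sketch of what the two HDFS-related files might contain, assuming the NameNode listens on master:8020 (the port starthdfs.sh probes) and the nn/sn/dn directories sit under /home/kuan -- these values are assumptions, not the class's actual configuration:

# sketch only -- hostnames and paths are assumptions
cat > core-site.xml <<'EOF'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:8020</value></property>
</configuration>
EOF

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/home/kuan/nn</value></property>
  <property><name>dfs.namenode.checkpoint.dir</name><value>/home/kuan/sn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/kuan/dn</value></property>
</configuration>
EOF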
$ nano formathdfs.sh
#!/bin/bash
read -p "Are you sure ? (YES/NO) " ans
[ "$ans" != "YES" ] && echo "abort format HDFS" && exit 1
# wipe the NameNode and Secondary NameNode metadata directories on master
ssh master rm -r nn/* &>/dev/null
ssh master rm -r sn/* &>/dev/null
# wipe the DataNode block directories on every reachable worker
for n in wka01 wka02 wka03 wka04 wka05
do
nc -w 1 -z $n 22 &>/dev/null
[ "$?" == "0" ] && ssh $n rm -r dn/* &>/dev/null
echo "$n clean"
done
ssh master 'hdfs namenode -format -clusterID cute' &>/dev/null
[ "$?" != "0" ] && echo "formathdfs failure" && exit 1
echo "formathdfs ok"
$ nano starthdfs.sh
#!/bin/bash
# start the NameNode on master, then wait for it to listen on port 8020
ssh master hadoop-daemon.sh start namenode &>/dev/null
sleep 10; nc -w 5 -z master 8020 &>/dev/null
[ "$?" != "0" ] && echo "pls formathdfs first" && exit 1
echo "master: Name Node started"
ssh master hadoop-daemon.sh start secondarynamenode &>/dev/null
[ "$?" == "0" ] && echo "master: Secondary Name Node started"
# start a DataNode on every reachable worker
for n in wka01 wka02 wka03 wka04 wka05
do
nc -w 5 -z $n 22 &>/dev/null
if [ "$?" == "0" ]; then
ssh $n hadoop-daemon.sh start datanode &>/dev/null
[ "$?" == "0" ] && echo "$n: Data Node started"
fi
done
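
Once starthdfs.sh reports every daemon started, the cluster can be checked from the gateway with jps (the Java process list) and the HDFS admin report:

$ ssh master jps                                   # NameNode and SecondaryNameNode should be listed
$ ssh wka01 jps                                    # a DataNode should be listed on each worker
$ ssh master 'hdfs dfsadmin -report | head -n 12'  # live DataNodes and total capacity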

Advantages of HDFS:

- Handles very large files.
- Nodes can be retired (decommissioned) by command, so upgrades are no longer a hassle.
- Runs on clusters of cheap commodity machines.
- Disk capacity can be scaled out almost without limit.
- Reads are distributed across the cluster.
- When a client writes more than 16 MB, the NameNode hands it three DataNode addresses to write to (the number of replicas is set in hdfs-site.xml), and the data is split into one block per 128 MB.
- When a client reads, it asks the NameNode for the block locations and then fetches the blocks directly from the DataNodes. (A small demo follows this list.)
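
A small demo of the write path described in the last two points, assuming HDFS is running and /tmp/big.dat is any local file larger than one block (the 128 MB block size and the 3 replicas correspond to dfs.blocksize and dfs.replication in hdfs-site.xml):

$ hdfs dfs -mkdir -p /demo
$ hdfs dfs -put /tmp/big.dat /demo/
$ hdfs fsck /demo/big.dat -files -blocks -locations   # one entry per 128 MB block, each with three DataNode locations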