Hadoop

Modules

  • The project includes these modules:

    • Hadoop Common: The common utilities that support the other Hadoop modules. (management commands and shared utilities)

    • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. (the distributed file layer)

    • Hadoop YARN: A framework for job scheduling and cluster resource management. (the distributed computing layer that schedules and runs programs)

    • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (the parallel data-processing engine)

    • Hadoop Ozone: An object store for Hadoop. (a distributed object store; "Ozone" as in the ozone layer)

  • Hadoop prerequisites:

    • A cluster (multiple machines).

    • Operating system: Linux, for stability (e.g. Ubuntu or CentOS).

    • Java: Hadoop is written in Java, so the JDK/JRE must be installed (a minimal install sketch follows this list).

  • JRE (Java Runtime Environment): the environment in which Java programs run.

  • JDK (Java Development Kit): the toolkit for developing Java programs.
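
A minimal install sketch for the Java requirement, assuming an Ubuntu node and the distribution's OpenJDK 8 packages (package names vary by release):

$ sudo apt-get update
$ sudo apt-get install -y openjdk-8-jdk-headless
$ java -version    # should report something like: openjdk version "1.8.0_265"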

Army: dtc (cluster) on physical machines

Navy: dtk (installed with k8s/k3s) on physical machines

Air Force: dkc (docker compose) on virtual (cloud) machines, i.e. containers


Sysinternals

118.163.47.126

$ nano ~/.bashrc

export PATH=/home/bigred/wk/cnt/dtc/bin:/home/bigred/wk/dtc/bin:$PATH
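
Reload the shell configuration so the new PATH takes effect in the current session:

$ source ~/.bashrc
$ echo $PATH    # the dtc bin directories should now be listed first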

$ nano sysinfo

# work out the default-route interface, its IP address, the network ID, and the default gateway
gw=$(route -n | grep -e "^0.0.0.0 ")
export GWIF=${gw##* }
ips=$(ifconfig $GWIF | grep 'inet ')
export IP=$(echo $ips | cut -d' ' -f2)
export NETID=${IP%.*}
export GW=$(route -n | grep -e '^0.0.0.0' | tr -s \ - | cut -d ' ' -f2)

echo "[`hostname`]"
echo "--------------------------------------------------------"

# total memory as reported by free
m=$(free -mh | grep Mem:)
echo -n "Memory : "
echo $m | cut -d' ' -f2

# CPU model name and logical core count from /proc/cpuinfo
cn=$(cat /proc/cpuinfo | grep 'model name' | head -n 1 | cut -d ':' -f2 | tr -s ' ')
echo -n "CPU : $cn (core: "
cn=$(cat /proc/cpuinfo | grep 'model name' | wc -l)
echo "$cn)"

echo "IP Address : $IP"
echo "Default Gateway : $GW"
echo ""

java -version &> /tmp/java 
cat /tmp/java | head -n 1
echo ""

echo "/etc/hosts"
cat /etc/hosts | grep -E "^[0-9]{3}"
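
sysinfo is meant to be fetched over HTTP and piped into bash by the dt script below, but it can also be run directly for a quick local test. A sketch, assuming the file was saved in the directory that busybox httpd serves (~/wk/cdt/bin/cluster in the dt script):

$ bash ~/wk/cdt/bin/cluster/sysinfo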

$ nano dt

#!/bin/bash
echo -e "CDT 20.08\n"

[ "$#" != 1 ] && echo 'dt [sysinfo | sysprep | build | list | restart]' && exit 1
c="sysinfo sysprep build list restart"
[ ! "${c}" =~ "$1" ](/xuan103/class-2020-07/wiki/-!-"${c}"-=~-"$1"-) && echo "Oops, wrong command" && exit 1

i=$(cat /etc/hosts | grep -E "mas|wka" | tr '\t' ' ' | cut -d' ' -f1)
c=$1
#echo $i

# start a small busybox web server on port 8888 (once) to serve the cluster scripts to the nodes
ps aux | grep -v grep | grep -o 'busybox httpd' &>/dev/null
[ "$?" != "0" ] && busybox httpd -p 8888 -h ~/wk/cdt/bin/cluster

case $c in
sysinfo)
    for x in $i
    do
      nc -w 2 -z $x 22
      [ "$?" != "0" ] && continue
      ssh $x 'wget -qO - http://192.168.66.253:8888/sysinfo | bash'
      echo ""
    done
    ;;
sysprep)
    for x in $i
    do
      nc -w 2 -z $x 22 &>/dev/null
      [ "$?" != "0" ] && continue
      ssh $x 'wget -qO - http://192.168.66.253:8888/sysprep | bash'
      echo ""
    done
    ;;
build)
    for x in $i
    do
      nc -w 2 -z $x 22
      [ "$?" != "0" ] && continue
      ssh $x 'wget -qO - http://192.168.66.253:8888/hdp330 | bash'
      ssh $x 'wget -qO - http://192.168.66.253:8888/spk300 | bash'
      ssh $x 'wget -qO - http://192.168.66.253:8888/dt.bash | sudo tee /opt/bin/dt.bash &>/dev/null'
      [ "$?" == "0" ] && echo "dt.bash copied"
      ssh $x 'wget -qO - http://192.168.66.253:8888/environment | sudo tee /home/kuan/.ssh/environment &>/dev/null'
      [ "$?" == "0" ] && echo "environment copied"

      for y in core-site.xml hadoop-env.sh hdfs-site.xml mapred-site.xml yarn-site.xml
      do
        u="wget -qO - http://192.168.66.253:8888/$y | sudo tee /opt/hadoop-3.3.0/etc/hadoop/$y &>/dev/null"
        ssh $x $u
        [ "$?" == "0" ] && echo "$y copied"
      done
      echo ""
    done
    cp ~/wk/cdt/bin/cluster/*.xml ~/wk/cdt/mnt/hdp330
    cp ~/wk/cdt/bin/cluster/*.sh ~/wk/cdt/mnt/hdp330
    ;;
list)
    ;;
restart)
    read -p "Are you sure ? (YES/NO) " ans
    [ $ans != "YES" ] && exit 1
    for x in $i
    do
      nc -w 2 -z $x 22 &>/dev/null
      [ "$?" != "0" ] && continue
      ssh $x 'sudo reboot' 
    done
    ;;
esac
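
A usage sketch, assuming passwordless SSH from gw to every mas/wka host in /etc/hosts, that dt was saved under ~/wk/cnt/dtc/bin (already on PATH via ~/.bashrc above), and that 192.168.66.253 is an address of gw that the nodes can reach:

$ chmod +x ~/wk/cnt/dtc/bin/dt
$ dt
CDT 20.08

dt [sysinfo | sysprep | build | list | restart]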

bigred@gw:~/wk/cnt/dtc/bin$ dt sysinfo

CDT 20.08

[mas01]

Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.10
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka01]

Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.20
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka02]

Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.21
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka03]

Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.22
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

[wka04]

Memory : 3.3G
CPU : Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz (core: 4)
IP Address : 192.168.40.23
Default Gateway : 192.168.40.254

openjdk version "1.8.0_265"

/etc/hosts
127.0.0.1 localhost
192.168.40.254 gw
192.168.40.10 mas01
192.168.40.20 wka01
192.168.40.21 wka02
192.168.40.22 wka03
192.168.40.23 wka04
192.168.40.30 ds01

$ nano formathdfs.sh

#!/bin/bash
read -p "Are you sure ? (YES/NO) " ans
[ "$ans" != "YES" ] && echo "abort format HDFS" && exit 1

ssh master rm -r nn/* &>/dev/null
ssh master rm -r sn/* &>/dev/null

for n in wka01 wka02 wka03 wka04 wka05
do
   nc -w 1 -z $n 22 &>/dev/null
   [ "$?" == "0" ] && ssh $n rm -r dn/* &>/dev/null
   echo "$n clean"
done

ssh master 'hdfs namenode -format -clusterID cute' &>/dev/null
[ "$?" != "0" ] && echo "formathdfs failure" && exit 1
echo "formathdfs ok"

$ nano starthdfs.sh

#!/bin/bash
ssh master hadoop-daemon.sh start namenode &>/dev/null
sleep 10; nc -w 5 -z master 8020 &>/dev/null
[ "$?" != 0 ] && echo "pls formathdfs first" && exit 1
echo "master: Name Node Started"

ssh master hadoop-daemon.sh start secondarynamenode &>/dev/null
[ "$?" == "0" ] && echo "master: Secondary Name Node started"

for n in wka01 wka02 wka03 wka04 wka05
do
   nc -w 5 -z $n 22 &>/dev/null
   if [ "$?" == "0" ]; then
      ssh $n hadoop-daemon.sh start datanode &>/dev/null
      [ "$?" == "0" ] && echo "$n: Data Node started"
   fi
done
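
Once the daemons are up, a sketch of how to verify them (jps lists the running Java daemons on each node; dfsadmin -report shows the DataNodes the NameNode has registered):

$ ssh master jps
$ for n in wka01 wka02 wka03 wka04; do ssh $n jps; done
$ ssh master 'hdfs dfsadmin -report | grep "Live datanodes"'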

  • Advantages of HDFS:

    • Handles very large files.

    • Nodes can be decommissioned with a command, so upgrades are much less troublesome.

    • Runs on clusters of inexpensive commodity machines.

    • Disk capacity can be scaled out without practical limit.

    • Provides distributed reads.

      • When a client writes data (roughly 16 MB or more), the NameNode hands out three DataNode addresses to write to (the number of replicas is configured in hdfs-site.xml), and the data is split into 128 MB blocks (see the sketch after this list).

      • When a client reads, the blocks are fetched back from those DataNodes.
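
The replication factor and block size mentioned above come from hdfs-site.xml (dfs.replication and dfs.blocksize). A sketch of how to observe them on a running cluster; /tmp/somefile is just a placeholder path:

$ hdfs dfs -put somefile /tmp/somefile
$ hdfs fsck /tmp/somefile -files -blocks -locations
$ hdfs dfs -setrep 2 /tmp/somefile    # change the replica count for one file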

http://hadoop.apache.org/