Edge Node Setup
The procedure below explains how to set up an edge node from which clients can submit jobs to the Hadoop cluster.
We will set up a fourth instance as a client machine and submit jobs from it to the Hadoop cluster.
Follow these instructions to set up the client machine in the cloud.
Create an instance in the cloud with at least 4 GB RAM and 2 cores and call it ztg-client.
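How you create this instance depends on your cloud provider. As one hedged example, assuming AWS EC2 and the AWS CLI (the AMI ID and key-pair name below are placeholders), an instance type such as t2.medium provides 2 vCPUs and 4 GB RAM:
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.medium \
    --key-name my-keypair \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ztg-client}]'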
Log in as root.
sudo apt-get update
sudo apt-get install default-jdk
java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
root@ztg-client:~# sudo addgroup hadoop
Adding group `hadoop' (GID 1000) ...
Done.
root@ztg-client:~# sudo adduser --ingroup hadoop hdclient
Adding user `hdclient' ...
Adding new user `hdclient' (1000) with group `hadoop' ...
Creating home directory `/home/hdclient' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdclient
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
Edit the following file:
root@ztg-client:~# sudo vi /etc/sysctl.conf
Add the following lines at the end:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Merely editing the file does not disable ipv6.
root@ztg-client:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
You must execute sysctl to disable ipv6:
root@ztg-client:~# sudo sysctl -p
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Check:
root@ztg-client:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
Note: This is slightly different from when we set up the Apache Hadoop cluster. Now hdclient on ztg-client is getting access to ztg-master. Add hdclient's ssh keys; do this only on the ztg-client node. (Do not do this on the ztg-master or ztg-slave nodes.)
root@ztg-client:~# su hdclient
hdclient@ztg-client:/root$ cd ~
hdclient@ztg-client:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdclient/.ssh/id_rsa):
Created directory '/home/hdclient/.ssh'.
Your identification has been saved in /home/hdclient/.ssh/id_rsa.
Your public key has been saved in /home/hdclient/.ssh/id_rsa.pub.
The key fingerprint is:
a2:a3:b0:5f:7b:75:4e:83:c1:a7:27:b7:76:df:e2:31 hdclient@ztg-client
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| . |
| o . |
| . S= |
| . .= * |
|. + . B o E |
| o o o. + . .+ |
|..o .. . . oo..|
+-----------------+
hdclient@ztg-client:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hdclient@ztg-client:~$ ssh hdclient@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 49:fe:cf:5e:e1:cd:20:7b:96:88:94:62:bd:5a:5a:01.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-45-generic x86_64)
hdclient@ztg-client:~$ exit
logout
Connection to localhost closed.
Now, back as hdclient:
$ chmod 600 $HOME/.ssh/authorized_keys
We will copy this key to ztg-master to enable passwordless ssh. But before we can do this, we need to do the following.
- Add 'hdclient' as a user on ztg-master. Log in as root on ztg-master and run:
root@ztg-master:~# sudo adduser --ingroup hadoop hdclient
- Take the public IP address of ztg-master and add it to the /etc/hosts file as root on ztg-client.
root@ztg-client:~# sudo vi /etc/hosts
Insert after the localhost entries, and save and close:
x.y.z.156 ztg-master
- Now, back as hdclient on ztg-client, do the following (if ssh-copy-id is unavailable, see the alternative sketch after this list):
$ ssh-copy-id -i ~/.ssh/id_rsa.pub ztg-master
(This will ask for hdclient's password on ztg-master.)
- As hdclient on ztg-client, make sure you can do a passwordless ssh using the following command:
$ ssh ztg-master
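If ssh-copy-id is not available on ztg-client, a minimal alternative sketch (assuming password-based ssh to ztg-master works at this point) is to append the public key by hand:
$ cat ~/.ssh/id_rsa.pub | ssh hdclient@ztg-master 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'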
We need Hadoop on ztg-client in order to compile the jars and submit jobs, but we will not run any Hadoop daemons on it.
Copy the hadoop tarball from ztg-master to ztg-client. In order to use scp from ztg-master, on ztg-master, as root, edit /etc/hosts and add ztg-client's IP address.
root@ztg-master:~# sudo vi /etc/hosts
Insert after the localhost entries, then save and close:
x.y.z.209 ztg-client
Then, su to hduser on ztg-master and do:
root@ztg-master:~# su hduser
hduser@ztg-master:~$ scp /home/hduser/hadoop-2.6.0.tar.gz hdclient@ztg-client:/home/hdclient
(Supply hdclient's password on ztg-client.)
Now that we have the tarball on ztg-client, proceed to install hadoop.
hdclient@ztg-client:~$ tar zxvf hadoop-2.6.0.tar.gz
hdclient@ztg-client:~$ mv hadoop-2.6.0 hadoop
hdclient@ztg-client:~$ ls
hadoop hadoop-2.6.0.tar.gz
Update the environment variables for hdclient in its .bashrc:
hdclient@ztg-client:~$ vi ~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/home/hdclient/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
# Don't need the exports below.
#export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
#export HADOOP_COMMON_HOME=$HADOOP_INSTALL
#export HADOOP_HDFS_HOME=$HADOOP_INSTALL
#export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#HADOOP VARIABLES END
hdclient@ztg-client:~$ source ~/.bashrc
On ztg-client, edit /home/hdclient/hadoop/etc/hadoop/core-site.xml to be:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ztg-master:54310</value>
<description>Use HDFS as file storage engine</description>
</property>
</configuration>
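Before going further, you can optionally sanity-check the client setup. This is a hedged check, assuming the NameNode on ztg-master is up and reachable from ztg-client on port 54310; jps ships with the JDK and should list no Hadoop daemons (no NameNode, DataNode or ResourceManager) on this edge node.
hdclient@ztg-client:~$ jps
hdclient@ztg-client:~$ hadoop version
hdclient@ztg-client:~$ hadoop fs -ls /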
On ztg-client, edit /home/hdclient/hadoop/etc/hadoop/mapred-site.xml to be:
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>ztg-master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
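Note: the stock Apache Hadoop 2.6.0 tarball ships etc/hadoop/mapred-site.xml.template rather than mapred-site.xml, so if the file does not exist yet on ztg-client, create it from the template before making the edit above:
hdclient@ztg-client:~$ cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml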
On ztg-master, add new user hdclient, i.e. the same username as on ztg-client, to which you want to give remote access (skip this step if you already added hdclient on ztg-master above):
root@ztg-master:~# sudo adduser --ingroup hadoop hdclient
On ztg-master, su to hduser and give write permission to our user group on hadoop.tmp.dir (here /home/hduser/tmp). Open core-site.xml to get the path for hadoop.tmp.dir.
hduser@ztg-master:~$ chmod 777 /home/hduser/tmp
The steps below show how to submit a map-reduce job as user hdclient logged into machine ztg-client.
hdclient@ztg-client:~$ cd hadoop
hdclient@ztg-client:~/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 4
When successful, we expect the output to be something like this:
...
Job Finished in 1.511 seconds
Estimated value of Pi is 3.50000000000000000000
Now let us make the above command work.
- Create a directory structure in HDFS for the new user.
hduser@ztg-master:~$ hadoop fs -mkdir /user/hdclient/
hduser@ztg-master:~$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x - hduser supergroup 0 2015-06-26 18:27 /user/hdclient
drwxr-xr-x - hduser supergroup 0 2015-06-26 17:26 /user/hduser
Note: If you do not create the above directory and submit a map-reduce job as hdclient from ztg-client, you will get the following error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=hdclient, access=WRITE, inode="/user":hduser:supergroup:drwxr-xr-x
With just the above hdfs directory created, if you submit a map-reduce job as hdclient from ztg-client, you will get the following error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=hdclient, access=WRITE, inode="/user/hdclient":hduser:supergroup:drwxr-xr-x
So, we need to do one more step:
hduser@ztg-master:~$ hadoop fs -chown -R hdclient:hadoop /user/hdclient
hduser@ztg-master:~$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x - hdclient hadoop 0 2015-06-26 18:27 /user/hdclient
drwxr-xr-x - hduser supergroup 0 2015-06-26 17:26 /user/hduser
Now the map-reduce job should run from ztg-client when logged in as hdclient.
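As a quick, optional check of the new permissions before re-running the pi example from ztg-client, you can create and remove a scratch file under the new directory (the file name here is arbitrary):
hdclient@ztg-client:~$ hadoop fs -touchz /user/hdclient/_permtest
hdclient@ztg-client:~$ hadoop fs -rm /user/hdclient/_permtest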
In Hadoop 2.6.0, there is another hdfs directory:
hduser@ztg-master:~$ hadoop fs -ls /tmp/hadoop-yarn/staging/hduser
Found 1 items
drwx------ - hduser supergroup 0 2015-06-26 17:26 /tmp/hadoop-yarn/staging/hduser/.staging
We did not have to do anything with it so far, and our map-reduce jobs launched from the client machine work just fine.
For reference, here is the output when you start dfs and yarn on ztg-master:
hduser@ztg-master:~$ start-dfs.sh
Starting namenodes on [ztg-master]
ztg-master: starting namenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-namenode-ztg-master.out
ztg-slave2: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-slave2.out
ztg-master: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-master.out
ztg-slave1: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-secondarynamenode-ztg-master.out
hduser@ztg-master:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-resourcemanager-ztg-master.out
ztg-slave2: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-slave2.out
ztg-slave1: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-slave1.out
ztg-master: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-master.out
hduser@ztg-master:~$
Open this link in your browser to monitor the cluster:
http://x.y.z.156:50070/dfshealth.html#tab-overview
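If you prefer the command line to the web UI, a hedged alternative check is to ask HDFS for a cluster report as hduser on ztg-master:
hduser@ztg-master:~$ hdfs dfsadmin -report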