Edge Node Setup
The procedure below explains how to set up an edge node from which clients can submit jobs to the Hadoop cluster. We will set up a fourth instance as a client machine and submit jobs from it to the cluster. Follow these instructions to set up the client machine in the cloud.
Create an instance in the cloud with a minimum of 4 GB RAM and 2 cores and call it ztg-client.
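Optionally, as a quick sanity check (not part of the original procedure), you can confirm the instance meets the minimum size once you are logged in:
root@ztg-client:~# nproc
root@ztg-client:~# free -m
nproc should report at least 2, and free -m should show roughly 4096 MB or more of total memory.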
Log in as root and install a JDK:
sudo apt-get update
sudo apt-get install default-jdk
java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
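The .bashrc we set up later points JAVA_HOME at /usr/lib/jvm/java-7-openjdk-amd64. Assuming default-jdk installed OpenJDK 7 under /usr/lib/jvm (as it does on Ubuntu 14.04), you can confirm the path with:
root@ztg-client:~# readlink -f $(which javac)
The output should be a path under /usr/lib/jvm/java-7-openjdk-amd64.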
Create a hadoop group and an hdclient user:
root@ztg-client:~# sudo addgroup hadoop
Adding group `hadoop' (GID 1000) ...
Done.
root@ztg-client:~# sudo adduser --ingroup hadoop hdclient
Adding user `hdclient' ...
Adding new user `hdclient' (1000) with group `hadoop' ...
Creating home directory `/home/hdclient' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdclient
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n]Y
Edit the following file:
root@ztg-client:~# sudo vi /etc/sysctl.conf
Add the following lines at the end:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Merely editing the file does not disable ipv6.
root@ztg-client:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
You must run sysctl for the change to take effect:
root@ztg-client:~# sudo sysctl -p
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Check:
root@ztg-client:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
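If you prefer, all three keys can be checked in one command (optional; not in the original write-up):
root@ztg-client:~# sysctl net.ipv6.conf.all.disable_ipv6 net.ipv6.conf.default.disable_ipv6 net.ipv6.conf.lo.disable_ipv6
Each should report a value of 1.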
Note: This is slightly different from when we set up the Apache Hadoop cluster. Now hdclient on ztg-client is getting access to ztg-master. Add the hdclient ssh keys - do this only on the ztg-client node. (Do not do this on the ztg-master or ztg-slave nodes.)
root@ztg-client:~# su hdclient
hdclient@ztg-client:/root$ cd ~
hdclient@ztg-client:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdclient/.ssh/id_rsa):
Created directory '/home/hdclient/.ssh'.
Your identification has been saved in /home/hdclient/.ssh/id_rsa.
Your public key has been saved in /home/hdclient/.ssh/id_rsa.pub.
The key fingerprint is:
a2:a3:b0:5f:7b:75:4e:83:c1:a7:27:b7:76:df:e2:31 hdclient@ztg-client
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| . |
| o . |
| . S= |
| . .= * |
|. + . B o E |
| o o o. + . .+ |
|..o .. . . oo..|
+-----------------+
hdclient@ztg-client:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hdclient@ztg-client:~$ ssh hdclient@localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 49:fe:cf:5e:e1:cd:20:7b:96:88:94:62:bd:5a:5a:01.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-45-generic x86_64)
hdclient@ztg-client:~$ exit
logout
Connection to localhost closed.
Now, back as hdclient:
$ chmod 600 ~/.ssh/authorized_keys
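If passwordless ssh ever prompts for a password unexpectedly, the usual culprit is permissions; sshd expects the .ssh directory and authorized_keys to be writable only by their owner. A quick check:
$ ls -ld ~/.ssh ~/.ssh/authorized_keys
The directory should show drwx------ and the file -rw-------.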
We will copy this key to ztg-master to enable passwordless ssh. But before we can do this, we need to do the following.
- Add hdclient as a user on ztg-master. Log in as root on ztg-master and:
root@ztg-master:~# sudo adduser --ingroup hadoop hdclient
- Take the public IP address of ztg-master and add it to the /etc/hosts file as root on ztg-client.
root@ztg-client:~# sudo vi /etc/hosts
Insert after the localhost entries, then save and close:
x.y.z.156 ztg-master
- Now back as hdclient on ztg-client, do the following:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub ztg-master
(This will ask for hdclient's password on ztg-master.)
- As hdclient on ztg-client, make sure you can do a passwordless ssh using the following command:
$ ssh ztg-master
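To confirm the login really is passwordless (an extra check, not in the original write-up), you can force non-interactive authentication; if key authentication is not working, this command fails instead of prompting:
$ ssh -o BatchMode=yes ztg-master hostname
It should print ztg-master and return.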
We need Hadoop on ztg-client to compile the jars and submit jobs, but we will not run any Hadoop daemons on it.
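Because start-dfs.sh and start-yarn.sh are never run on the client, you can confirm at any point that no Hadoop daemons are running there. Assuming the JDK's jps tool is on the PATH (it is installed with default-jdk):
hdclient@ztg-client:~$ jps
The only process listed should be Jps itself.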
Copy the Hadoop tarball from ztg-master to ztg-client. In order to use scp from ztg-master, on ztg-master, as root, edit /etc/hosts and add ztg-client's IP address.
root@ztg-master:~# sudo vi /etc/hosts
Insert after the localhost entries, then save and close:
x.y.z.209 ztg-client
Then, su to hduser on ztg-master and do:
root@ztg-master:~# su hduser
hduser@ztg-master:~$ scp /home/hduser/hadoop-2.6.0.tar.gz hdclient@ztg-client:/home/hdclient
(Supply hdclient's password on ztg-client.)
Now that we have the tarball on ztg-client, proceed to install Hadoop.
hdclient@ztg-client:~$ tar zxvf hadoop-2.6.0.tar.gz
hdclient@ztg-client:~$ mv hadoop-2.6.0 hadoop
hdclient@ztg-client:~$ ls
hadoop hadoop-2.6.0.tar.gz
Update environment variables for hdclient in its .bashrc:
hdclient@ztg-client:~$ vi ~/.bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/home/hdclient/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
# Don't need the exports below.
#export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
#export HADOOP_COMMON_HOME=$HADOOP_INSTALL
#export HADOOP_HDFS_HOME=$HADOOP_INSTALL
#export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
#HADOOP VARIABLES END
hdclient@ztg-client:~$ source ~/.bashrc
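At this point the hadoop command should be on hdclient's PATH. A quick check (not in the original write-up):
hdclient@ztg-client:~$ hadoop version
The first line of output should read Hadoop 2.6.0.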
On ztg-client, edit /home/hdclient/hadoop/etc/hadoop/core-site.xml to be:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ztg-master:54310</value>
<description>Use HDFS as file storage engine</description>
</property>
</configuration>
On ztg-client, edit /home/hdclient/hadoop/etc/hadoop/mapred-site.xml to be:
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>ztg-master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
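With fs.defaultFS pointing at hdfs://ztg-master:54310, the client can already reach HDFS, assuming dfs is running on the cluster. A quick read-only connectivity check from ztg-client (not in the original write-up), which needs no special permissions:
hdclient@ztg-client:~$ hadoop fs -ls /
This should list the cluster's top-level HDFS directories without errors.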
On ztg-master, make sure the user hdclient exists, i.e. the same username as on ztg-client that you want to give remote access to. We already added it in an earlier step; if you skipped that, add it now:
root@ztg-master:~# sudo adduser --ingroup hadoop hdclient
On ztg-master, su to hduser and give write permission to our user group on hadoop.tmp.dir (here /home/hduser/tmp). Open core-site.xml to get the path for hadoop.tmp.dir.
hduser@ztg-master:~$ chmod 777 /home/hduser/tmp
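You can verify the change took effect (an optional check):
hduser@ztg-master:~$ ls -ld /home/hduser/tmp
The mode should now show drwxrwxrwx.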
The steps below show how to submit a map-reduce job as user hdclient logged into machine ztg-client.
hdclient@ztg-client:~$ cd hadoop
hdclient@ztg-client:~/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 4
When successful, we expect the output to be something like this.
...
Job Finished in 1.511 seconds
Estimated value of Pi is 3.50000000000000000000
Now let us make the above command work.
- Create a directory structure in HDFS for the new user.
hduser@ztg-master:~$ hadoop fs -mkdir /user/hdclient/
hduser@ztg-master:~$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x - hduser supergroup 0 2015-06-26 18:27 /user/hdclient
drwxr-xr-x - hduser supergroup 0 2015-06-26 17:26 /user/hduser
Note: If you do not create the above directory and submit a map-reduce job as hdclient from ztg-client, you will get the following error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=hdclient, access=WRITE, inode="/user":hduser:supergroup:drwxr-xr-x
With just the above HDFS directory created, if you submit a map-reduce job as hdclient from ztg-client, you will still get the following error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=hdclient, access=WRITE, inode="/user/hdclient":hduser:supergroup:drwxr-xr-x
So we need to do one more step:
hduser@ztg-master:~$ hadoop fs -chown -R hdclient:hadoop /user/hdclient
hduser@ztg-master:~$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x - hdclient hadoop 0 2015-06-26 18:27 /user/hdclient
drwxr-xr-x - hduser supergroup 0 2015-06-26 17:26 /user/hduser
Now the map-reduce job should run from ztg-client when logged in as hdclient.
In Hadoop 2.6.0, there is another HDFS directory:
hduser@ztg-master:~$ hadoop fs -ls /tmp/hadoop-yarn/staging/hduser
Found 1 items
drwx------ - hduser supergroup 0 2015-06-26 17:26 /tmp/hadoop-yarn/staging/hduser/.staging
We did not have to do anything with it so far, and map-reduce jobs launched from the client machine work just fine.
For reference, here is the output when you start dfs and yarn on ztg-master:
hduser@ztg-master:~$ start-dfs.sh
Starting namenodes on [ztg-master]
ztg-master: starting namenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-namenode-ztg-master.out
ztg-slave2: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-slave2.out
ztg-master: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-master.out
ztg-slave1: starting datanode, logging to /home/hduser/hadoop/logs/hadoop-hduser-datanode-ztg-slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hduser/hadoop/logs/hadoop-hduser-secondarynamenode-ztg-master.out
hduser@ztg-master:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-resourcemanager-ztg-master.out
ztg-slave2: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-slave2.out
ztg-slave1: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-slave1.out
ztg-master: starting nodemanager, logging to /home/hduser/hadoop/logs/yarn-hduser-nodemanager-ztg-master.out
hduser@ztg-master:~$
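After both scripts finish, jps on ztg-master is a common way to confirm the expected daemons are up (this check is not part of the original output above; PIDs will differ per run):
hduser@ztg-master:~$ jps
On this master, which also runs a datanode and a nodemanager, the list should include NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps.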
Open this link in your browser to monitor the cluster:
http://x.y.z.156:50070/dfshealth.html#tab-overview