Hortonworks Sandbox - kaushikdas/TechnicalWritings GitHub Wiki
Hadoop Installation on VirtualBox
Step 1: Installing VirtualBox
- Go to VirtualBox.org.
- Download the latest version of VirtualBox by clicking the download button (at the time of writing, the available version was 6.1.4).
- After downloading, install VirtualBox by running the downloaded .exe file and accepting the default options.
Step 2: Downloading Hortonworks Sandbox
- Go to Cloudera Downloads
- Find the Hortonworks Sandbox link and click Download the Hortonworks Sandbox
- Click the Download Now button under Hortonworks HDP
- Choose the VirtualBox installation type and click the Let's Go! button
- Fill in the form, accept the terms, and click the Submit button
- From Older Versions, select and download version 2.5.0 of the Hortonworks Sandbox environment (an .ova file); this version is chosen because it needs less memory
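The downloaded .ova file can also be imported from a terminal instead of through the VirtualBox GUI. The following is a minimal sketch, assuming VBoxManage (installed with VirtualBox) is on the PATH; the function name import_sandbox and the file path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import a downloaded sandbox appliance with VBoxManage.
# Returns 1 if the .ova file is missing, 2 if VBoxManage is not on PATH.
import_sandbox() {
  local ova="$1"
  if [ ! -f "$ova" ]; then
    echo "appliance not found: $ova" >&2
    return 1
  fi
  if ! command -v VBoxManage >/dev/null 2>&1; then
    echo "VBoxManage is not on PATH" >&2
    return 2
  fi
  # Registers the appliance as a new virtual machine using its default settings.
  VBoxManage import "$ova"
}
```

For example, import_sandbox ~/Downloads/sandbox.ova (substitute the actual name of the downloaded file), followed by VBoxManage list vms to confirm that the machine was registered.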
Interacting with Hortonworks Sandbox
Launch the sandbox virtual machine using Oracle VM VirtualBox, then connect to the sandbox from an SSH terminal:
$ ssh [email protected] -p 2222 # connect using Git Bash in Win 10
Alternatively, connect using a client such as PuTTY. The password for the user maria_dev is also maria_dev.
Installing software in Hortonworks Sandbox
Software can be installed using yum:
[maria_dev@sandbox ~]$ su
Password:
su: incorrect password
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# yum install vim
Loaded plugins: fastestmirror, ovl, priorities
Setting up Install Process
Loading mirror speeds from cached hostfile
* base: mirrors.piconets.webwerks.in
* epel: mirrors.piconets.webwerks.in
* extras: mirrors.piconets.webwerks.in
* updates: mirrors.piconets.webwerks.in
Resolving Dependencies
--> Running transaction check
…
Total download size: 7.8 M
Installed size: 23 M
Is this ok [y/N]: y
Downloading Packages:
…
Complete!
[root@sandbox maria_dev]# exit
exit
[maria_dev@sandbox ~]$
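When the same setup is repeated, it can help to install a package only if its command is missing. This is a minimal sketch, assuming it runs as root as in the session above; the function name ensure_command is hypothetical, and -y simply answers the "Is this ok" prompt automatically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a package only when its command is not on PATH.
# $1 = command to look for, $2 = package name (defaults to the command name).
ensure_command() {
  local cmd="$1" pkg="${2:-$1}"
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd is already installed"
    return 0
  fi
  # -y assumes "yes" for the confirmation prompt.
  yum install -y "$pkg"
}
```

For example, ensure_command vim installs vim on first use and does nothing on later runs.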
If yum install fails with the following 403 Forbidden error:
[root@sandbox maria_dev]# yum install vim
Loaded plugins: fastestmirror, ovl, priorities
Setting up Install Process
Determining fastest mirrors
epel/metalink | 6.7 kB 00:00
* base: mirrors.piconets.webwerks.in
* epel: mirrors.piconets.webwerks.in
* extras: mirrors.piconets.webwerks.in
* updates: mirrors.piconets.webwerks.in
AMBARI.2.4.0.0-2.x | 2.9 kB 00:00
AMBARI.2.4.0.0-2.x/primary_db | 8.3 kB 00:00
HDP-2.5 | 2.9 kB 00:00
HDP-2.5/primary_db | 69 kB 00:00
HDP-UTILS-1.1.0.21 | 2.9 kB 00:00
HDP-UTILS-1.1.0.21/primary_db | 36 kB 00:00
base | 3.7 kB 00:00
base/primary_db | 4.7 MB 00:00
epel | 5.3 kB 00:00
epel/primary_db | 6.1 MB 00:00
extras | 3.4 kB 00:00
extras/primary_db | 29 kB 00:00
mysql-connectors-community | 2.5 kB 00:00
mysql-connectors-community/primary_db | 46 kB 00:00
mysql-tools-community | 2.5 kB 00:00
mysql-tools-community/primary_db | 55 kB 00:00
mysql56-community | 2.5 kB 00:00
mysql56-community/primary_db | 286 kB 00:00
puppetlabs-deps | 2.5 kB 00:00
puppetlabs-deps/primary_db | 12 kB 00:00
puppetlabs-products | 2.5 kB 00:00
puppetlabs-products/primary_db | 85 kB 00:00
http://dev2.hortonworks.com.s3.amazonaws.com/repo/dev/master/utils/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 403 Forbidden"
Trying other mirror.
To address this issue please refer to the below knowledge base article
https://access.redhat.com/solutions/69319
If above article doesn't help to resolve this issue please open a ticket with Red Hat Support.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: sandbox. Please verify its path and try again
[root@sandbox maria_dev]#
try the following solution, which disables the sandbox repository (https://community.cloudera.com/t5/Support-Questions/sandbx-repo-problem/td-p/196580):
# cat > /etc/yum.repos.d/sandbox.repo
[sandbox]
baseurl=http://dev2.hortonworks.com.s3.amazonaws.com/repo/dev/master/utils/
name=Sandbox repository (tutorials)
gpgcheck=0
enabled=0
[Ctrl-D]
#
After setting enabled=0 in the /etc/yum.repos.d/sandbox.repo file, run yum clean and then try the installation again:
# yum clean all
# yum install vim
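The fix above rewrites the whole repo file by hand. As an alternative sketch, the enabled flag can be flipped in place while leaving the other fields untouched. The function name disable_repo_file is hypothetical, and the sketch assumes the file contains a line of the form enabled=1; it must be run as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set enabled=0 for every section in a yum .repo file.
# All other lines (baseurl, name, gpgcheck, ...) are left unchanged.
disable_repo_file() {
  local repo_file="$1" tmp
  if [ ! -f "$repo_file" ]; then
    echo "no such file: $repo_file" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Rewrite "enabled = 1" (with optional spaces) to "enabled=0".
  if sed 's/^enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$/enabled=0/' "$repo_file" > "$tmp"; then
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$repo_file"
  fi
  rm -f "$tmp"
}
```

For example, disable_repo_file /etc/yum.repos.d/sandbox.repo followed by yum clean all. For a one-off install without editing any file, yum --disablerepo=sandbox install vim skips the repository for that single command.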
Copying files to HDFS
- Using terminal
- Download the file from the Internet directly to the sandbox:
[maria_dev@sandbox ~]$ wget http://media.sundog-soft.com/hadoop/ml-100k/u.data
[maria_dev@sandbox ~]$ ls
u.data
- Copy the file, previously downloaded from the Internet to the local machine, to the sandbox using scp:
$ scp -P 2222 ~/Downloads/ml-100k.zip [email protected]:/home/maria_dev/.
[email protected]'s password:
ml-100k.zip
[maria_dev@sandbox ~]$ ls
ml-100k.zip u.data
[maria_dev@sandbox ~]$ unzip ml-100k.zip
Archive: ml-100k.zip
creating: ml-100k/
...
To download ml-100k.zip:
- Go to https://grouplens.org/
- Go to datasets
- Scroll down and download ml-100k.zip from older datasets > MovieLens 100K Dataset
- Copy the file to HDFS:
# The command to operate on hadoop file system: hadoop fs -xxx
[maria_dev@sandbox ~]$ hadoop fs -ls # at this stage it is empty
[maria_dev@sandbox ~]$ hadoop fs -mkdir ml-100k # this will create ml-100k dir
[maria_dev@sandbox ~]$ hadoop fs -ls
Found 1 items
drwxr-xr-x - maria_dev hdfs 0 2020-02-21 17:10 ml-100k
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal u.data ml-100k/u.data # copyFromLocal
[maria_dev@sandbox ~]$ hadoop fs -ls ml-100k/u.data
-rw-r--r-- 1 maria_dev hdfs 2079229 2020-02-21 17:10 ml-100k/u.data
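The HDFS steps can be wrapped in a small function that is safe to run more than once. This is a minimal sketch, assuming it runs inside the sandbox as maria_dev; the function name hdfs_upload is hypothetical. Unlike the session above, it uses -mkdir -p so an existing directory is not an error, and -copyFromLocal -f so an existing file is overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload one local file into an HDFS directory.
# $1 = local file, $2 = HDFS directory (relative paths resolve to the user's HDFS home).
hdfs_upload() {
  local src="$1" dest_dir="$2"
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  # -p: create parent directories and do not fail if the directory exists.
  hadoop fs -mkdir -p "$dest_dir" || return 1
  # -f: overwrite the destination if it already exists.
  hadoop fs -copyFromLocal -f "$src" "$dest_dir/$(basename "$src")" || return 1
  # List the directory so the result can be checked by eye.
  hadoop fs -ls "$dest_dir"
}
```

For example, hdfs_upload u.data ml-100k reproduces the manual steps shown above.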
Resetting Ambari admin password
- Use the command ambari-admin-password-reset as the root user:
[maria_dev@sandbox ~]$ su
Password:
[root@sandbox maria_dev]# ambari-admin-password-reset
Please set the password for admin:
Please retype the password for admin:
The admin password has been set.
Restarting ambari-server to make the password change effective...
...
Waiting for server start....................
Ambari Server 'start' completed successfully.
[root@sandbox maria_dev]# exit
exit
[maria_dev@sandbox ~]$
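After the reset, the new password can be checked from a terminal by calling the Ambari REST API. This is a minimal sketch under stated assumptions: Ambari is reachable on port 8080 of the given host (the Ambari default, not confirmed by this page), and the function name ambari_login_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if Ambari accepts the admin password.
# $1 = host where Ambari is reachable (assumed port 8080), $2 = admin password.
ambari_login_ok() {
  local host="$1" password="$2" code
  # Ask only for the HTTP status code of an authenticated GET request.
  code="$(curl -s -o /dev/null -w '%{http_code}' \
    -u "admin:${password}" "http://${host}:8080/api/v1/clusters")"
  [ "$code" = "200" ]
}
```

For example, ambari_login_ok HOST PASSWORD && echo "login works", with HOST and PASSWORD replaced by the address of the sandbox and the password just set.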