Environment Setup - Sotera/track-communities GitHub Wiki
The .box files we have appear to be missing one or two things that are now nearly impossible to install, because the images are so old that the package managers no longer work. For example, even after you fix apt-get so you can install pip, it gives you v1.0, when pip is now on v21.1. Somewhere back around v9, PyPI stopped accepting non-SSL connections, so you cannot pip install from this VM without a full upgrade.
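To see which side of that cutoff a given pip falls on, a version comparison along these lines could be run inside the VM once it is up. This is only a sketch: the helper names are hypothetical, the 9.0 threshold is just the rough "around v9" estimate from the paragraph above and has not been verified, and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
# ASSUMPTION: 9.0 mirrors the rough "around v9" estimate in the text, not a verified cutoff.
MIN_PIP_VERSION="9.0"

# version_at_least HAVE WANT: succeeds when HAVE >= WANT.
# Uses version sort: if WANT sorts first (or ties), HAVE is new enough.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# pip_version: prints the second field of `pip --version` (the version number),
# or nothing at all when pip is not installed.
pip_version() {
    pip --version 2>/dev/null | awk '{ print $2 }'
}

current=$(pip_version)
if version_at_least "$current" "$MIN_PIP_VERSION"; then
    echo "pip $current meets the assumed minimum ($MIN_PIP_VERSION)"
else
    echo "pip is missing or older than the assumed minimum ($MIN_PIP_VERSION)"
fi
```

On the stock VM this should take the second branch, since pip is either absent or at v1.0.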
We'll try to rebuild on a modern VM / Docker image.
Otherwise, here are notes on how it used to work, mixed with attempts to fix it.
- Install Vagrant: http://www.vagrantup.com
- TODO: Remove this dependency. Just use VirtualBox or Docker
- Install VirtualBox: https://www.virtualbox.org/wiki/Downloads
- Download XDATA VM (xdata-0.2.1.box): http://sotera.github.io/xdata-vm/
- Note: 4GB VM file - see if we can slim that down.
- Note: Use IE / Firefox. Chrome may fail at the end of the download.
- Tested 2021-07-16.
See the XDATA VM wiki for baseline software if installing or setting up on your own machine.
This is three steps:
- Go to the folder with the xdata-[version].box file. Let's assume you are using 0.2.1. (Change as needed.)
- Add the XDATA VM box definition to Vagrant:
% vagrant box add xdata-vm-0.2.1 ./xdata-0.2.1.box
- Initialize a new VM based on the XDATA VM box configuration:
% vagrant init xdata-vm-0.2.1
Then run vagrant up to boot the VM. The whole session looks like this:
(base) ~/vm/xdata-vm-0.2.1 % vagrant box add xdata-vm-0.2.1 xdata-0.2.1.box
==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'xdata-vm-0.2.1' (v0) for provider:
box: Unpacking necessary files from: file:///Users/.../vm/xdata-vm-0.2.1/xdata-0.2.1.box
==> box: Successfully added box 'xdata-vm-0.2.1' (v0) for 'virtualbox'!
(base) ~/vm/xdata-vm-0.2.1 % vagrant init xdata-vm-0.2.1
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.
(base) ~/vm/xdata-vm-0.2.1 % vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'xdata-vm-0.2.1'...
Progress: 90% ⬅︎⬅︎⬅︎ This step can take a minute or so ⬅︎⬅︎⬅︎
==> default: Matching MAC address for NAT networking...
==> default: Setting the name of the VM: xdata-vm-021_default_1626462962175_24688
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default:
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default:
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
default: The guest additions on this VM do not match the installed version of
default: VirtualBox! In most cases this is fine, but in rare cases it can
default: prevent things such as shared folders from working properly. If you see
default: shared folder errors, please make sure the guest additions within the
default: virtual machine match the version of VirtualBox you have installed on
default: your host and reload your VM.
default:
default: Guest Additions Version: 4.1.12
default: VirtualBox Version: 6.1
==> default: Mounting shared folders...
default: /vagrant => /Users/.../vm/xdata-vm-0.2.1
Okay, you should now have a running VM!
- The VirtualBox VM will be in a new folder like ../xdata-021_default_162649866.../
- Vagrant will create a matching .vbox (not .box) file: xdata-021_default_162649866....vbox
- (The long string of digits is a random unique ID.)
- SSH into the box. Run this on the host, from the folder containing the Vagrantfile:
(base) ~/vm/xdata-vm-0.2.1 % vagrant ssh
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.8.0-39-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Thu Jul 22 12:58:17 UTC 2021
System load: 0.71 Processes: 91
Usage of /: 19.4% of 39.34GB Users logged in: 0
Memory usage: 16% IP address for eth0: 10.0.2.15
Swap usage: 0% IP address for docker0: 172.17.42.1
Graph this data and manage this system at:
https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
Last login: Thu Jun 19 23:15:21 2014 from 10.0.2.2
vagrant@xdata:~$ whoami
vagrant
- Check that this machine has network access:
vagrant@xdata:~$ ping google.com
PING google.com (172.217.13.238) 56(84) bytes of data.
64 bytes from iad23s61-in-f14.1e100.net (172.217.13.238): icmp_req=1 ttl=63 time=48.5 ms
64 bytes from iad23s61-in-f14.1e100.net (172.217.13.238): icmp_req=2 ttl=63 time=88.3 ms
If that doesn't work, stop. There is something wrong with your VirtualBox setup. If on a Mac, check your System Settings ➛ Security & Privacy: you may have to allow VirtualBox to control your machine, and on newer OS X you will have to reboot (augh) because Apple disallowed hot kernel changes like that.
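On Linux, a plain `ping` keeps running until you press Ctrl-C, so a bounded version of the check may be handier. A minimal sketch, where `check_network` is a hypothetical helper name and `-c 3` limits the run to three packets:

```shell
# check_network HOST: sends three pings to HOST and reports the outcome.
# Returns non-zero when the host cannot be reached.
check_network() {
    if ping -c 3 "$1" >/dev/null 2>&1; then
        echo "network OK: $1 is reachable"
    else
        echo "network FAILED: $1 is not reachable" >&2
        return 1
    fi
}

# Usage, from inside the VM:
#   check_network google.com || echo "stop here and check the VirtualBox setup"
```

A failure here could also be a DNS problem rather than a missing route, so pinging a known IP address as a second test may help narrow it down.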
TODO: Replace this by just sending a newer .box!
As of 0.2, the Sotera components are (mostly) included. (For older versions, see Old-Install-Components.) However:
- It seems to lack the Python interface for Impala.
- You can't pip install python-impala, because pip seems to be missing.
- You can't apt-get install pip, because the apt config is years out of date.
So:
- Fix the apt-get package manager. The config files are now out of date.
$ sudo sed -i -e \
's/archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' \
/etc/apt/sources.list
$ ls /etc/apt/sources.list.d
cloudera-impala.list cloudera.list docker.list java.list r.list
$ sudo sed -i -e 's/impala1/impala1.4.0/g' /etc/apt/sources.list.d/cloudera-impala.list
$ sudo sed -i -e 's/cdh4/cdh4.7.0/g' /etc/apt/sources.list.d/cloudera.list
$ sudo mv /etc/apt/sources.list.d/docker.list /etc/apt/sources.list.d/__docker.list__
$ sudo apt-get update
- Get pip and install the Python interface to Impala. Note that you need to force a pip upgrade because PyPI won't accept non-SSL connections anymore, and that will require a release upgrade. Sigh.
$ sudo do-release-upgrade
<restart>
$ sudo apt-get install python-pip --upgrade
$ pip install impyla==0.7
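Because the apt fix above edits /etc/apt/sources.list in place with `sed -i`, it may be worth previewing the result first. The sketch below applies the same host substitution without touching the file. `preview_sources_rewrite` is a hypothetical helper name, and the single `\|` expression is split into two `-e` expressions, which should be equivalent but does not rely on GNU sed's alternation syntax.

```shell
# preview_sources_rewrite FILE: prints FILE as it would look after the
# archive/security -> old-releases substitution. FILE itself is not modified.
preview_sources_rewrite() {
    sed -e 's/archive\.ubuntu\.com/old-releases.ubuntu.com/g' \
        -e 's/security\.ubuntu\.com/old-releases.ubuntu.com/g' \
        "$1"
}

# Usage, from inside the VM:
#   preview_sources_rewrite /etc/apt/sources.list | diff /etc/apt/sources.list -
#   sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak   # backup before sed -i
```

One thing the preview can reveal: if the file uses a country mirror such as us.archive.ubuntu.com, the substitution yields us.old-releases.ubuntu.com, which may not resolve and would then need a manual edit.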
Start your virtual machine.
$ vagrant up
SSH into the VM as bigdata/bigdata, then edit the following configuration file to add the properties below. These settings should help protect a single-VM setup from the memory and task-processing issues that may crop up in later steps.
$ sudo vi /etc/hadoop/conf/mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>3</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>3</value>
</property>
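After saving the file, a quick grep can confirm the properties actually landed in it. This is a rough sketch with hypothetical helper names. It only looks for the `<name>` elements and does not validate the XML or check that the blocks sit inside the `<configuration>` element, where Hadoop expects them. The reduce-slot name is spelled `mapred.tasktracker.reduce.tasks.maximum` here, which is the MR1 property name as far as I know.

```shell
# has_property FILE NAME: succeeds when FILE contains <name>NAME</name>.
# -F treats the pattern as a fixed string, so the dots are matched literally.
has_property() {
    grep -qF "<name>$2</name>" "$1"
}

# check_mapred_site FILE: reports each expected property and returns
# non-zero if any of them is missing.
check_mapred_site() {
    status=0
    for name in \
        mapred.child.java.opts \
        mapred.tasktracker.map.tasks.maximum \
        mapred.tasktracker.reduce.tasks.maximum
    do
        if has_property "$1" "$name"; then
            echo "found:   $name"
        else
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}

# Usage, from inside the VM:
#   check_mapred_site /etc/hadoop/conf/mapred-site.xml
```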
Stop your virtual machine.
$ vagrant halt
Start your virtual machine.
$ vagrant up
SSH into the VM as bigdata/bigdata, then test the following commands to ensure the system is appropriately configured:
$ hadoop fs -ls /
$ hive -e "show tables"
$ python
>>> import impala
>>> client = impala.ImpalaBeeswaxClient('localhost:21000')
>>> client.connect()
>>> print client.execute("show tables")
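If you run these checks more than once, a small wrapper that prints a pass/fail line per command can save some scrolling. A minimal sketch, where `run_check` is a hypothetical helper and the Impala line only tests that the Python module imports, not that a connection succeeds:

```shell
# run_check LABEL COMMAND [ARGS...]: runs COMMAND quietly and prints
# PASS or FAIL for LABEL. Returns non-zero on failure.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Usage, from inside the VM:
#   run_check "HDFS listing"  hadoop fs -ls /
#   run_check "Hive tables"   hive -e "show tables"
#   run_check "impala import" python -c "import impala"
```

Output is discarded on purpose, so rerun a failing command by hand to see its error message.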
Stop your virtual machine.
$ vagrant halt