Slurm on Virtual Machines - ciemat-tic/codec GitHub Wiki
1.- install CentOS
2.- Configure VirtualBox to provide fixed IPs.
Each VM needs two network adapters: one for internet access and one for the fixed IP.
VirtualBox configuration:
2.1- Create a host-only network. The default values are OK.
2.2- Set the VM to use that network too.
3.- Configure fixed IPs in virtual machine
Run ifconfig and check that the interface on the fixed network is named "enp0s8"
# more /etc/sysconfig/network-scripts/ifcfg-enp0s8
TYPE=Ethernet
BOOTPROTO="static"
NAME=enp0s8
DEVICE=enp0s8
ONBOOT=yes
# this is the IP we want
IPADDR=192.168.56.110
NETMASK=255.255.255.0
NM_CONTROLLED=no
4.- Configure VM
4.1. hostname
#more /etc/hostname
slurm_master
4.2 NFS
Required package
yum install nfs-utils
note that we are exporting /shared
4.2.1 Master: export the shared directory
more /etc/exports
/shared 192.168.56.0/24(rw,sync,no_root_squash,no_subtree_check)
restart the service so the export is applied
systemctl restart nfs-server
enable the services at boot time, then start them
systemctl enable rpcbind
systemctl enable nfs-server
systemctl enable nfs-lock
systemctl enable nfs-idmap
systemctl start rpcbind
systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap
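The enable/start pairs above can be folded into one loop. This is only a sketch: the function name enable_and_start and the SYSTEMCTL override (handy for a dry run with echo) are not part of the original setup.

```shell
# Enable each unit at boot and start it now, stopping at the first failure.
# SYSTEMCTL is a hypothetical override: set it to "echo" to preview the
# commands without touching the system.
enable_and_start() {
    for unit in "$@"; do
        "${SYSTEMCTL:-systemctl}" enable "$unit" || return 1
        "${SYSTEMCTL:-systemctl}" start "$unit" || return 1
    done
}

# Usage on the master, as root:
#   enable_and_start rpcbind nfs-server nfs-lock nfs-idmap
# Dry run:
#   SYSTEMCTL=echo enable_and_start rpcbind nfs-server nfs-lock nfs-idmap
```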
4.2.2 Slave
configuration
more /etc/systemd/system/shared.mount
[Unit]
Description=nfs mount script
Requires=network-online.target
After=network-online.target
Before=slurmd.service
[Mount]
What=192.168.56.110:/shared
# Where we want mount this share
Where=/shared
Options=
Type=nfs
[Install]
WantedBy=multi-user.target
# Important:
# this file must be named <mountpoint>.mount, where <mountpoint> is the FULL path
# where the share will be mounted, with the leading slash dropped, the remaining
# slashes "/" REPLACED with dashes "-", and .mount as extension.
# For example, to mount on "/storage/movies2" (that is, "Where=/storage/movies2"),
# the file must be named 'storage-movies2.mount' and can be enabled with the
# command 'systemctl enable storage-movies2.mount'.
# Here Where=/shared, so the file is shared.mount.
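The naming rule can be checked with a small helper. This is a sketch for simple paths only (no dashes or special characters in the path, which systemd escapes differently); the function name mount_unit_name is made up for illustration. On a real system, systemd-escape -p --suffix=mount /storage/movies2 should give the authoritative name.

```shell
# Derive the systemd mount unit file name from a mount point.
# Assumes a plain absolute path without dashes or special characters.
mount_unit_name() {
    path=${1#/}     # drop the leading slash
    path=${path%/}  # drop a trailing slash, if any
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Usage:
#   mount_unit_name /shared            # unit name for this setup
#   mount_unit_name /storage/movies2   # the example from the comment
```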
modify slurmd.service so that the mount is started before slurmd
#more /etc/systemd/system/slurmd.service
(...)
After=home.mount shared.mount
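An alternative to editing slurmd.service directly is a systemd drop-in, which survives reinstalling the unit file. The sketch below is an assumption, not what was done here: the function name, the drop-in file name shared-mount.conf and the optional base-directory argument are illustrative, and it only covers shared.mount.

```shell
# Write a drop-in that orders slurmd after the NFS mount.
# $1 is the systemd unit directory (defaults to /etc/systemd/system);
# passing another directory lets you inspect the result first.
write_slurmd_dropin() {
    dropin_dir="${1:-/etc/systemd/system}/slurmd.service.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/shared-mount.conf" <<'EOF'
[Unit]
Requires=shared.mount
After=shared.mount
EOF
}

# Usage on the slave, as root:
#   write_slurmd_dropin && systemctl daemon-reload
```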
enable it at boot time
systemctl enable shared.mount
systemctl enable rpcbind
systemctl enable nfs-lock
systemctl enable nfs-idmap
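After a reboot of the slave it is worth checking that /shared is really mounted before starting slurmd. A minimal sketch, assuming a Linux /proc/mounts; the function name is_mounted is made up for illustration.

```shell
# Succeed if the given path appears as a mount point in /proc/mounts.
# Field 2 of each line is the mount point.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

# Usage on the slave:
#   is_mounted /shared && echo "/shared is mounted" || echo "/shared is NOT mounted"
```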
5.- Install DMTCP following these instructions: https://github.com/ciemat-tic/codec/wiki/Slurm-DMTCP
important flag on configure: --prefix=/shared/dmtcp
6.- Install Slurm following these instructions: https://github.com/ciemat-tic/codec/wiki/Slurm-cluster
Some notes:
Downloaded with: git clone https://github.com/supermanue/slurm.git
Libraries to avoid warnings on configure:
yum install -y numactl-devel libcurl-devel readline-devel man2html lua-devel rrdtool-devel freeipmi-devel pmix-devel hwloc-devel lz4-devel json-c-devel pam-devel
A bug in MariaDB (Oct. 2017) makes this hack necessary:
[root@slurm_master lib64]# pwd
/usr/lib64
[root@slurm_master lib64]# ls *maria*
libmariadbclient.a
[root@slurm_master lib64]# pkg-config --libs mariadb
-lmariadb -lpthread -lz -ldl -lm -lssl -lcrypto
### AS YOU CAN SEE, pkg-config ASKS FOR "-lmariadb" BUT libmariadb IS MISSING. WE MANUALLY CREATE A LINK
[root@slurm_master lib64]# sudo ln -s /usr/lib64/libmariadbclient.a /usr/lib64/libmariadb.a
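If this step ends up in a provisioning script, the link can be created idempotently so a rerun does not fail. A sketch only; the function name link_if_missing is hypothetical.

```shell
# Create a symlink dst -> src unless dst already exists.
# Fails if the source library is not there.
link_if_missing() {
    src=$1
    dst=$2
    [ -e "$src" ] || { echo "missing source: $src" >&2; return 1; }
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "already present: $dst"
        return 0
    fi
    ln -s "$src" "$dst"
}

# Usage, as root:
#   link_if_missing /usr/lib64/libmariadbclient.a /usr/lib64/libmariadb.a
```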
I had some issues getting slurmdbd to work because it would not create the required DB users. I created them manually with:
>mysql
MariaDB [(none)]> create user slurm identified by password "";
MariaDB [(none)]> use mysql;
MariaDB [(none)]> update user set password="" where User='slurm';
MariaDB [(none)]> flush privileges;
DMTCP branch: git checkout DMTCP
important flags on configure: --with-dmtcp=/shared/dmtcp --prefix=/shared/slurm
7.- Clone slave:
Things to change:
- /etc/hostname
- /etc/sysconfig/network-scripts/ifcfg-enp0s8
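Both changes on the clone can be scripted. The sketch below is an assumption about how one might do it: the function name retarget_clone, the root-prefix argument (use "" for the live system, or a scratch directory to try it out), and the example values slurm_slave1 and 192.168.56.111 are all illustrative.

```shell
# Set a new hostname and fixed IP on a cloned VM.
# $1 = root prefix ("" for the real filesystem), $2 = new hostname, $3 = new IP
retarget_clone() {
    root=$1
    new_host=$2
    new_ip=$3
    ifcfg="$root/etc/sysconfig/network-scripts/ifcfg-enp0s8"
    printf '%s\n' "$new_host" > "$root/etc/hostname" || return 1
    # Rewrite only the IPADDR line; every other setting is kept as cloned.
    sed "s/^IPADDR=.*/IPADDR=$new_ip/" "$ifcfg" > "$ifcfg.new" || return 1
    mv "$ifcfg.new" "$ifcfg"
}

# Usage on the clone, as root, followed by a reboot:
#   retarget_clone "" slurm_slave1 192.168.56.111
```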