Slurm on Virtual Machines - ciemat-tic/codec GitHub Wiki

1.- Install CentOS

2.- Configure VirtualBox to provide fixed IPs.

Each VM needs two network adapters: one for internet access and one on a host-only network that carries the fixed IP.

VirtualBox configuration:

2.1. Create a host-only network. The default values are OK.

2.2. Attach the VM to that network as well.
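The two VirtualBox steps can also be scripted from the host with VBoxManage. This is a minimal sketch, assuming a VM named `slurm-master` and a host-only interface that comes up as `vboxnet0` (both names are placeholders, not taken from this setup). It only prints the commands so they can be reviewed before being run.

```shell
# Print the VBoxManage commands that attach a host-only network as the
# second adapter of a VM. The VM and interface names are assumptions.
hostonly_commands() {
    vm="${1:-slurm-master}"   # hypothetical VM name
    ifname="${2:-vboxnet0}"   # name usually given to the first host-only interface
    # 2.1: create the host-only interface with default values
    echo "VBoxManage hostonlyif create"
    # 2.2: adapter 1 stays on the internet-facing network, adapter 2 becomes host-only
    echo "VBoxManage modifyvm \"$vm\" --nic2 hostonly --hostonlyadapter2 $ifname"
}

hostonly_commands
```

The VM has to be powered off when `modifyvm` is run.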

3.- Configure fixed IPs in virtual machine

Run ifconfig to find the name of the host-only interface; here it is "enp0s8".

# more /etc/sysconfig/network-scripts/ifcfg-enp0s8 
TYPE=Ethernet
BOOTPROTO="static"
NAME=enp0s8
DEVICE=enp0s8
ONBOOT=yes
# IPADDR is the fixed IP we want for this VM
IPADDR=192.168.56.110
NETMASK=255.255.255.0
NM_CONTROLLED=no
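Since every node needs the same file with a different address, the file can be generated. This is a small sketch that prints the ifcfg content for a given interface and IP; the function name `make_ifcfg` and the second address `192.168.56.111` are illustrative assumptions.

```shell
# Print an ifcfg file for a static address on the host-only interface.
# Arguments: interface name, IPv4 address. Defaults match the values above.
make_ifcfg() {
    dev="${1:-enp0s8}"
    ip="${2:-192.168.56.110}"
    cat <<EOF
TYPE=Ethernet
BOOTPROTO="static"
NAME=$dev
DEVICE=$dev
ONBOOT=yes
IPADDR=$ip
NETMASK=255.255.255.0
NM_CONTROLLED=no
EOF
}

# Preview the file for a second node (the address is an assumed example)
make_ifcfg enp0s8 192.168.56.111
```

To apply it, redirect the output as root into /etc/sysconfig/network-scripts/ifcfg-enp0s8 and restart networking or reboot the VM.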

4.- Configure the VM

4.1. Hostname

# more /etc/hostname
slurm_master

4.2. NFS

Required package:

yum install nfs-utils

Note that the directory being exported is /shared.

4.2.1. Master: export the shared directory

# more /etc/exports
/shared 192.168.56.0/24(rw,sync,no_root_squash,no_subtree_check)

Restart the service so that it picks up the export:

systemctl restart nfs-server

Enable the services at boot time and start them:

systemctl enable rpcbind
systemctl enable nfs-server
systemctl enable nfs-lock
systemctl enable nfs-idmap
systemctl start rpcbind
systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap
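The export line on the master can be added idempotently, so that re-running the setup does not duplicate it. This is a minimal sketch; the helper name `add_export` is hypothetical, and the options are the ones shown in /etc/exports above.

```shell
# Append an export line to an exports file only if the directory is not listed yet.
# Arguments: exports file, directory to export, allowed clients (network/prefix).
add_export() {
    exports_file="$1"
    dir="$2"
    clients="$3"
    line="$dir $clients(rw,sync,no_root_squash,no_subtree_check)"
    touch "$exports_file"
    # A line starting with the directory followed by whitespace counts as "already exported"
    if ! grep -q "^$dir[[:space:]]" "$exports_file"; then
        echo "$line" >> "$exports_file"
    fi
}

# Usage on the master (as root):
#   add_export /etc/exports /shared 192.168.56.0/24
#   exportfs -ra           # re-read /etc/exports
#   showmount -e localhost # list what is currently exported
```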

4.2.2. Slave

Mount configuration:

# more /etc/systemd/system/shared.mount
[Unit]
Description=nfs mount script
Requires=network-online.target
After=network-online.target
Before=slurmd.service

[Mount]
What=192.168.56.110:/shared
# Where we want to mount this share
Where=/shared
Options=
Type=nfs

[Install]
WantedBy=multi-user.target

# Important:
# This file must be named <mountpoint>.mount, where <mountpoint> is the FULL path
# of the mount point with the leading slash dropped and every other slash "/"
# REPLACED by a dash "-".
# For example, a share mounted with "Where=/storage/movies2" needs a unit file named
# 'storage-movies2.mount', which is enabled with 'systemctl enable storage-movies2.mount'.
# Here "Where=/shared", so the file is 'shared.mount'.
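The naming rule for mount units can be sketched as a small shell function. This is a simplified version that only handles plain paths; the function name `mount_unit_name` is hypothetical. For paths with dashes or other special characters, `systemd-escape -p --suffix=mount <path>` applies the full escaping rules.

```shell
# Derive the unit file name systemd expects for a mount point.
# Simplified: drops the leading slash and turns the remaining slashes into dashes.
mount_unit_name() {
    path="${1#/}"      # drop the leading slash
    path="${path%/}"   # drop a trailing slash, if any
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

mount_unit_name /shared            # prints shared.mount
mount_unit_name /storage/movies2   # prints storage-movies2.mount
```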

Modify slurmd.service so that the shares are mounted before slurmd starts:

# more /etc/systemd/system/slurmd.service
(...)
After=home.mount shared.mount
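An alternative to editing slurmd.service itself is a systemd drop-in, which survives reinstalling the unit file. This is a sketch under assumptions: the drop-in file name `wait-for-shared.conf` is made up, and the unit names are the ones used above. systemd expects the units listed in After= to be separated by spaces, and an After= line in a drop-in adds to the ones already in the unit.

```shell
# Write a drop-in that orders slurmd after the mount units.
# $1 is the systemd unit directory (normally /etc/systemd/system).
write_slurmd_dropin() {
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/slurmd.service.d"
    mkdir -p "$dropin_dir"
    # The file name is arbitrary as long as it ends in .conf
    cat > "$dropin_dir/wait-for-shared.conf" <<'EOF'
[Unit]
After=home.mount shared.mount
EOF
}

# Usage (as root):
#   write_slurmd_dropin && systemctl daemon-reload
```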

Enable the mount and its supporting services at boot time:

systemctl enable shared.mount
systemctl enable rpcbind
systemctl enable nfs-lock
systemctl enable nfs-idmap

5.- Install DMTCP following these instructions: https://github.com/ciemat-tic/codec/wiki/Slurm-DMTCP

Important configure flag: --prefix=/shared/dmtcp

6.- Install Slurm following these instructions: https://github.com/ciemat-tic/codec/wiki/Slurm-cluster

Some notes:

Downloaded with: git clone https://github.com/supermanue/slurm.git

Libraries to avoid warnings on configure:

yum install -y numactl-devel  libcurl-devel readline-devel man2html lua-devel rrdtool-devel freeipmi-devel pmix-devel hwloc-devel lz4-devel json-c-devel pam-devel 

A bug in MariaDB (as of October 2017) requires the following workaround:

[root@slurm_master lib64]# pwd
/usr/lib64

[root@slurm_master lib64]# ls *maria*
libmariadbclient.a

[root@slurm_master lib64]# pkg-config --libs mariadb
-lmariadb -lpthread -lz -ldl -lm -lssl -lcrypto  
### pkg-config ASKS FOR "-lmariadb", BUT ONLY libmariadbclient.a EXISTS. WE MANUALLY CREATE A LINK

[root@slurm_master lib64]# sudo ln -s /usr/lib64/libmariadbclient.a /usr/lib64/libmariadb.a
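The same workaround can be written as a guarded function that only creates the link when it is actually missing. This is a minimal sketch; the function name `link_mariadb_lib` is hypothetical, and the check for `libmariadb.so` is an added assumption about what a fixed package might ship.

```shell
# Create libmariadb.a as a link to libmariadbclient.a when only the latter exists.
# $1 is the library directory (default /usr/lib64).
link_mariadb_lib() {
    libdir="${1:-/usr/lib64}"
    if [ -e "$libdir/libmariadb.a" ] || [ -e "$libdir/libmariadb.so" ]; then
        echo "libmariadb already present in $libdir, nothing to do"
    elif [ -e "$libdir/libmariadbclient.a" ]; then
        ln -s "$libdir/libmariadbclient.a" "$libdir/libmariadb.a"
        echo "linked $libdir/libmariadb.a -> libmariadbclient.a"
    else
        echo "no MariaDB client library found in $libdir" >&2
        return 1
    fi
}

# Usage (as root):
#   link_mariadb_lib /usr/lib64
```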

I had some issues getting slurmdbd to work because it would not create the required DB user. I created it manually with:

>mysql
MariaDB [(none)]> create user slurm identified by password "";
MariaDB [(none)]> use mysql;
MariaDB [(none)]> update user set password="" where User='slurm';
MariaDB [(none)]> flush privileges;

DMTCP branch: git checkout DMTCP

Important configure flags: --with-dmtcp=/shared/dmtcp --prefix=/shared/slurm
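Put in order, the download, branch, and configure notes for Slurm look roughly as follows. This is a sketch that only prints the commands; the function name `slurm_build_steps` is hypothetical, and the final `make` and `make install` are the usual autotools steps, assumed here rather than taken from the notes.

```shell
# Print the build steps for the DMTCP-enabled Slurm fork, one command per line.
# Arguments: DMTCP install prefix, Slurm install prefix.
slurm_build_steps() {
    dmtcp_prefix="${1:-/shared/dmtcp}"
    slurm_prefix="${2:-/shared/slurm}"
    cat <<EOF
git clone https://github.com/supermanue/slurm.git
cd slurm
git checkout DMTCP
./configure --with-dmtcp=$dmtcp_prefix --prefix=$slurm_prefix
make
make install
EOF
}

slurm_build_steps
```

Installing both DMTCP and Slurm under /shared means the slaves see the same binaries through the NFS mount.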

7.- Clone the VM to create the slave.

Things to change on the clone:

  • /etc/hostname
  • /etc/sysconfig/network-scripts/ifcfg-enp0s8
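Both changes on the clone can be made in one step. This is a minimal sketch; the function name `retarget_clone`, the hostname `slurm_slave1`, and the address `192.168.56.111` are illustrative assumptions. The optional third argument is a filesystem root, which lets the function be tried on a copy of the files first.

```shell
# Rewrite the two per-node files on a cloned VM.
# $1 new hostname, $2 new IP, $3 optional filesystem root (empty means the real /).
retarget_clone() {
    new_host="$1"
    new_ip="$2"
    root="${3:-}"
    ifcfg="$root/etc/sysconfig/network-scripts/ifcfg-enp0s8"
    echo "$new_host" > "$root/etc/hostname"
    # Replace only the IPADDR line and keep the rest of the file untouched
    sed "s/^IPADDR=.*/IPADDR=$new_ip/" "$ifcfg" > "$ifcfg.tmp" && mv "$ifcfg.tmp" "$ifcfg"
}

# Usage (as root on the clone, then reboot):
#   retarget_clone slurm_slave1 192.168.56.111
```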