PXE over Infiniband - davidcarver/losf-cookbook GitHub Wiki

Netbooting over InfiniBand is done here with Mellanox FlexBoot. There may be other ways to put a PXE ROM on an InfiniBand card, but this was the path of least resistance for the hardware used to write this document.

For an InfiniBand port, Mellanox FlexBoot implements a network driver with IP over IB acting as the transport layer. IP over IB is part of the Mellanox OFED for Linux software package. The binary code is exported by the device as an expansion ROM image. For detailed information on Mellanox FlexBoot, see the Mellanox FlexBoot User Manual.

Minimum Mellanox firmware versions required for FlexBoot, in addition to the FlexBoot expansion ROM:

ConnectX®-3 Firmware fw-ConnectX3 version 2.10.0000 and above
ConnectX®-2 Firmware fw-ConnectX2 version 2.8.0600 and above
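
The minimums above can be compared against the "FW Version" value that mstflint reports for a card. This is a minimal sketch, assuming GNU sort with its -V (version sort) flag is available; fw_at_least is a hypothetical helper name, not part of any Mellanox tool.

```shell
# fw_at_least INSTALLED MINIMUM
# Succeeds (exit 0) when INSTALLED is the same as or newer than MINIMUM.
# Assumes GNU sort, whose -V flag orders dotted version strings numerically.
fw_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example: a ConnectX-3 card needs 2.10.0000 or later.
if fw_at_least "2.31.5050" "2.10.0000"; then
    echo "firmware is new enough for FlexBoot"
fi
```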

First, get your InfiniBand drivers installed on the host running DHCPD, TFTPD, etc.

yum -y install libmlx4 libmthca ibutils infiniband-diags

Make the IP over IB driver load at every boot by creating this file.

[[email protected]]# pwd
/etc/sysconfig/modules
[[email protected]]# vim ib.modules
[[email protected]]# chmod +x ib.modules 
[[email protected]]# cat ib.modules 
#!/bin/bash

# Common modules
modprobe rdma_ucm 2>&1
modprobe ib_umad 2>&1
modprobe ib_uverbs 2>&1
# IP over IB
modprobe ib_ipoib 2>&1
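
To confirm that the script above did its job, the list of loaded modules can be checked for the four names it loads. This is a sketch only; missing_ib_modules is a hypothetical helper that reads lsmod output on standard input so it can be tried without InfiniBand hardware.

```shell
# missing_ib_modules: read `lsmod` output on stdin and print every module
# from the ib.modules script that is not listed as loaded.
missing_ib_modules() {
    loaded=$(awk 'NR > 1 { print $1 }')   # skip the lsmod header line
    for mod in rdma_ucm ib_umad ib_uverbs ib_ipoib; do
        # -x matches the whole line, so ib_umad does not match ib_umad_foo
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}
```

Run it as `lsmod | missing_ib_modules`; no output means all four modules are loaded.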

At this point, after a reboot or after loading the kernel modules manually, your IB interfaces should show up.

[[email protected]]# ip link show | grep ib 
6: ib0: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:de:a1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
7: ib1: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN qlen 256
    link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:de:a2 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

Let's find which IB interface has something plugged into it by looking for "LinkUp". Notice that the "ip" command didn't know the link was up, but ibstat does.

[[email protected]]# ibstat | tail -n 20
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf4521403007c2d51
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf4521403007c2d52
                Link layer: InfiniBand
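
Scanning the output above for "LinkUp" can be scripted. The sketch below assumes the ibstat layout shown in this document ("Port N:" followed by indented fields); linkup_ports is a hypothetical helper name. It prints port numbers only, so on a host with several adapters the adapter name would have to be tracked as well.

```shell
# linkup_ports: read `ibstat` output on stdin and print the number of each
# port whose "Physical state" is LinkUp.
linkup_ports() {
    awk '
        # Remember the most recent "Port N:" header ("Port GUID:" does not match).
        /^[ \t]*Port [0-9]+:/     { port = $2; sub(/:$/, "", port) }
        /Physical state: *LinkUp/ { print port }
    '
}
```

Run it as `ibstat | linkup_ports`.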

Let's get the OpenSM subnet manager running on the server we will be PXE booting from. If you already have a subnet manager running somewhere on the fabric, you don't need to do this.

[[email protected]]# yum install opensm
[[email protected]]# service opensm start
[[email protected]]# chkconfig opensm on

If everything is working, you will see State: Active where it previously showed State: Initializing.

[[email protected]]# ibstat | tail -n 20
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251486a
                Port GUID: 0xf4521403007bdea1
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf4521403007bdea2
                Link layer: InfiniBand
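
The change from Initializing to Active can also be read out by a script. This is a sketch under the same assumption about the ibstat layout shown above; port_state is a hypothetical helper name.

```shell
# port_state N: read `ibstat` output on stdin and print the "State" value
# of port N (for example Initializing, Active, or Down).
port_state() {
    awk -v want="$1" '
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        # "Physical state:" starts with a different word, so it is skipped.
        /^[ \t]*State:/ && port == want { print $2; exit }
    '
}
```

For example, `until [ "$(ibstat | port_state 1)" = "Active" ]; do sleep 1; done` would block until the subnet manager has brought port 1 up.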

Configure the interface network script with your desired IPv4 settings.

[[email protected]]# pwd
/etc/sysconfig/network-scripts
[[email protected]]# cat ifcfg-ib0 
DEVICE="ib0"
BOOTPROTO="none"
DHCP_HOSTNAME="node16.cluster.tacc.utexas.edu"
HWADDR="80:00:00:48:FE:80:00:00:00:00:00:00:F4:52:14:03:00:7B:DE:A1"
NM_CONTROLLED="no"
ONBOOT="yes"
#TYPE="InfiniBand"
TYPE="Ethernet"
UUID="ddb1a567-2ccb-49ee-abeb-69318420d7fc"
MTU="1500"
IPADDR="192.168.99.99"
NETMASK="255.255.255.0"
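
The HWADDR value in the file above is the 20-byte link/infiniband address from the earlier "ip link" output, written in upper case. A small sketch for producing it, where ib_hwaddr is a hypothetical helper name:

```shell
# ib_hwaddr: read `ip link show <ifname>` output on stdin and print the
# link/infiniband address in upper case, ready to paste into HWADDR.
ib_hwaddr() {
    awk '$1 == "link/infiniband" { print toupper($2); exit }'
}
```

Run it as `ip link show ib0 | ib_hwaddr`.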

Bring up the interface

[[email protected]]# ifup ib0

Verify we have IP on the IB interface

[[email protected]]# ip addr show ib0
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:7b:de:a1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.99.99/24 brd 192.168.99.255 scope global ib0
    inet6 fe80::f652:1403:7b:dea1/64 scope link 
       valid_lft forever preferred_lft forever

At this point we should have an InfiniBand interface with an IP address. Let's use cobbler to get the PXE boot services up and running. Setting up cobbler is not covered in this document, but the configuration is the same as for a normal Ethernet interface.

What is the MAC address of the IB card? Good question. There are two ways to find it: start the FlexBoot ROM and read the MAC address it prints on the console, or run the mstflint command on the host. mstflint can also set custom MAC addresses, which might be handy.

[[email protected]]# yum install mstflint

We also need to verify the firmware version on the controller. The same query prints the MAC addresses.

[[email protected]]# lspci | grep Mella
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
[[email protected]]# mstflint -d 08:00.0 q
Image type:      ConnectX
FW Version:      2.31.5050
Product Version: 02.31.50.50
Rom Info:        type=PXE  version=3.4.225 devid=4099 proto=VPI
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           f4521403007bdea0 f4521403007bdea1 f4521403007bdea2 f4521403007bdea3 
MACs:                                 f452147bdea1     f452147bdea2
VSD:             
PSID:            MT_1090120019
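
In the query output above, each port MAC equals the port GUID with its two middle bytes removed (f4521403007bdea1 becomes f452147bdea1). The sketch below encodes that observation. It is an assumption drawn from this one card, not a documented rule, so check the result against the MACs row on your own hardware; guid_to_mac is a hypothetical helper name.

```shell
# guid_to_mac GUID: drop the two middle bytes of a 16-hex-digit port GUID
# to get the 12-hex-digit MAC, matching the GUIDs/MACs rows in this document.
guid_to_mac() {
    guid=${1#0x}                       # accept an optional 0x prefix
    printf '%s%s\n' "$(printf '%s' "$guid" | cut -c1-6)" \
                    "$(printf '%s' "$guid" | cut -c11-16)"
}
```

For example, `guid_to_mac 0xf4521403007bdea2` prints the Port2 MAC from the output above.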

wget http://www.mellanox.com/downloads/Drivers/PXE/FlexBoot-3.4.306_EN.tar
tar xf FlexBoot-3.4.306_EN.tar

The "Device ID" is important: it tells us which FlexBoot ROM to install.

FlexBoot-3.4.306_EN_4099.mrom
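
Picking the ROM file by Device ID can be scripted from the mstflint query output. This is a sketch that assumes the file naming seen in this document, a prefix followed by an underscore, the Device ID, and ".mrom"; flexboot_rom_name is a hypothetical helper name and the prefix is supplied by the caller.

```shell
# flexboot_rom_name PREFIX: read `mstflint -d <device> q` output on stdin
# and print PREFIX_<Device ID>.mrom.
flexboot_rom_name() {
    awk -v prefix="$1" '
        # Matches "Device ID:   4099"; the "Rom Info" line is not anchored here.
        /^Device ID:/ { print prefix "_" $3 ".mrom"; exit }
    '
}
```

For example, `mstflint -d 08:00.0 q | flexboot_rom_name FlexBoot-3.4.306_EN` prints the file name shown above.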

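Burn the ROM, passing -d the PCI address that lspci reports for your card. The transcript below uses 05:00.0 and the IB build of the ROM (proto=IB).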

[[email protected]]# mstflint -d 05:00.0 brom FlexBoot-3.4.306_IB_4099.mrom 

    Current ROM info on flash: N/A
    New ROM info:              type=PXE  version=3.4.306 devid=4099 proto=IB

Burning ROM image    - OK  
Restoring signature  - OK  

Now we have an IPoIB PXE-enabled adapter that shows up as a "FlexBoot" network-bootable card in the BIOS. More good news: the default initrd and vmlinuz of CentOS 6.5 include the InfiniBand drivers! Behold, during a fresh installation the InfiniBand device shows up and is able to get an IP address over DHCP:

Oct  9 12:54:41 node16.cluster.tacc.utexas.edu dhcpd: DHCPDISCOVER from  via ib0
Oct  9 12:54:42 node16.cluster.tacc.utexas.edu dhcpd: DHCPOFFER on 192.168.99.100 to  via ib0
Oct  9 12:54:42 node16.cluster.tacc.utexas.edu dhcpd: DHCPREQUEST for 192.168.99.100 (192.168.99.99) from  via ib0
Oct  9 12:54:42 node16.cluster.tacc.utexas.edu dhcpd: DHCPACK on 192.168.99.100 to  via ib0
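
To see which addresses were actually handed out over the IB interface, the dhcpd log lines above can be filtered. This is a sketch that assumes the log format shown in this document; acked_leases is a hypothetical helper name, and the path of the log file depends on your syslog setup.

```shell
# acked_leases IFACE: read dhcpd syslog lines on stdin and print each
# address that was DHCPACKed on IFACE, one per line, without duplicates.
acked_leases() {
    awk -v iface="$1" '
        /DHCPACK on / && $NF == iface {
            # The address is the second word after "DHCPACK".
            for (i = 1; i < NF; i++)
                if ($i == "DHCPACK") { print $(i + 2); break }
        }
    ' | sort -u
}
```

Run it as `acked_leases ib0 < /var/log/messages`, adjusting the path to wherever dhcpd logs on your host.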
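
Finally, download the RPMs from the "Infiniband" yum group and add them to LosF under the ib alias:

[[email protected]]# yum --setopt=group_package_types=mandatory,default,optional groupinstall --downloadonly --downloaddir=/tmp/ibrpms "Infiniband"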

[[email protected]]# for RPM in /tmp/ibrpms/*.rpm; do losf --yes addrpm "$RPM" --alias=ib; done

losf --yes addalias ib
update