Spock Installation: InfiniBand - calab-ntu/gpu-cluster GitHub Wiki

Switch

Initialization

  1. Plug in both power cables and wait until all system status LEDs are solid green.

  2. Connect a host PC (e.g., spock00) to the console (RJ-45) port of the switch using the supplied RJ-45-to-DB9 cable plus a DB9-to-USB cable.

  3. Log in from the Ubuntu PC

    1. Get the USB device name: ls /dev/ttyUSB*

      If only one USB serial device is plugged into the PC, it shows up as ttyUSB0.

    2. Connect to the switch with superuser privileges: screen /dev/ttyUSB0 115200, then press Enter twice.
    3. Log in:
      Username: admin
      Password: admin
      
  4. Configuration (the following questions are asked at the first connection)

    Do you want to use the wizard for initial configuration? yes
    Step 1: Hostname? [switch-d79b5a]
    Step 2: Use DHCP on mgmt0 interface? [yes] no
    Step 3: Use zeroconf on mgmt0 interface [no]
    Step 4: Primary IPv4 address and masklen? [0.0.0.0/0] 192.168.0.100/24
    Step 5: Default gateway? 192.168.0.1
    Step 6: Primary DNS server? 140.112.254.4
    Step 7: Domain name?
    Step 8: Enable IPv6? [yes]
    Step 9: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface? [no]
    Step 10: Enable DHCPv6 on mgmt0 interface? [yes] no
    Step 11: Admin password (Must be typed)? #set it the same as spock
    Step 11: Confirm admin password?
    Step 12: Monitor password (Must be typed)? #same as admin password
    Step 12: Confirm monitor password?
    

    To rerun the configuration wizard later, enter these commands in order: enable, configure terminal, configuration jump-start.

  5. Check

    1. System version: show version
      Product name:      MLNX-OS
      Product release:   3.8.2102
      Build ID:          #1-dev
      Build date:        2019-11-26 21:48:40
      Target arch:       x86_64
      Target hw:         x86_64
      Built by:          jenkins@c776fa44be2b
      Version summary:   X86_64 3.8.2102 2019-11-26 21:48:40 x86_64
      Product model:     x86onie
      Host ID:           043F72D79B5A
      System serial num: MT2039J30791
      System UUID:       f73a8370-1456-11eb-8000-043f72d00e66
      Uptime:            18h 12m 33.108s
      CPU load averages: 3.11 / 3.05 / 3.01
      Number of CPUs:    4
      System memory:     468 MB used / 7333 MB free / 7801 MB total
      Swap:              0 MB used / 0 MB free / 0 MB total       
      
    2. mgmt0 interface
      enable
      show interfaces mgmt0
      
      Interface mgmt0 status:
      Comment         :
      Admin up        : yes
      Link up         : yes
      DHCP running    : no
      IP address      : 192.168.0.100
      Netmask         : 255.255.255.0
      IPv6 enabled    : yes
      Autoconf enabled: no
      Autoconf route  : yes
      Autoconf privacy: no
      DHCPv6 running  : no
      IPv6 addresses  : 1
      
  6. Enable OpenSM

    1. enable
    2. configure terminal
    3. ib smnode switch-d79b5a enable
    4. show ib sm
      enable
      
    5. no configure
  7. Log out and exit screen with [CTRL + A], then [CTRL + K]
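
The device lookup in step 3 can be scripted so that the console command does not depend on typing the device name by hand. The following is a minimal sketch, not part of the original procedure; `find_console` is a hypothetical helper name, and it assumes the DB9-to-USB adapter is the first ttyUSB device on the host.

```shell
# Sketch: locate the USB serial adapter used for the switch console.
# find_console is a hypothetical helper name, not part of any tool.

# Print the first device matching DIR/ttyUSB* (DIR defaults to /dev),
# or return non-zero if no such device exists.
find_console() {
    dir="${1:-/dev}"
    for dev in "$dir"/ttyUSB*; do
        # With no match the glob stays literal, so test for existence.
        if [ -e "$dev" ]; then
            printf '%s\n' "$dev"
            return 0
        fi
    done
    echo "no ttyUSB device found in $dir" >&2
    return 1
}

# Usage (needs the adapter plugged in and superuser privileges):
#   sudo screen "$(find_console)" 115200
```

If more than one USB serial adapter is attached, the first match may not be the switch, so check the output of ls /dev/ttyUSB* before connecting.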

Rerun initialization

  1. Log in to the switch (via the console port or ssh)
  2. enable
  3. configure terminal
  4. configuration jump-start
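
When the wizard is rerun, Step 4 expects the management address as address/masklen (for example 192.168.0.100/24), while show interfaces mgmt0 reports a dotted netmask (255.255.255.0). As a minimal sketch for checking one against the other from the host PC, the shell function below converts a prefix length to a netmask; `masklen_to_netmask` is a hypothetical helper name, not a switch or system command.

```shell
# Sketch: convert an IPv4 prefix length (0-32) to a dotted-decimal netmask.
# masklen_to_netmask is a hypothetical helper name.
masklen_to_netmask() {
    bits="$1"
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            # A full octet of network bits.
            octet=255
            bits=$((bits - 8))
        else
            # The top $bits bits of this octet are set; the rest are zero.
            octet=$((256 - (1 << (8 - bits))))
            bits=0
        fi
        # Join octets with dots (no leading dot before the first one).
        mask="${mask}${mask:+.}${octet}"
    done
    printf '%s\n' "$mask"
}

masklen_to_netmask 24   # prints 255.255.255.0
```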

Enable OpenSM

  1. enable
  2. configure terminal
  3. ib smnode switch-d79b5a enable
  4. show ib sm
    enable
    
  5. no configure

SSH

The following errors appear when connecting to the switch over ssh from Ubuntu 22.04. The switch offers only algorithms that recent OpenSSH clients no longer enable by default.

  Unable to negotiate with 192.168.0.100 port 22: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1
  Unable to negotiate with 192.168.0.100 port 22: no matching host key type found. Their offer: ssh-rsa

  1. Add the following lines at the end of the file /etc/ssh/ssh_config
    KexAlgorithms=+diffie-hellman-group14-sha1
    HostKeyAlgorithms=+ssh-rsa
    
  2. Restart the ssh service: service ssh restart
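
The system-wide setting above enables the legacy algorithms for every host. As an alternative sketch, the same two options can be limited to the switch with a Host block in a per-user client config. `add_switch_host` and the alias `ib-switch` are hypothetical names chosen for illustration; the address and the admin user are taken from the steps above.

```shell
# Sketch: append a Host block for the switch to an ssh client config file.
# add_switch_host and the alias "ib-switch" are hypothetical names.
add_switch_host() {
    cfg="${1:-$HOME/.ssh/config}"
    # Skip if the alias is already defined, so reruns do not duplicate it.
    if [ -f "$cfg" ] && grep -q '^Host ib-switch$' "$cfg"; then
        return 0
    fi
    mkdir -p "$(dirname "$cfg")"
    cat >> "$cfg" <<'EOF'

Host ib-switch
    HostName 192.168.0.100
    User admin
    KexAlgorithms +diffie-hellman-group14-sha1
    HostKeyAlgorithms +ssh-rsa
EOF
}

# Usage:
#   add_switch_host      # appends to ~/.ssh/config
#   ssh ib-switch
```

The ssh client reads its configuration on every invocation, so a per-user entry like this takes effect on the next connection attempt.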