How to Configure ZCU216 FPGA - IAA-BURSTT/document GitHub Wiki

All operations below are performed on a burstt* server (log in via WireGuard or the ASIAA VPN).

  • zcu216/reload_burstt5.py : script for reloading bitcodes, including 16-bit, beamforming (bf matrix)
  • /data/wsh/script/initFPGA.sh : end-to-end

Latest Configuration

  • Header version: 2
  • frequency range: 400-800 MHz by default, unless specified.
  • bitcode
    • 16-bit calibration (same as before): model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg
    • 16-ch beamforming (bf16): model_slx/frb_bf16_4a/outputs/frb_bf16_4a_2023-11-23_1821.fpg
    • 64-ch beamforming (bf64): same bitcode as bf16. The only difference between bf16 and bf64 is how the packets are sent to the server (a single destination for bf16 and multiple destinations for bf64).

| Branch | Resolution | Packets | Notes | Latest fpg file |
|---|---|---|---|---|
| Beamforming | 4-bit | 0-4 | single dest_ip (64-ant outrigger) | model_slx/frb_bf16_4a/outputs/frb_bf16_4a_2023-11-23_1821.fpg |
| . | 4-bit | 0-4 | 300-700 MHz, single dest_ip (64-ant outrigger) | TBA |
| . | 4-bit | 0-4 | multi dest_ip, for 64-ant | model_slx/frb_bf16_4/outputs/frb_bf16_4_2023-09-06_1636.fpg |
| . | 4-bit | 4 | for 128-ant | model_slx/frb_bf16_x4/outputs/frb_bf16_x4_2023-09-25_1342.fpg |
| . | 4-bit | 8 | for 256-ant | model_slx/frb_bf16_x8/outputs/frb_bf16_x8_2023-09-25_2204.fpg |
| Multiple destination IP | 4-bit | 0-4 | packet order and synchronization fixed | model_slx/frb_spec2k_dx4_6/outputs/frb_spec2k_dx4_6_2023-07-25_1853.fpg |
| . | 16-bit | 8 | . | TBA |
| Single destination IP | 4-bit | 0-4 | . | model_slx/frb_spec2k_dx4_6/outputs/frb_spec2k_dx4_6_2023-07-10_1655.fpg |
| . | 16-bit | 8 | full resolution for calibration | model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg |
| . | 16-bit | 8 | 300-700 MHz (Nantou) | model_slx/frb2k_dx4_16bit_2a/outputs/frb2k_dx4_16bit_2a_2024-03-12_300mhz.fpg |

  • beamforming matrix (BFM)
    • save_eigenmode.py > eigen2bfm

Steps

Connect to Embedded Linux

The embedded Linux of each FPGA (ZCU216) board currently has an IP on the 192.168.40.* network, defined and hard-coded in /etc/dnsmasq.conf.

  • get the latest dnsmasq.conf for new FPGAs from one of these servers: cyyu, frblab3, burstt1

Check if the dnsmasq daemon on the server is running

 systemctl status dnsmasq.service

If not, restart dnsmasq service

 sudo systemctl restart dnsmasq.service

Check if dnsmasq assigns IP to FPGAs

 cat /var/lib/dnsmasq/dnsmasq.leases

Whenever the FPGA board is rebooted or power cycled, log in to the embedded Linux to enable the phase-locked loop (PLL) of the clock:

ssh casper@192.168.40.*
$ sudo ~/bin/prg_8a34001
casper@localhost:~$ sudo ./bin/prg_8a34001 
[sudo] password for casper: 
I am an alpaca i2c teapot
writing config to 8a34001...
should be programmed...

and then exit.

Ref: What is a Phase-Locked Loop (PLL)? - NI

Configure FPGA bitcode

Load the conda environment

 $ conda activate rfsoc

or simply

 $ rfsoc

This will move the working directory to /home/ubuntu/rfsoc/python_zcu216

Load the FPGA bitcode: this sets the registers, clock, network, etc. The main script is ~/rfsoc/python_zcu216/zcu216_100g_config.py

 (rfsoc)  python zcu216_100g_config.py -c [path to FPGA config file] [IP to embedded Linux at FPGA] --quickclock
  • IP to embedded Linux at FPGA: 192.168.40.*
  • “--quickclock” option: sets the FPGA clock. It only needs to be used once after the FPGA has booted up.

The FPGA configuration file is a text file (*.config) located under the fpga_configs/ folder. It includes:

  • the mapping of network interface card (NIC) MAC address and IP
  • fgain, frequency block selection, MTS

FPGA configuration file

The following is the config file used for #249 FPGA in Nantou, fpga_configs/nantou/fpga249_16bit_mts.config (2024-04-11):

[board]
macaddr = 00:00:12:30
localip = 10.17.16.8
dest_ip = 10.17.16.24

[common]
## 4-bit output for realtime processing
#fpgfile = ../latest/frb_spec2k_dx4_3_2022-10-28_1719.fpg
## 16-bit output for RFI calculation
fpgfile = ../model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg
run_mts = True

arpfile = /home/ubuntu/rfsoc/python_zcu216/config_arp/config.yaml.nantou240327
netmask = 255.255.255.0
dest_port = 60000

## select packets by bitmask: +1:pack1 / +2:pack2 / +4:pack3 / +8:pack4
sel400  = 12
paylen  = 128
fgain   = 1

eq_real0 = 32
eq_imag0 = 0

setclock = False
quickclock = False

Meaning of each parameter:

The following three parameters must be identical to those in the ARP config file (e.g., ./config_arp/config.yaml.*, see below):

  • localip: local IP address of the FPGA
  • macaddr: local MAC address of the FPGA
  • dest_ip: destination IP address on the server

Other parameters:

  • fpgfile: path to FPGA bitcode
  • arpfile: path to ARP config file (see below)
  • run_mts = True: should always be True. It enables multi-tile synchronization for ADC sampling. Quote from Xilinx: “multi-tile synchronization (MTS) feature enables multiple converter channels working with an aligned and deterministic latency across tiles and chips.”
  • fgain: currently a constant for all channels and antennas. This will be channel- and antenna-dependent in the future.
    • simply set to 1 for the 16-bit bitcode
  • sel400: 4-bit bitmask selecting among the 4 frequency blocks covering [0,800] MHz. In this case, 12 = 4 + 8, which keeps only blocks #2 and #3, i.e., [400,800] MHz, and discards [0,400] MHz. This option applies to the 4-bit bitcode only; the 16-bit bitcode ignores it.
    • e.g., use "sel400=6" (= 2 + 4, blocks #1 and #2) for [200,600] MHz data
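
The sel400 encoding described above can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the function name decode_sel400 is hypothetical, and it assumes that bit k of sel400 selects block #k and that the four blocks are equal 200 MHz slices of [0,800] MHz, which is what the "12 = 4 + 8" example implies.

```python
BLOCK_WIDTH_MHZ = 200  # assumption: [0, 800] MHz split into 4 equal blocks


def decode_sel400(sel400):
    """Return the (start, stop) MHz range of every block selected by the bitmask."""
    if not 0 <= sel400 <= 15:
        raise ValueError("sel400 must fit in 4 bits (0-15)")
    return [
        (block * BLOCK_WIDTH_MHZ, (block + 1) * BLOCK_WIDTH_MHZ)
        for block in range(4)
        if sel400 & (1 << block)  # assumption: bit k selects block #k
    ]


print(decode_sel400(12))  # [(400, 600), (600, 800)]
```

Under the same assumptions, a value with non-adjacent bits set (e.g., 9) would select two disjoint frequency ranges.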

ARP table

The ARP table is configured by a text file customized for each server.

To configure the ARP table (e.g., config_arp/config.yaml.nantou240327), use the “ifconfig” command to find the IP and MAC addresses of the SFP network interfaces. A burstt server should have 4 SFP interfaces, each running at 25 Gbps, for 100 Gbps in total. For example, on burstt5:

$ ifconfig
ens2f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9216
        inet 10.17.16.24  netmask 255.255.255.0  broadcast 10.17.16.255
        ether 88:e9:a4:97:aa:8e  txqueuelen 1000  (Ethernet)

ens2f1np1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9216
        inet 10.17.16.28  netmask 255.255.255.0  broadcast 10.17.16.255
        ether 88:e9:a4:97:aa:8f  txqueuelen 1000  (Ethernet)

ens4f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9216
        inet 10.17.16.25  netmask 255.255.255.0  broadcast 10.17.16.255
        ether 88:e9:a4:97:aa:c6  txqueuelen 1000  (Ethernet)
       
ens4f1np1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9216
        inet 10.17.16.29  netmask 255.255.255.0  broadcast 10.17.16.255
        ether 88:e9:a4:97:aa:c7  txqueuelen 1000  (Ethernet)

(2024 ver) By convention, the IPs 10.17.16.{8-23} are reserved for FPGAs, whereas the destination IPs are set as follows (for the main station with an IP switch):

  • ens2f0np0 10.17.16.24
  • ens2f1np1 10.17.16.28
  • ens4f0np0 10.17.16.25
  • ens4f1np1 10.17.16.29
Hence, in the config file
......
arp:
  xb-engine:
    # burstt5 server
    # ens2f0np0 88:e9:a4:97:aa:8e
    # ens2f1np1 88:e9:a4:97:aa:8f
    # ens4f0np0 88:e9:a4:97:aa:c6
    # ens4f1np1 88:e9:a4:97:aa:c7
    10.17.16.24: 0x88e9a497aa8e
    10.17.16.28: 0x88e9a497aa8f
    10.17.16.25: 0x88e9a497aac6
    10.17.16.29: 0x88e9a497aac7

  f-engine:
    10.17.16.8:  0x000000001230
    10.17.16.9:  0x000000001231
    10.17.16.10: 0x000000001232
    10.17.16.11: 0x000000001233

The MAC address for SFP interface of FPGA board (123*) is arbitrary here and only relevant when there is an IP switch between server and FPGAs (more than 4 boards).
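
Since localip, macaddr and dest_ip must agree between the FPGA config file and the ARP table, a small consistency check can catch typos before loading the bitcode. The following is a minimal sketch, not part of the repository: check_board is a hypothetical helper, the ARP table is written as a Python dict standing in for the parsed YAML file, and the short macaddr form (00:00:12:30) is assumed to correspond to the low bytes of the 48-bit value in the ARP table.

```python
import configparser

# Excerpt of the [board] section from the FPGA config file shown earlier.
CONFIG_TEXT = """
[board]
macaddr = 00:00:12:30
localip = 10.17.16.8
dest_ip = 10.17.16.24
"""

# Stand-in for the parsed ARP YAML file (only the entries needed here).
ARP = {
    "xb-engine": {"10.17.16.24": 0x88E9A497AA8E},
    "f-engine": {"10.17.16.8": 0x000000001230},
}


def check_board(config_text, arp):
    """Return a list of mismatches between the [board] section and the ARP table."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    board = cfg["board"]
    problems = []
    # Assumption: the colon-separated macaddr is the hex value without leading zeros.
    mac = int(board["macaddr"].replace(":", ""), 16)
    if arp["f-engine"].get(board["localip"]) != mac:
        problems.append("localip/macaddr pair not found in f-engine table")
    if board["dest_ip"] not in arp["xb-engine"]:
        problems.append("dest_ip not found in xb-engine table")
    return problems


print(check_board(CONFIG_TEXT, ARP))  # [] when everything is consistent
```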

If configured and bitcode loaded successfully, the output should contain the following:

FPGA clock = 400.5005495 MHz
Run MTS ..

One can check with Wireshark whether RUDP packets are transferred from the FPGA to the server. Each packet should be 8256 bytes long (64 B header + 8192 B payload), with the source and destination IPs as set in the config file.

Using 16-antenna beamforming bitcode

Beamforming bitcode requires loading the beamforming matrix as well. See https://wiki.tir.tw/index.php/Data_Acquisition for preparation of the matrix.

  • copy *.npy to ~/rfsoc/python_zcu216/BFM folder. Remember to back up the existing ones inside. For example:
(kylin) [ubuntu@burstt5 sun64-20240508]$ ls 64ant_240511_18h.scale.out/*.npy
64ant_240511_18h.scale.out/fpga0.idt.npy  64ant_240511_18h.scale.out/fpga2.idt.npy
64ant_240511_18h.scale.out/fpga0.pos.npy  64ant_240511_18h.scale.out/fpga2.pos.npy
64ant_240511_18h.scale.out/fpga0.vis.npy  64ant_240511_18h.scale.out/fpga2.vis.npy
64ant_240511_18h.scale.out/fpga1.idt.npy  64ant_240511_18h.scale.out/fpga3.idt.npy
64ant_240511_18h.scale.out/fpga1.pos.npy  64ant_240511_18h.scale.out/fpga3.pos.npy
64ant_240511_18h.scale.out/fpga1.vis.npy  64ant_240511_18h.scale.out/fpga3.vis.npy
  • load the bitcode with the script in ~/rfsoc/python_zcu216/
 python reload_burstt5.py bf16 -t pos --quickclock

For the 64-antenna outrigger station, use the 'bf16' option.

  • use fpga*.vis.npy
    • the 16-channel data will then contain the auto-correlations of 4 selected antennas and the visibilities (cross-correlations) of these 12 pairs, rather than the 16 beamformed outputs.
    • used for taking continuous intensity data
    • the antennas are selected with arguments to eigen2bfm

Reset PPS counter

With multiple FPGAs, e.g., 4 FPGAs for 64 antennas, the boards are inevitably configured at different times. Therefore, we need to manually reset the PPS counter of each board to synchronize them.

Start the interactive Python mode:

 (rfsoc) [ubuntu@burstt5 config_arp]$ python -i zcu_newcontrol.py

  • to define other boards, type their IP directly, e.g.:
 >>> boards=['249', '246','245','232']
  • (re-)connect the boards: connect for the first time, or re-connect after reloading bitcodes
 >>> fpgas = connect(boards)
connecting to: ['249', '246', '245', '232']
FPGA249 current IP: 10.17.16.5
FPGA246 current IP: 10.17.16.9
FPGA245 current IP: 10.17.16.13
FPGA232 current IP: 10.17.16.17
  • read the PPS counter, "pps_count":
 >>> fpgaIO('r', fpgas, 'pps_count')
 [154685, 154744, 154723, 154763]
  • read the epoch time, "epoch_second", which is the timestamp when each FPGA was configured (these should differ between boards at this point)
 >>> fpgaIO('r', fpgas, 'epoch_second')
 [1711527603, 1711527582, 1711527558, 1711527533]
  • reset the counter (default: cmd=4):
 >>> resetepoch(fpgas)
 resetting fpga: 10.17.16.5
 resetting fpga: 10.17.16.9
 resetting fpga: 10.17.16.13
 resetting fpga: 10.17.16.17
 (1711682541, 0.3408629894256592)
  • check again, both PPS count and epoch time should be identical for all FPGAs:
 >>> fpgaIO('r', fpgas, 'epoch_second')
 [1711682541, 1711682541, 1711682541, 1711682541]
 >>> fpgaIO('r', fpgas, 'pps_count')
 [6, 6, 6, 6]
  • write the FGAIN (optional):
 >>> fpgaIO('w', fpgas, 'fgain', FGAIN)
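
The "check again" step above amounts to verifying that every board reports the same value. A minimal sketch of that comparison, operating on plain lists such as those returned by fpgaIO in the session above (the helper names all_equal and sync_report are hypothetical and not part of zcu_newcontrol.py):

```python
def all_equal(values):
    """True if every element of the list is identical (or the list is empty)."""
    return len(set(values)) <= 1


def sync_report(pps_counts, epoch_seconds):
    """Summarise whether the boards agree on PPS count and epoch time."""
    return {
        "pps_synced": all_equal(pps_counts),
        "epoch_synced": all_equal(epoch_seconds),
        "epoch_spread_s": max(epoch_seconds) - min(epoch_seconds),
    }


# Values copied from the example session: before and after resetepoch().
before = sync_report([154685, 154744, 154723, 154763],
                     [1711527603, 1711527582, 1711527558, 1711527533])
after = sync_report([6, 6, 6, 6], [1711682541] * 4)
print(before)
print(after)
```

This strict comparison makes no allowance for reads that happen to straddle a PPS tick, so it may be worth repeating the read before concluding that the boards are out of sync.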

Read Snapshots and Spectra Directly from FPGA

Get snapshots (waveforms), averaged (accumulated) spectra, and spectra after truncating the 16-bit data to 4-bit. These are retrieved directly from the FPGA via the embedded Linux (without going through the SFP interface to the server).

The amplitude of a snapshot should be within +/-32768 counts, as the spectra are 16-bit (though the ADC has a 14-bit depth). The power spectrum is converted directly as 20 log10 (ADC count), so the maximum is ~90 dB.
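
The ~90 dB ceiling follows directly from the 16-bit range. A quick check in Python (counts_to_db is a hypothetical helper name, not a function from the scripts below):

```python
import math


def counts_to_db(counts):
    """Convert an amplitude in ADC counts to dB using 20*log10(counts)."""
    if counts <= 0:
        raise ValueError("counts must be positive")
    return 20 * math.log10(counts)


# Full-scale amplitude of signed 16-bit data is 2**15 = 32768 counts.
print(round(counts_to_db(2 ** 15), 1))  # 90.3
```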

Check 16 inputs

(rfsoc)$ python zcu216_snapadc_16inp.py -s [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_accum_spec_16inp.py -s  [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_quant_spec_16inp.py -s  [IP to embedded Linux at FPGA]

Check single input (Old version)

# 16-bit data
(rfsoc)$ python zcu216_snapadc.py -s -c [input channel#] [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_accum_spec.py -s -c [input channel#] [IP to embedded Linux at FPGA]
  • use ‘-s’ to skip programming the FPGA (already programmed)
  • use ‘-c [input channel#]’ to select the spectrum from an input. As of this writing, only one input is available (channel 8). The channel mapping between the DAQ (labels on the FPGA enclosure) and the FPGA (internal) is reversed, e.g. 0→15, 1→14, …
  • use ‘-f [fgain]’ to set the fgain. A value of 1000 is appropriate for very weak input.
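
For reference, the reversed channel numbering mentioned above can be written as a one-line mapping. This sketch assumes the pattern 0→15, 1→14 continues across all 16 inputs, which the text only hints at; daq_to_fpga is a hypothetical name.

```python
N_INPUTS = 16  # the ZCU216 scripts above handle 16 inputs


def daq_to_fpga(label):
    """Map a DAQ label (on the enclosure) to the internal FPGA channel number.

    Assumption: the mapping is a full reversal, i.e. label n maps to 15 - n.
    """
    if not 0 <= label < N_INPUTS:
        raise ValueError("label must be in 0..15")
    return N_INPUTS - 1 - label


print([daq_to_fpga(n) for n in (0, 1, 8)])  # [15, 14, 7]
```

Because a reversal is its own inverse, the same function would also convert an internal channel number back to the enclosure label.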

Misc.

Auto SSH login with public key authentication

e.g.

 ssh-copy-id -i ~/.ssh/id_rsa.pub casper@[FPGA IP]

Note that there is no '/home/casper/.ssh' directory or authorized_keys file to copy the public key into manually.

Troubleshooting

Connection to FPGA

Sometimes dnsmasq fails to start and we cannot connect to the FPGA. For example, the error message may look like:

frblab3 dnsmasq[1556]: unknown interface enp7s0f0
frblab3 dnsmasq[1556]: FAILED to start up

This is possibly due to an incorrect order of daemon activation. In this case, running ‘ifconfig’ shows that the interface ‘enp7s0f0’ has no IP assigned. To activate the network interface, one can use “nmtui”, a text interface for network management:

 $ sudo nmtui 

Select ‘activate a connection’ > select ‘enp7s0f0’ > press ‘Enter’. Once activated, there is a ‘*’ sign in front of the interface name. Press ‘Esc’ to exit nmtui. If successful, the interface should now have an IP assigned. Also check that “/etc/dnsmasq.conf” has “interface” set to the network interface connected to the FPGA:

 interface=enp7s0f0 # this is the NIC connected to the frblab3

Then restart the dnsmasq

 $ sudo systemctl restart dnsmasq.service

Check if dnsmasq assigns an IP to the FPGA

 $ cat /var/lib/dnsmasq/dnsmasq.leases

For example, at frblab3,

 1684199012 0a:4c:50:41:43:45 192.168.40.220 * ff:50:41:43:45:00:01:00:01:28:1b:5e:4a:0a:4c:50:41:43:41
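
Each line of the lease file holds, in order, the lease expiry time (Unix epoch), the client MAC address, the assigned IP, the hostname and the client ID, with "*" standing for an unknown value. A minimal Python sketch for listing the FPGA leases (the helper names parse_lease and fpga_leases are hypothetical; the 192.168.40. prefix is the network mentioned earlier on this page):

```python
from datetime import datetime, timezone


def parse_lease(line):
    """Split one dnsmasq.leases line into its five fields."""
    expiry, mac, ip, hostname, client_id = line.split(maxsplit=4)
    return {
        "expires": datetime.fromtimestamp(int(expiry), tz=timezone.utc),
        "mac": mac,
        "ip": ip,
        "hostname": None if hostname == "*" else hostname,
        "client_id": None if client_id == "*" else client_id.strip(),
    }


def fpga_leases(lines, prefix="192.168.40."):
    """Return the parsed leases whose IP lies on the FPGA network."""
    leases = [parse_lease(line) for line in lines if line.strip()]
    return [lease for lease in leases if lease["ip"].startswith(prefix)]


sample = ("1684199012 0a:4c:50:41:43:45 192.168.40.220 * "
          "ff:50:41:43:45:00:01:00:01:28:1b:5e:4a:0a:4c:50:41:43:41")
for lease in fpga_leases([sample]):
    print(lease["ip"], lease["mac"], lease["expires"].isoformat())
```

On a server, the lines would come from open("/var/lib/dnsmasq/dnsmasq.leases") instead of the sample string.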

conda environment not found

To check all available environments:

 $ conda env list

Incorrect packet length

(2024/04/11) If you find in Wireshark that

  • UDP packets are identified as QUIC ones
  • some packets have a bad length (only 5xxx B instead of 8296B)

the problem disappears after power cycling the FPGA.

Alternatively, follow the same steps as for resetting the counters, using config_arp/zcu_newcontrol.py:

 resetepoch(fpgas, cmd=6)

cmd=6 will reset both clock and PPS counters and reset 100GbE interface.

RuntimeError: Either filename or parsed fpg data must be given

This happens when accessing FPGA registers before its bitcode is loaded.

Possible cause: an unexpected power outage or power cycle. One can check the uptime of the FPGA, for example:

 $ ssh casper@[FPGA IP]
 casper@localhost:~$ uptime
 18:33:02 up  2:55,  1 user,  load average: 0.08, 0.02, 0.01

Solution: the FPGA bitcode needs to be reloaded.

(Critical) FPGAs are not synchronized

disp_header.py shows the data header of xxx block. It works only while bursttd is running.

  • packet number:
  • clock counter: clock frequency is not locked to 400 MHz. Rebooting FPGA hardly helps.
    • plausible solution: check cable connection to 10 MHz distribution, or replace 10 MHz distributor by one without built-in oscillator.
  • PPS counter:
  • packet order: normally it should be {4,5,6,7} for 16-bit data or {2,3,2,3} for 4-bit data. It seems that only rebooting the FPGA can fix this, and only by chance:
 ssh casper@192.168.40.[]
 sudo reboot

A long power cycle (power off for >10 min) seems to work.

Possible causes:

  • 10 MHz distribution amplifier with built-in oscillator? No.
    • replaced by 2* 2-way splitter. Problem persists
  • bad coaxial cable connection of 10MHz Ref? No.
    • changed coaxial cable. Problem persists
  • program?
    • FPGA #3 (both 232 & 220) has problem
      • single FPGA 220 worked normally at lab
    • synchronized at 1st reload of bitcode after powered on
    • out of sync after 2nd reload of FPGA bitcode
  • specs of the LMK04828B highest performance clock conditioner: 0.25-2.4 Vpp for the 10 MHz input. There is a 3 dB attenuator on the path, meaning 0.35-3.39 Vpp before the attenuator is acceptable.
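
The 0.35-3.39 Vpp figure comes from scaling the LMK04828B input limits by the voltage ratio of a 3 dB attenuator, 10^(3/20) ≈ 1.41. A short Python check (vpp_before_attenuator is a hypothetical helper name):

```python
def vpp_before_attenuator(vpp_at_input, attenuation_db):
    """Voltage needed ahead of an attenuator to get vpp_at_input after it."""
    return vpp_at_input * 10 ** (attenuation_db / 20)


low = vpp_before_attenuator(0.25, 3)
high = vpp_before_attenuator(2.4, 3)
print(round(low, 2), round(high, 2))  # 0.35 3.39
```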

invalid header encountered

The packet counters cannot be read from /dev/hugepages/fpga*:

$ disp_header.py -meta 128 /dev/hugepages/fpga*

file: /dev/hugepages/fpga0
invalid header encountered
None
invalid header encountered
None
invalid header encountered
None
invalid header encountered
None
file: /dev/hugepages/fpga1
(1036800000, 530841600398, 1712821196, 1327, 4)
(1036800001, 530841600910, 1712821196, 1327, 5)
(1036800002, 530841601422, 1712821196, 1327, 6)
(1036800003, 530841601934, 1712821196, 1327, 7)

This is likely because bursttd is not running.
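
Header tuples like the ones printed for fpga1 can also be checked programmatically. The sketch below is only an illustration and is not part of disp_header.py: check_headers is a hypothetical helper, and it assumes that the first tuple element is the packet number (increasing by one per packet) and the last element is the packet order, which is inferred from the example output rather than documented here.

```python
def check_headers(headers, expected_order=(4, 5, 6, 7)):
    """Return a list of problems found in a run of consecutive header tuples.

    expected_order is the packet-order cycle: (4, 5, 6, 7) for 16-bit data,
    (2, 3) for 4-bit data.
    """
    if any(h is None for h in headers):
        return ["invalid header"]
    problems = []
    numbers = [h[0] for h in headers]  # assumed: packet number
    if any(b - a != 1 for a, b in zip(numbers, numbers[1:])):
        problems.append("packet numbers not consecutive")
    orders = [h[4] for h in headers]   # assumed: packet order
    if orders and orders[0] in expected_order:
        start = expected_order.index(orders[0])
        n = len(expected_order)
        wanted = [expected_order[(start + i) % n] for i in range(len(orders))]
        if orders != wanted:
            problems.append("unexpected packet order")
    elif orders:
        problems.append("unexpected packet order")
    return problems


fpga1 = [
    (1036800000, 530841600398, 1712821196, 1327, 4),
    (1036800001, 530841600910, 1712821196, 1327, 5),
    (1036800002, 530841601422, 1712821196, 1327, 6),
    (1036800003, 530841601934, 1712821196, 1327, 7),
]
print(check_headers(fpga1))       # []
print(check_headers([None] * 4))  # ['invalid header']
```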
