How to Configure ZCU216 FPGA - IAA-BURSTT/document GitHub Wiki
All operations below are performed on a burstt* server (log in via Wireguard or the ASIAA VPN).
- zcu216/reload_burstt5.py : script for reloading bitcodes, including 16-bit, beamforming (bf matrix)
- /data/wsh/script/initFPGA.sh : end-to-end
- Header version: 2
- frequency range: 400-800 MHz by default, unless specified.
- bitcode
- 16-bit calibration (same as before):model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg
- 16-ch beamforming (bf16): model_slx/frb_bf16_4a/outputs/frb_bf16_4a_2023-11-23_1821.fpg
- 64-ch beamforming (bf64): same bitcode as bf16. The only difference between bf16 and bf64 is how the packets are sent to the server (a single destination for bf16 and multiple destinations for bf64).
| Branch | Resolution | Packets | Notes | Latest fpg file |
|---|---|---|---|---|
| Beamforming | 4-bit | 0-4 | single dest_ip (64-ant outrigger) | model_slx/frb_bf16_4a/outputs/frb_bf16_4a_2023-11-23_1821.fpg |
| . | 4-bit | 0-4 | 300-700 MHz, single dest_ip (64-ant outrigger) | TBA |
| . | 4-bit | 0-4 | multi dest_ip, for 64-ant | model_slx/frb_bf16_4/outputs/frb_bf16_4_2023-09-06_1636.fpg |
| . | 4-bit | 4 | for 128-ant | model_slx/frb_bf16_x4/outputs/frb_bf16_x4_2023-09-25_1342.fpg |
| . | 4-bit | 8 | for 256-ant | model_slx/frb_bf16_x8/outputs/frb_bf16_x8_2023-09-25_2204.fpg |
| Multiple destination IP | 4-bit | 0-4 | packet order and synchronization fixed | model_slx/frb_spec2k_dx4_6/outputs/frb_spec2k_dx4_6_2023-07-25_1853.fpg |
| . | 16-bit | 8 | . | TBA |
| Single destination IP | 4-bit | 0-4 | . | model_slx/frb_spec2k_dx4_6/outputs/frb_spec2k_dx4_6_2023-07-10_1655.fpg |
| . | 16-bit | 8 | full resolution for calibration | model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg |
| . | 16-bit | 8 | 300-700 MHz (Nantou) | model_slx/frb2k_dx4_16bit_2a/outputs/frb2k_dx4_16bit_2a_2024-03-12_300mhz.fpg |
- beamforming matrix (BFM)
- save_eigenmode.py > eigen2bfm
The IP of embedded Linux of each FPGA (ZCU216)board is currently on the 192.168.40.* network, defined and hard-coded at /etc/dnsmasq.conf
- get the latest dnsmasq.conf for new FPGAs from one of these servers: cyyu, frblab3, burstt1
Check whether the dnsmasq service is running:
systemctl status dnsmasq.service
If not, restart dnsmasq service
sudo systemctl restart dnsmasq.service
Check if dnsmasq assigns IP to FPGAs
cat /var/lib/dnsmasq/dnsmasq.leases
Whenever the FPGA board is rebooted or power cycled, log in to its embedded Linux to enable the phase-locked loop (PLL) of the clock:
ssh casper@192.168.40.*
$ sudo ~/bin/prg_8a34001
casper@localhost:~$ sudo ./bin/prg_8a34001
[sudo] password for casper:
I am an alpaca i2c teapot
writing config to 8a34001...
should be programmed...
and then exit.
Ref: What is a Phase-Locked Loop (PLL)? - NI
Load the conda environment
$ conda activate rfsoc
or simply
$ rfsoc
This will move the working directory to /home/ubuntu/rfsoc/python_zcu216
Load the FPGA bitcode: this sets registers, clock, network, etc. The main script is ~/rfsoc/python_zcu216/zcu216_100g_config.py
(rfsoc) python zcu216_100g_config.py -c [path to FPGA config file] [IP to embedded Linux at FPGA] --quickclock
- IP to embedded Linux at FPGA: 192.168.40.*
- “--quickclock” option: sets the FPGA clock; it only needs to be used once after the FPGA boots up.
- Mapping of network interface card (NIC) MAC address and IP
- fgain, frequency block selection, MTS
The following is the config file used for #249 FPGA in Nantou, fpga_configs/nantou/fpga249_16bit_mts.config (2024-04-11):
[board]
macaddr = 00:00:12:30
localip = 10.17.16.8
dest_ip = 10.17.16.24

[common]
## 4-bit output for realtime processing
#fpgfile = ../latest/frb_spec2k_dx4_3_2022-10-28_1719.fpg
## 16-bit output for RFI calculation
fpgfile = ../model_slx/frb2k_dx4_16bit_2/outputs/frb2k_dx4_16bit_2_2023-06-12_1157.fpg
run_mts = True
arpfile = /home/ubuntu/rfsoc/python_zcu216/config_arp/config.yaml.nantou240327
netmask = 255.255.255.0
dest_port = 60000
## select packets by bitmask: +0:pack1 / +1:pack2 / +4:pack3 / +8:pack4
sel400 = 12
paylen = 128
fgain = 1
eq_real0 = 32
eq_imag0 = 0
setclock = False
quickclock = False
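As a quick sanity check before loading a bitcode, the config can be read with Python's standard `configparser` to confirm that `localip` and `dest_ip` share the subnet implied by `netmask`. This is only a sketch: it assumes the file is INI-compatible (the real zcu216_100g_config.py may parse it differently), and the helper names `load_config` and `same_subnet` are hypothetical.

```python
import configparser
import ipaddress

# Excerpt of the sample config, with values copied from the page.
SAMPLE = """
[board]
macaddr = 00:00:12:30
localip = 10.17.16.8
dest_ip = 10.17.16.24

[common]
netmask = 255.255.255.0
dest_port = 60000
sel400 = 12
"""


def load_config(text):
    """Parse INI-style text (hypothetical helper, not part of the repo)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def same_subnet(cfg):
    """Return True if localip and dest_ip lie in the same subnet."""
    mask = cfg["common"]["netmask"]
    local = ipaddress.ip_interface(f"{cfg['board']['localip']}/{mask}")
    dest = ipaddress.ip_address(cfg["board"]["dest_ip"])
    return dest in local.network


cfg = load_config(SAMPLE)
print("same subnet:", same_subnet(cfg))
```

A mismatch here would suggest a typo in one of the addresses before any packets are sent.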
Meaning of each parameter:
The following three parameters must be identical to those in the ARP config file (e.g., ./config_arp/config.yaml.*, see below):
- localip: set the local IP address for FPGA
- macaddr: set the local MAC address for FPGA
- dest_ip: set the destination IP address to server
- fpgfile: path to FPGA bitcode
- arpfile: path to ARP config file (see below)
- run_mts = True: should always be True; it is needed for sampling with the ADC. Quote from Xilinx: “multi-tile synchronization (MTS) feature enables multiple converter channels working with an aligned and deterministic latency across tiles and chips.”
- fgain: currently a constant for all channels and antennas. This will be channel- and antenna- dependent in the future.
- simply set to 1 for bitcode of 16-bit data
- sel400: 4-bit-encoded integer selecting among the 4 frequency blocks covering [0,800] MHz. In this case, 12 = 4 + 8, meaning that only blocks #2 & #3, i.e., [400,800] MHz, are saved while [0,400] MHz is discarded. This option applies to 4-bit bitcode only; 16-bit bitcode ignores it.
- e.g., use "sel400=6" for [200,]MHz data
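The `sel400` bitmask can be decoded with a few lines of Python. The sketch below assumes that the four blocks are equal 200 MHz slices of [0,800] MHz and that bit k selects block #k, which is consistent with the 12 = 4 + 8 example above; `decode_sel400` is a hypothetical helper, not part of the repository.

```python
BLOCK_WIDTH_MHZ = 200  # assumption: [0, 800] MHz split into 4 equal blocks


def decode_sel400(sel400):
    """Return the (start, stop) range in MHz of every block selected by the bitmask."""
    if not 0 <= sel400 <= 0b1111:
        raise ValueError("sel400 must be a 4-bit integer (0-15)")
    return [
        (k * BLOCK_WIDTH_MHZ, (k + 1) * BLOCK_WIDTH_MHZ)
        for k in range(4)
        if sel400 & (1 << k)  # bit k set means block #k is kept
    ]


for value in (12, 6):
    print(value, decode_sel400(value))
```

Under these assumptions, 12 maps to blocks #2 and #3, and 6 maps to blocks #1 and #2.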
The ARP table is configured by a text file customized for each server.
To configure the ARP table (e.g., config_arp/config.yaml.nantou240327), use the “ifconfig” command to find the IP and MAC addresses of the SFP network interfaces. The burstt server should have 4 SFP interfaces, each running at 25 Gbps and thus 100 Gbps in total. For burstt5:
$ ifconfig
ens2f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9216
inet 10.17.16.24 netmask 255.255.255.0 broadcast 10.17.16.255
ether 88:e9:a4:97:aa:8e txqueuelen 1000 (Ethernet)
ens2f1np1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9216
inet 10.17.16.28 netmask 255.255.255.0 broadcast 10.17.16.255
ether 88:e9:a4:97:aa:8f txqueuelen 1000 (Ethernet)
ens4f0np0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9216
inet 10.17.16.25 netmask 255.255.255.0 broadcast 10.17.16.255
ether 88:e9:a4:97:aa:c6 txqueuelen 1000 (Ethernet)
ens4f1np1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9216
inet 10.17.16.29 netmask 255.255.255.0 broadcast 10.17.16.255
ether 88:e9:a4:97:aa:c7 txqueuelen 1000 (Ethernet)
(2024 ver) By convention, IPs 10.17.16.{8-23} are reserved for FPGAs, whereas the destination IPs are set as follows (for the main station with an IP switch):
- ens2f0np0 10.17.16.24
- ens2f1np1 10.17.16.28
- ens4f0np0 10.17.16.25
- ens4f1np1 10.17.16.29
......
arp:
  xb-engine:
    # burstt5 server
    # ens2f0np0 88:e9:a4:97:aa:8e
    # ens2f1np1 88:e9:a4:97:aa:8f
    # ens4f0np0 88:e9:a4:97:aa:c6
    # ens4f1np1 88:e9:a4:97:aa:c7
    10.17.16.24: 0x88e9a497aa8e
    10.17.16.28: 0x88e9a497aa8f
    10.17.16.25: 0x88e9a497aac6
    10.17.16.29: 0x88e9a497aac7
  f-engine:
    10.17.16.8: 0x000000001230
    10.17.17.9: 0x000000001231
    10.17.16.10: 0x000000001232
    10.17.16.11: 0x000000001233
The MAC address for SFP interface of FPGA board (123*) is arbitrary here and only relevant when there is an IP switch between server and FPGAs (more than 4 boards).
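The ARP file stores each MAC address as a zero-padded 48-bit hexadecimal number, whereas `ifconfig` prints it colon-separated. A small conversion helper avoids transcription errors when filling in the file; `mac_to_arp_hex` is a hypothetical name used for illustration only.

```python
def mac_to_arp_hex(mac):
    """Convert a colon-separated MAC string to the 0x-prefixed 48-bit form."""
    value = int(mac.replace(":", ""), 16)
    if value >= 1 << 48:
        raise ValueError("MAC address is longer than 48 bits")
    # Pad to 12 hex digits so short addresses keep their leading zeros.
    return f"0x{value:012x}"


# Server NIC as printed by ifconfig, and the short FPGA macaddr from the config.
print(mac_to_arp_hex("88:e9:a4:97:aa:8e"))
print(mac_to_arp_hex("00:00:12:30"))
```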
If configured and bitcode loaded successfully, the output should contain the following:
FPGA clock = 400.5005495 MHz
Run MTS ..
One can check with Wireshark whether RUDP packets are transferred from the FPGA to the server. Each packet should be 8256 bytes long (64 B header + 8192 B payload), with the source and destination IPs as set in the config file.
Beamforming bitcode requires loading the beamforming matrix as well. See https://wiki.tir.tw/index.php/Data_Acquisition for preparation of the matrix.
- copy *.npy to ~/rfsoc/python_zcu216/BFM folder. Remember to back up the existing ones inside. For example:
(kylin) [ubuntu@burstt5 sun64-20240508]$ ls 64ant_240511_18h.scale.out/*.npy
64ant_240511_18h.scale.out/fpga0.idt.npy  64ant_240511_18h.scale.out/fpga2.idt.npy
64ant_240511_18h.scale.out/fpga0.pos.npy  64ant_240511_18h.scale.out/fpga2.pos.npy
64ant_240511_18h.scale.out/fpga0.vis.npy  64ant_240511_18h.scale.out/fpga2.vis.npy
64ant_240511_18h.scale.out/fpga1.idt.npy  64ant_240511_18h.scale.out/fpga3.idt.npy
64ant_240511_18h.scale.out/fpga1.pos.npy  64ant_240511_18h.scale.out/fpga3.pos.npy
64ant_240511_18h.scale.out/fpga1.vis.npy  64ant_240511_18h.scale.out/fpga3.vis.npy
- load the bitcode with the script in ~/rfsoc/python_zcu216/
python reload_burstt5.py bf16 -t pos --quickclock
for 64-antenna outrigger station, use 'bf16' option.
- use fpga*.vis.npy
- the 16-channel data will contain auto-correlation of 4 selected antennas and the visibility (cross-correlation) of these 12 pairs, rather than the 16 beamformed data.
- used for taking continuous intensity data
- arguments at eigen2bfm for antenna selection
With multiple FPGAs, e.g., 4 FPGAs for 64 antennas, the boards are inevitably configured at different times. Therefore, we need to manually reset the PPS counter of each board to synchronize them.
interactive Python mode
(rfsoc) [ubuntu@burstt5 config_arp]$ python -i zcu_newcontrol.py
- to define other boards, type their IP directly, e.g.:
>>> boards=['249', '246','245','232']
- (re-)connect the boards: connect for the first time, or re-connect after reloading bitcodes
>>> fpgas = connect(boards)
connecting to: ['249', '246', '245', '232']
FPGA249 current IP: 10.17.16.5
FPGA246 current IP: 10.17.16.9
FPGA245 current IP: 10.17.16.13
FPGA232 current IP: 10.17.16.17
- read the PPS counter, "pps_count":
>>> fpgaIO('r', fpgas, 'pps_count')
[154685, 154744, 154723, 154763]
- read the epoch time, "epoch_second", which is the timestamp when each FPGA was configured (should be at different time)
>>> fpgaIO('r', fpgas, 'epoch_second')
[1711527603, 1711527582, 1711527558, 1711527533]
- reset the counter (default: cmd=4):
>>> resetepoch(fpgas)
resetting fpga: 10.17.16.5
resetting fpga: 10.17.16.9
resetting fpga: 10.17.16.13
resetting fpga: 10.17.16.17
(1711682541, 0.3408629894256592)
- check again, both PPS count and epoch time should be identical for all FPGAs:
>>> fpgaIO('r', fpgas, 'epoch_second')
[1711682541, 1711682541, 1711682541, 1711682541]
>>> fpgaIO('r', fpgas, 'pps_count')
[6, 6, 6, 6]
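The "all values identical" check can also be done programmatically on the lists returned in the session above. The sketch below works on plain Python lists copied from that transcript and does not call `fpgaIO` itself; `is_synchronized` and `spread` are hypothetical helper names.

```python
def is_synchronized(values):
    """True if every board reports the same register value."""
    return len(set(values)) == 1


def spread(values):
    """Difference between the largest and smallest reported value."""
    return max(values) - min(values)


# Values copied from the session above: before and after resetepoch().
epoch_before = [1711527603, 1711527582, 1711527558, 1711527533]
epoch_after = [1711682541, 1711682541, 1711682541, 1711682541]
pps_after = [6, 6, 6, 6]

print("before reset:", is_synchronized(epoch_before), "spread [s]:", spread(epoch_before))
print("after reset: ", is_synchronized(epoch_after) and is_synchronized(pps_after))
```

Because the PPS counter advances every second, the two reads should be taken close together; a difference of one count between boards may simply mean the reads straddled a PPS edge.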
- write the FGAIN (optional):
fpgaIO('w', fpgas, 'fgain', FGAIN)
Get snapshots (waveforms), averaged (accumulated) spectra, and spectra after truncating 16-bit data to 4-bit. These are retrieved directly from the FPGA via embedded Linux (without going through the SFP interface to the server).
The amplitude of a snapshot should be within ±32768 counts, as the spectra are 16-bit (though the ADC has a 14-bit depth). The power spectrum is converted directly as 20 log10(ADC count), so the maximum is ~90 dB.
(rfsoc)$ python zcu216_snapadc_16inp.py -s [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_accum_spec_16inp.py -s [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_quant_spec_16inp.py -s [IP to embedded Linux at FPGA]
# 16-bit data
(rfsoc)$ python zcu216_snapadc.py -s -c [input channel#] [IP to embedded Linux at FPGA]
(rfsoc)$ python zcu216_accum_spec.py -s -c [input channel#] [IP to embedded Linux at FPGA]
- use ‘-s’ to skip programming the FPGA (already programmed)
- use ‘-c’ to select the spectrum from one input. As of this writing, only one input is available (=8). The channel mapping between DAQ (labels on the FPGA enclosure) and FPGA (internal) is swapped, e.g. 0→15, 1→14, …
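Two numbers from this section are easy to reproduce: the ~90 dB ceiling of the power spectrum and the swapped channel numbering. The sketch below assumes a base-10 logarithm and a simple reversal over 16 inputs (n → 15 − n), which matches the 0→15, 1→14 examples but is not confirmed for every channel; both function names are hypothetical.

```python
import math


def full_scale_db(max_count=32768):
    """Power level in dB of a full-scale 16-bit count, using 20*log10(count)."""
    return 20 * math.log10(max_count)


def daq_to_fpga_channel(daq_channel, n_inputs=16):
    """Map a DAQ label to the FPGA-internal index, assuming a plain reversal."""
    if not 0 <= daq_channel < n_inputs:
        raise ValueError("channel out of range")
    return n_inputs - 1 - daq_channel


print(f"full scale: {full_scale_db():.1f} dB")
print("DAQ 0 ->", daq_to_fpga_channel(0), ", DAQ 1 ->", daq_to_fpga_channel(1))
```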
e.g.
ssh-copy-id -i ~/.ssh/id_rsa.pub casper@[FPGA IP]
Note that the board has neither a '/home/casper/.ssh' directory nor an authorized_keys file into which the public key could be copied manually.
Sometimes dnsmasq may fail to start, and we cannot connect to the FPGA. Troubleshooting: for example, the error message may look like:
frblab3 dnsmasq[1556]: unknown interface enp7s0f0
frblab3 dnsmasq[1556]: FAILED to start up
This is possibly due to an incorrect order of daemon activation. In this case, running ‘ifconfig’ shows that the interface ‘enp7s0f0’ has no IP assigned. To activate the network interface, one can use “nmtui”, a text interface for network management:
$ sudo nmtui
select ‘activate a connection’ > select ‘enp7s0f0’ > press ‘Enter’ . If activated, there is a ‘*’ sign in front of the interface name. Press ‘Esc’ to exit nmtui. If successful, the interface should have IP assigned now. Also check if “/etc/dnsmasq.conf” has “interface” assigned to the network interface connecting to the FPGA
interface=enp7s0f0 # this is the NIC connected to the frblab3
Then restart the dnsmasq
$ sudo systemctl restart dnsmasq.service
Check if dnsmasq assigns an IP to the FPGA
$ cat /var/lib/dnsmasq/dnsmasq.leases
For example, at frblab3,
1684199012 0a:4c:50:41:43:45 192.168.40.220 * ff:50:41:43:45:00:01:00:01:28:1b:5e:4a:0a:4c:50:41:43:41
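A lease line can be split into named fields for easier reading. The sketch below assumes the usual dnsmasq lease layout of five whitespace-separated fields (expiry time as a Unix epoch, MAC address, IP address, hostname, client ID); `parse_lease` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_lease(line):
    """Split one dnsmasq lease line into a dict of named fields."""
    expiry, mac, ip, hostname, client_id = line.split(maxsplit=4)
    return {
        "expires": datetime.fromtimestamp(int(expiry), tz=timezone.utc),
        "mac": mac,
        "ip": ip,
        "hostname": hostname,  # '*' means the client sent no hostname
        "client_id": client_id,
    }


# Example line from frblab3, copied from above.
LINE = ("1684199012 0a:4c:50:41:43:45 192.168.40.220 * "
        "ff:50:41:43:45:00:01:00:01:28:1b:5e:4a:0a:4c:50:41:43:41")

lease = parse_lease(LINE)
print(lease["ip"], lease["mac"], lease["expires"].isoformat())
```

If an expected FPGA is missing from the file, or its lease has long expired, the board has probably not requested an address since its last boot.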
to check all available environments
$ conda env list
(2024/04/11) If Wireshark shows either of the following:
- UDP packets identified as QUIC packets
- some packets with bad length (only 5xxx B instead of 8296B)
Or follow the same steps as for resetting the counters, using config_arp/zcu_new_control.py:
resetepoch(fpgas, cmd=6)
cmd=6 will reset both clock and PPS counters and reset 100GbE interface.
This happens when accessing FPGA registers before its bitcode is loaded.
Possible cause: an unexpected power outage or power cycle. One can check the uptime of the FPGA, for example:
$ ssh casper@[FPGA IP]
casper@localhost:~$ uptime
18:33:02 up 2:55, 1 user, load average: 0.08, 0.02, 0.01
Solution: the FPGA bitcode needs to be reloaded.
disp_header.py shows the data header of xxx block; it works only while bursttd is running.
- packet number:
- clock counter: clock frequency is not locked to 400 MHz. Rebooting FPGA hardly helps.
- plausible solution: check cable connection to 10 MHz distribution, or replace 10 MHz distributor by one without built-in oscillator.
- PPS counter:
- packet order: normally it should be {4,5,6,7} for 16-bit data or {2,3,2,3} for 4-bit data. So far only rebooting the FPGA seems to fix a wrong order, and only by chance.
ssh casper@192.168.40.[]
sudo reboot
A long power cycle (power off for >10 min) seems to work.
Possible causes:
- 10 MHz distribution amplifier with built-in oscillator? No.
- replaced by 2* 2-way splitter. Problem persists
- bad coaxial cable connection of 10MHz Ref? No.
- changed coaxial cable. Problem persists
- program?
- FPGA #3 (both 232 & 220) has problem
- single FPGA 220 worked normally at lab
- synchronized at 1st reload of bitcode after powered on
- out of sync after 2nd reload of FPGA bitcode
- FPGA #3 (both 232 & 220) has problem
- specs of LMK04828B highest performance clock conditioner : 0.25-2.4 Vpp 10MHz input. There is a 3 dB attenuator on the path, meaning 0.35-3.39 Vpp is acceptable.
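The acceptable range at the source follows from converting the 3 dB attenuation into a voltage ratio, 10^(3/20) ≈ 1.41, and scaling the chip's input limits by it. A minimal sketch of that arithmetic (`vpp_before_attenuator` is a hypothetical name):

```python
def vpp_before_attenuator(vpp_at_chip, attenuation_db=3.0):
    """Vpp required ahead of the attenuator to deliver vpp_at_chip after it."""
    voltage_ratio = 10 ** (attenuation_db / 20)  # dB to voltage ratio
    return vpp_at_chip * voltage_ratio


low = vpp_before_attenuator(0.25)
high = vpp_before_attenuator(2.4)
print(f"acceptable 10 MHz level at the source: {low:.2f}-{high:.2f} Vpp")
```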
cannot read packet counters from /dev/hugepages/fpga*
$ disp_header.py -meta 128 /dev/hugepages/fpga*
file: /dev/hugepages/fpga0
invalid header encountered None
invalid header encountered None
invalid header encountered None
invalid header encountered None
file: /dev/hugepages/fpga1
(1036800000, 530841600398, 1712821196, 1327, 4)
(1036800001, 530841600910, 1712821196, 1327, 5)
(1036800002, 530841601422, 1712821196, 1327, 6)
(1036800003, 530841601934, 1712821196, 1327, 7)
This is likely because bursttd is not running.
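When the headers are readable, the packet order can be checked automatically from the printed tuples. The sketch below assumes that the last element of each tuple is the packet-order field, which matches the {4,5,6,7} sequence expected for 16-bit data but is not documented here; the function and constant names are hypothetical.

```python
import ast

# Expected sequences quoted in the troubleshooting notes above.
EXPECTED_ORDER = {"16bit": [4, 5, 6, 7], "4bit": [2, 3, 2, 3]}


def packet_order(lines):
    """Collect the last field of every header tuple, skipping all other lines."""
    order = []
    for line in lines:
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            order.append(ast.literal_eval(line)[-1])
    return order


# Output for /dev/hugepages/fpga1, copied from above.
OUTPUT = """file: /dev/hugepages/fpga1
(1036800000, 530841600398, 1712821196, 1327, 4)
(1036800001, 530841600910, 1712821196, 1327, 5)
(1036800002, 530841601422, 1712821196, 1327, 6)
(1036800003, 530841601934, 1712821196, 1327, 7)
"""

order = packet_order(OUTPUT.splitlines())
print("order:", order, "ok for 16-bit:", order == EXPECTED_ORDER["16bit"])
```

An empty list, as would result for fpga0 above, means no valid header was found at all rather than a wrong order.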