Fushan start procedure after shutdown - IAA-BURSTT/document GitHub Wiki

We will use this page to archive the start procedure after a system shutdown. Older procedures will be put at the bottom of the page, while newer updates will be kept at the top.


bf256 procedure as of 2024-Nov-01

Start DHCP server (dnsmasq)

Currently, burstt2 serves as the DHCP server. However, the dnsmasq service does not come up properly when the server boots, so check its status and restart it with the following commands:

sudo systemctl status dnsmasq
sudo systemctl restart dnsmasq

Once the service is running, the FPGAs will acquire their preconfigured IPs from this DHCP server. The DHCP leases are saved in /var/lib/dnsmasq/dnsmasq.leases.
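To see at a glance which FPGAs have already picked up a lease, the leases file can be compared with the expected FPGA IPs. This is a minimal sketch. It assumes the standard dnsmasq lease-file layout (one lease per line: expiry time, MAC address, IP address, hostname, client ID) and the 192.168.40.xxx FPGA list from the next step. `parse_leases`, `missing_fpgas` and `report` are hypothetical helpers, not part of the BURSTT tools.

```python
import os

LEASE_FILE = "/var/lib/dnsmasq/dnsmasq.leases"
# Last octets of the FPGA IPs (row01 ... row16), as listed in the "Reload bitcode" step.
FPGA_OCTETS = [223, 224, 222, 229, 252, 250, 244, 247,
               107, 105, 110, 248, 111, 108, 109, 104]


def parse_leases(text):
    """Map IP -> MAC from dnsmasq lease lines '<expiry> <mac> <ip> <hostname> <client-id>'."""
    leases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:          # skips short lines such as an IPv6 'duid' entry
            leases[fields[2]] = fields[1]
    return leases


def missing_fpgas(leases, subnet="192.168.40"):
    """Return the expected FPGA IPs that do not (yet) have a lease."""
    expected = [f"{subnet}.{octet}" for octet in FPGA_OCTETS]
    return [ip for ip in expected if ip not in leases]


def report(path=LEASE_FILE):
    """Print a short summary of the lease file; returns the missing IPs (None if no file)."""
    if not os.path.exists(path):
        print("lease file not found:", path)
        return None
    with open(path) as fh:
        leases = parse_leases(fh.read())
    missing = missing_fpgas(leases)
    print(f"{len(leases)} lease(s); {len(missing)} expected FPGA IP(s) without a lease")
    for ip in missing:
        print("  missing:", ip)
    return missing
```

On burstt2, calling `report()` after restarting dnsmasq should eventually list no missing IPs; a lease only appears once the corresponding FPGA has requested it.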

Reload bitcode

Again, burstt2 is the main FPGA control server at the moment. The following FPGAs (last octet of 192.168.40.xxx) should be reachable before proceeding:

223 224 222 229 252 250 244 247 107 105 110 248 111 108 109 104

(corresponding to row01, row02, ..., row16)

(To check that an FPGA is reachable, ping 192.168.40.xxx, replacing xxx with one of the numbers above.)
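Checking all 16 FPGAs one by one is tedious, so a small ping sweep that also prints the row of each IP can help. This is a sketch assuming the Linux iputils `ping` (`-c 1` sends one packet, `-W 1` waits at most 1 s for a reply); `row_map`, `is_reachable` and `ping_sweep` are hypothetical helper names.

```python
import subprocess

# Last octets of the FPGA IPs in row order (row01 ... row16), as listed above.
FPGA_OCTETS = [223, 224, 222, 229, 252, 250, 244, 247,
               107, 105, 110, 248, 111, 108, 109, 104]


def row_map(subnet="192.168.40"):
    """Return {'row01': '192.168.40.223', ...} following the order listed above."""
    return {f"row{i:02d}": f"{subnet}.{octet}"
            for i, octet in enumerate(FPGA_OCTETS, start=1)}


def is_reachable(ip, timeout_s=1):
    """Send one ICMP echo request with iputils ping; True if a reply arrived."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), ip],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def ping_sweep():
    """Print the reachability of every FPGA, one row per line."""
    for row, ip in row_map().items():
        status = "ok" if is_reachable(ip) else "NO REPLY"
        print(f"{row}  {ip:<16} {status}")
```

Run `ping_sweep()` on burstt2; the whole sweep takes at most about 16 s when every FPGA is down.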

Terminal 1 (linux shell):

[method 1]

Reload all FPGAs. Set the PLL (--setclock) once, after the FPGAs are powered on.

rfsoc
./reload_fpga.py --bc bf256 -B board_bf256_v2 -b FUS_eigen_241010 --beam0 -7.5 223 224 222 229 252 250 244 247 107 105 110 248 111 108 109 104 --setclock

The command above loads the bf256 bitcode (fpga_configs/fushan_sw/common_bf256.config), sets the destinations according to 'fpga_configs/fushan_sw/board_bf256_v2', applies the calibration 'BFM/FUS_eigen_241010', and applies the 1st beamform with beam0 = -7.5. The '--setclock' option is only needed after the FPGAs have been power-cycled.
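Typing 16 IPs by hand invites mistakes, so the method-1 command can also be assembled programmatically. This sketch only reproduces the flags shown above (`--bc`, `-B`, `-b`, `--beam0`, `--setclock`); `reload_command` is a hypothetical helper, and it assumes it is used in the rfsoc environment from the directory that contains reload_fpga.py.

```python
import shlex

# Last octets of the FPGA IPs in row order (row01 ... row16).
FPGA_OCTETS = [223, 224, 222, 229, 252, 250, 244, 247,
               107, 105, 110, 248, 111, 108, 109, 104]


def reload_command(beam0=-7.5, calib="FUS_eigen_241010", setclock=True):
    """Build the reload_fpga.py argument list used in method 1."""
    cmd = ["./reload_fpga.py", "--bc", "bf256", "-B", "board_bf256_v2",
           "-b", calib, "--beam0", str(beam0)]
    cmd += [str(octet) for octet in FPGA_OCTETS]
    if setclock:  # only needed after the FPGAs have been power-cycled
        cmd.append("--setclock")
    return cmd


# Print the command for review before running it.
print(shlex.join(reload_command()))
```

To actually run it, pass the list to `subprocess.run(reload_command(), check=True)`; use `setclock=False` when the FPGAs have not been power-cycled.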

[method 2]

Reload individual FPGAs (e.g. fpga223):

rfsoc
./zcu216_100g_config.py -c fpga_configs/general_test.config 192.168.40.223 --setclock

The command above loads a generic test configuration (16-bit bitcode) onto the FPGA at the given IP and sets the clock PLL. Note that the generic config sets dest_ip=10.17.16.32; the corresponding MAC address (0x88e9a4971b9e) is set in a separate file, 'config_arp/config.yaml.test.241016'. If you plan to receive 100G packets, update both the MAC address and dest_ip to match the actual setup.
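The MAC address in the config is written as a 48-bit hex integer, while tools such as `ip link` print it colon-separated. When updating the config for a real receiver, a small conversion helps avoid transcription errors; `int_to_mac` and `mac_to_int` are hypothetical helpers.

```python
def int_to_mac(value):
    """Convert a 48-bit integer such as 0x88e9a4971b9e to '88:e9:a4:97:1b:9e'."""
    if not 0 <= value < 1 << 48:
        raise ValueError("MAC address must fit in 48 bits")
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def mac_to_int(mac):
    """Convert '88:e9:a4:97:1b:9e' (or '88-e9-...') back to an integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


print(int_to_mac(0x88e9a4971b9e))               # 88:e9:a4:97:1b:9e
print(hex(mac_to_int("88:e9:a4:97:1b:9e")))     # 0x88e9a4971b9e
```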

Terminal 2 (python shell):

Start the interactive python

rfsoc
cd config_arp
python -i zcu_newcontrol.py

Within the interactive python:

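f256 = connect(pre['fushan'])                        # connect to the FPGAs in the 'fushan' preset
fpgaIO('r', f256, 'clk_frequency', show_key=True)    # read back the clock frequency of each FPGA
resetepoch(f256)                                     # reset the epoch on all FPGAs
fpgaIO('r', f256, 'pps_count', show_key=True)        # read back the PPS counter of each FPGA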

Start the receiving programs (rudp256)

Repeat the same procedure for each of {burstt1, burstt2, burstt3, burstt4}.

Start a new terminal. SSH into bursttX if the terminal is not on bursttX.

Terminal 1 (command):

cd rudp256
sudo ./submit256.sh

The current settings in this script are packets_per_block = 204800 (so frames_per_block = 51200 and seconds_per_block = 0.131072), nBlock = 160, and nSum = 400.
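The quoted numbers are tied together by simple arithmetic, which is handy when changing one of them. The sketch below uses only the values from submit256.sh; it assumes that nBlock is the number of blocks in the ring buffer (consistent with the 20 sec ring buffer mentioned in the monitor step) and that nSum is the number of frames integrated into one intensity sample. The second assumption is not stated on this page.

```python
# Values taken from submit256.sh as quoted above.
packets_per_block = 204800
frames_per_block = 51200
seconds_per_block = 0.131072
n_block = 160
n_sum = 400

packets_per_frame = packets_per_block // frames_per_block   # 4
frame_seconds = seconds_per_block / frames_per_block         # 2.56e-6 s
ring_seconds = n_block * seconds_per_block                   # ~20.97 s
# Assumption: nSum counts frames integrated into one intensity sample.
intensity_seconds = n_sum * frame_seconds                    # ~1.024 ms

print(f"{packets_per_frame} packets/frame, {frame_seconds * 1e6:.2f} us/frame")
print(f"ring buffer: {ring_seconds:.2f} s; intensity sample: {intensity_seconds * 1e3:.3f} ms")
```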

Start a new terminal. SSH into bursttX if the terminal is not on bursttX.

Terminal 2 (monitor):

cd rudp256
./readshm

The current script allocates a 20-second ring buffer, which is about 270 GB per node, so it may take about 30 seconds before the ring buffers are ready and the monitor starts to show any information.
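Because the ring buffer holds nBlock blocks, both its duration and its memory footprint scale linearly with nBlock. A back-of-the-envelope sketch follows. The packet size is not stated on this page, so here it is back-derived from the quoted "about 270 GB per node" at nBlock = 160 and should be replaced with the real value from the rudp256 source. `ring_footprint` is a hypothetical helper.

```python
packets_per_block = 204800
seconds_per_block = 0.131072
QUOTED_RING_BYTES = 270e9   # "about 270GB per node" at nBlock = 160

# Not stated on this page: approximate bytes per packet implied by the quote.
approx_bytes_per_packet = QUOTED_RING_BYTES / (160 * packets_per_block)


def ring_footprint(n_block, bytes_per_packet=approx_bytes_per_packet):
    """Return (seconds, bytes) held by a ring buffer of n_block blocks."""
    return (n_block * seconds_per_block,
            n_block * packets_per_block * bytes_per_packet)


for n_block in (80, 160, 320):
    secs, nbytes = ring_footprint(n_block)
    print(f"nBlock={n_block:3d}: {secs:6.2f} s, ~{nbytes / 1e9:4.0f} GB per node")
```

Doubling nBlock doubles both the buffered time and the memory per node, so memory availability limits how long a ring buffer can be kept.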

Note that at this point the 2nd beamform matrix is just an identity matrix, i.e. only the 1st beamform is performed. The FPGA delays are not known yet, so the 2nd beamform cannot be applied yet.

Optional

To make analysis easier, cross-mount the data disks on all servers by running:

sudo /opt/burstt/bin/cross_mount_burstt.sh

(Run this command on burstt6, the bonsai server, as well.)
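After cross-mounting, a quick check that every server's data directories are visible can save a failed calibration later. This sketch assumes the mount layout follows the /burstt?/disk?/data pattern used in the `ls` command of the calibration step; `data_dirs` and `summarize` are hypothetical helpers.

```python
import glob
import os


def data_dirs(root="/"):
    """List directories matching <root>/burstt?/disk?/data."""
    return sorted(glob.glob(os.path.join(root, "burstt?", "disk?", "data")))


def summarize(root="/"):
    """Print and return how many data directories are visible for each server."""
    counts = {}
    for path in data_dirs(root):
        server = os.path.relpath(path, root).split(os.sep)[0]
        counts[server] = counts.get(server, 0) + 1
    for server in sorted(counts):
        print(f"{server}: {counts[server]} data dir(s)")
    return counts
```

Run `summarize()` on each server. An unmounted mount point may still exist as an empty directory, so compare the counts with the number of disks you expect per server.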

Repeat the above procedure for each of {burstt1, burstt2, burstt3, burstt4}.

Prepare for calibration

To calibrate the FPGA delays, we need to save some full-baseband data while the Sun is up. Because the 1st beamform is already in effect, the analysis is most straightforward if we record while the Sun is within one of the 1st-beamform beams. The calibration can be done from a single server; here I chose burstt4.

First, check the timing of each beam given the current beam0 setting (for the 1st beamform). Start a new terminal and activate the (bursttda) environment:

Terminal 1

bda
cd /data/kylin/beam_times
check_beam_times.py sun 241102_0200 241102_0600 -b -7.5

An example diagram is shown below. The Sun will enter the earlier beam (beam-15) at 10:20 am local time.

[Figure: fushan6 UT241102_0200 sun angles]
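The beam-time check and the observing script use different clocks: the figure title and the check_beam_times.py arguments appear to be in UT, while the `at` times below are local time (UTC+8). Both of these points are inferred from this page rather than documented, so treat them as assumptions. A sketch of the conversion, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

TW = timezone(timedelta(hours=8), "UTC+8")   # Taiwan local time, no DST


def ut_to_local(stamp):
    """'241102_0200' (YYMMDD_HHMM in UT) -> timezone-aware local datetime."""
    ut = datetime.strptime(stamp, "%y%m%d_%H%M").replace(tzinfo=timezone.utc)
    return ut.astimezone(TW)


def local_to_ut(stamp):
    """'2024-11-02 10:20' local time -> 'YYMMDD_HHMM' in UT."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=TW)
    return local.astimezone(timezone.utc).strftime("%y%m%d_%H%M")


print(ut_to_local("241102_0200").strftime("%Y-%m-%d %H:%M %Z"))   # 2024-11-02 10:00 UTC+8
print(local_to_ut("2024-11-02 10:20"))                            # 241102_0220
```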

Edit an observing script for saving the baseband data across all four servers and save it as 'at_241102_bb256.sh':

# remember to source this script in (kylin) or (bursttda) environment
# with delay correction in 2nd beamform

odate=110224

##-- only needed on burstt4 --
## baseband data commanded from burstt4 with multi_dump
## every 30min for 10h
echo '~/rudp256/multi_dump.py 36000 -w 1800 -h 192.168.241.151 -h 192.168.241.152 -h 192.168.241.153' | at 00:00 $odate
## dump every 1min for 0.5 hours (10:10 to 10:40)
echo '~/rudp256/multi_dump.py 1800 -w 60 -h 192.168.241.151 -h 192.168.241.152 -h 192.168.241.153' | at 10:10 $odate
##-- only needed on burstt4 --
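Two details of the script are easy to get wrong: `at` accepts dates in MMDDYY form (so 2024-11-02 becomes 110224), and the 10:10 start must be chosen so that one of the dumps lands on the target time. This sketch checks both. It assumes, without confirmation from the multi_dump.py source, that dumps occur every `-w` seconds from the start time until the total duration has elapsed. `at_date` and `dump_times` are hypothetical helpers.

```python
from datetime import date, datetime, timedelta


def at_date(d):
    """Format a date as MMDDYY, one of the date forms accepted by `at`."""
    return d.strftime("%m%d%y")


def dump_times(start, duration_s, interval_s):
    """Expected dump times, assuming one dump every interval_s seconds from start
    until start + duration_s (a simplified model of multi_dump.py -w)."""
    n = duration_s // interval_s
    return [start + timedelta(seconds=k * interval_s) for k in range(n)]


print(at_date(date(2024, 11, 2)))  # 110224, the odate used above
times = dump_times(datetime(2024, 11, 2, 10, 10), 1800, 60)
stamps = [t.strftime("%Y%m%d%H%M%S") for t in times]
print(stamps[0], "...", stamps[-1])
print("target 20241102102000 covered:", "20241102102000" in stamps)
```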

Then execute the script:

source at_241102_bb256.sh

After the data have been taken (i.e. after 2024-11-02 10:20:00, our target observation), check that the data files exist (this assumes the disks were cross-mounted in the optional step above):

ls /burstt?/disk?/data/ring?.20241102102000.bin
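The same check can be scripted, for example to list the files before passing them to the calibration script. This sketch uses the file-name pattern from the `ls` command above. How many files to expect depends on the number of servers and rings recorded, which this page does not state, so compare the count against your own setup. `ring_files` is a hypothetical helper.

```python
import glob
import re


def ring_files(timestamp, pattern="/burstt?/disk?/data/ring?.{ts}.bin"):
    """Return the sorted ring dump files for one timestamp (YYYYmmddHHMMSS)."""
    if not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError("timestamp must look like 20241102102000")
    return sorted(glob.glob(pattern.format(ts=timestamp)))


files = ring_files("20241102102000")
print(f"{len(files)} file(s) found")
for path in files:
    print(" ", path)
```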

If they do, then run the following command to derive the FPGA delays:

second_cal256.py /burstt?/disk?/data/ring?.20241102102000.bin

Some outputs are saved in 'cal_20241102102000.check', including the delay info and a phase diagram:

(bursttda) [ubuntu@burstt4 2nd_cal]$ cat cal_20241102102000.check/ant_delay_correct.txt 
# delay correction needed are (ns):
--ds '-0.000 -0.732 2.603 1.687 1.207 -10.761 1.228 0.482 -0.913 0.377 0.494 -1.168 2.335 -0.663 0.028 -0.701'
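Copying the 16 delays into the next step by hand is error-prone. A sketch that reads them from the text file written by second_cal256.py and rebuilds the `--ds` argument follows. It assumes the file keeps the format shown above (a single `--ds '...'` line with values in ns); `read_ds` and `ds_argument` are hypothetical helpers.

```python
import re
import shlex


def read_ds(text):
    """Extract the quoted --ds delay list (ns) from ant_delay_correct.txt content."""
    match = re.search(r"^--ds\s+'([^']*)'", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no --ds line found")
    return [float(x) for x in match.group(1).split()]


def ds_argument(delays_ns):
    """Format delays back into a --ds option string."""
    return "--ds " + shlex.quote(" ".join(f"{d:.3f}" for d in delays_ns))


example = """# delay correction needed are (ns):
--ds '-0.000 -0.732 2.603 1.687 1.207 -10.761 1.228 0.482 -0.913 0.377 0.494 -1.168 2.335 -0.663 0.028 -0.701'
"""
delays = read_ds(example)
print(len(delays), "delays; largest |delay| =", max(abs(d) for d in delays), "ns")
print(ds_argument(delays))
```

With the real file, use `read_ds(open("cal_20241102102000.check/ant_delay_correct.txt").read())`.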

[Figure: ant_phase_correct]

Apply the 2nd beamform and start the intensity server

Repeat the same procedure for each of {burstt1, burstt2, burstt3, burstt4}.

Go to the (command) terminal we used to run the 'submit256.sh' script.

Execute the following command:

./write_2nd_matrix_256_log.py --auto --ds '-0.000 -0.732 2.603 1.687 1.207 -10.761 1.228 0.482 -0.913 0.377 0.494 -1.168 2.335 -0.663 0.028 -0.701'

This automatically identifies the packet order received by the current server and applies the corresponding beamform matrix to both nodes, using the FPGA delays given on the command line.
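To build intuition for what the delays do: a pure time delay on one antenna appears, in each frequency channel, as a phase that grows linearly with frequency, so a delay correction τ can be applied as a per-channel factor exp(-2πi f τ). The sketch below computes such factors. The sign convention, the example frequencies, and how write_2nd_matrix_256_log.py actually folds the delays into the beamform matrix are all assumptions, not taken from the script.

```python
import cmath
import math


def delay_phasors(delay_ns, freqs_mhz):
    """Unit-magnitude factors exp(-2*pi*i*f*tau) for a delay correction of delay_ns.

    The sign convention is an assumption; flip it if the data use the opposite one.
    """
    return [cmath.exp(-2j * math.pi * (f * 1e6) * (delay_ns * 1e-9)) for f in freqs_mhz]


def delay_phase_deg(delay_ns, freq_mhz):
    """Phase (degrees, wrapped to [-180, 180)) that a delay produces at one frequency."""
    turns = freq_mhz * 1e6 * delay_ns * 1e-9
    return (turns * 360.0 + 180.0) % 360.0 - 180.0


# Example: the largest delay in the list above (-10.761 ns) at two illustrative frequencies.
for f in (400.0, 800.0):
    print(f"{f:.0f} MHz: {delay_phase_deg(-10.761, f):7.1f} deg")
```

A delay of only 1 ns already shifts the phase by half a turn at 500 MHz, which is why an uncorrected delay of about 10 ns would destroy the coherence of the 2nd beamform.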

To start the intensity socket server, we recommend running it inside 'screen' so that its diagnostic output is kept. For each node, use:

screen -S order0
python3 intensity_socket_server256.py 0 1
(press) ctrl-a (press) d
screen -S order1
python3 intensity_socket_server256.py 1 1
(press) ctrl-a (press) d

The two integers following 'intensity_socket_server256.py' are the node_id (0 or 1) and the channelization factor (1 or 16): use '1' for 1k channels and '16' for 16k channels. Pressing ctrl-a followed by d detaches the screen session and returns you to the original shell.
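The channelization factor trades time resolution for frequency resolution. This sketch makes the trade-off concrete under two assumptions that this page does not state. First, one frame is one 1024-channel spectrum every 2.56 µs (seconds_per_block / frames_per_block from submit256.sh). Second, the factor splits each channel further, like a critically sampled filterbank. `channel_setup` is a hypothetical helper.

```python
# Assumptions: one frame = one 1024-channel spectrum every 2.56 us, and the
# channelization factor splits each channel into `factor` finer channels.
BASE_CHANNELS = 1024
FRAME_SECONDS = 0.131072 / 51200


def channel_setup(factor):
    """Return (n_channels, channel_width_khz, time_per_spectrum_us) for factor 1 or 16."""
    if factor not in (1, 16):
        raise ValueError("intensity_socket_server256.py takes 1 or 16")
    n_chan = BASE_CHANNELS * factor
    width_khz = 1.0 / (FRAME_SECONDS * factor) / 1e3
    time_res_us = FRAME_SECONDS * factor * 1e6
    return n_chan, width_khz, time_res_us


for factor in (1, 16):
    n, w, t = channel_setup(factor)
    print(f"factor {factor:2d}: {n} channels, {w:.3f} kHz wide, {t:.2f} us per spectrum")
```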

To reattach to a screen session (e.g. order0), use the following command:

screen -r order0
(press) ctrl-a (press) d
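Before reattaching, for example after logging back in, it can help to confirm that both sessions are still alive. This sketch parses the output of `screen -ls` and assumes the usual GNU screen line format (`<pid>.<name>`, optionally a date, then `(Attached)` or `(Detached)`). Its exit status is nonzero on some versions even when sessions exist, so only the text is used. `parse_screen_ls` and `screen_sessions` are hypothetical helpers.

```python
import re
import subprocess

SESSION_RE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\((Attached|Detached)\)", re.MULTILINE)


def parse_screen_ls(output):
    """Map session name -> 'Attached' / 'Detached' from `screen -ls` output."""
    return {name: state for _pid, name, state in SESSION_RE.findall(output)}


def screen_sessions():
    """Run `screen -ls` and parse its output; empty dict if screen is not installed."""
    try:
        result = subprocess.run(["screen", "-ls"], capture_output=True, text=True)
    except FileNotFoundError:
        return {}
    return parse_screen_ls(result.stdout)


missing = [s for s in ("order0", "order1") if s not in screen_sessions()]
print("missing sessions:", missing or "none")
```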

Repeat the above procedure for each of {burstt1, burstt2, burstt3, burstt4}.