FPGA Emulation - StanfordVLSI/dragonphy2 GitHub Wiki

This page describes how to emulate the DragonPHY design on an FPGA. The first section gives instructions for replicating the emulator that runs as part of the regression testing setup. Subsequent sections cover writing your own tests and possible areas for future work.

Instructions

  1. Install Vivado if you haven't already. (Instructions here)
  2. Switch to the master branch of DragonPHY (if you aren't on it already), and make sure it is up-to-date.
  3. Set the environment variable TAP_CORE_LOC to point to the EDIF file (not folder) for the TAP core. Some information on how to generate that file is in experiments/tap_export/README.md; it should only be necessary to do that once. Please be aware of one limitation of this flow: unlike in the real ASIC, the emulator JTAG ID will not match the git hash. Instead, it will always be 0x1DA8C133.
> export TAP_CORE_LOC=PATH_TO_TAP_EDIF
  4. Set the environment variable DW_TAP to point to /cad/synopsys/syn/L-2016.03-SP5-5/dw/sim_ver/DW_tap.v (you may need to make a local copy of that file). This is only needed for running a sanity check simulation (test_2); it is not used for building the emulator bitstream. As before, the environment variable should point to the file, not the folder.
> export DW_TAP=PATH_TO_DW_TAP
  5. Make sure that your emulation dependencies are up-to-date (run the commands below in the top-level of the DragonPHY repository). There have been recent updates to all three packages, so previously installed versions are likely out-of-date.
> pip uninstall svreal msdsl anasymod
> pip install -e .
  6. Build FPGA models:
> python make.py --view fpga
  7. Change directory to the folder that corresponds to the desired emulator architecture:
    • tests/fpga_system_tests/emu: low-level modeling strategy. Uses the same analog_core as the real ASIC. Can achieve 5 Mb/s emulation rate.
    • tests/fpga_system_tests/emu_macro: high-level modeling strategy. Uses a synthesizable macro-model for analog_core. Can achieve 80 Mb/s emulation rate.
  8. Build the project configuration for the emulator. Valid options for BOARD_NAME include ZCU106, ZC706, ZC702, and PYNQ_Z1. For EMU_CLK_FREQ, 15e6 is recommended for the low-level modeling strategy while 30e6 is recommended for the high-level strategy. The emulation throughput is directly proportional to this value.
> pytest -s -k test_1 --board_name BOARD_NAME --emu_clk_freq EMU_CLK_FREQ
  9. Before building the FPGA bitstream, check the emulator architecture with a simulation. This takes about 5 minutes for the low-level architecture, or 1-2 minutes for the high-level architecture.
> pytest -s -k test_2
  10. If that looks good, build the FPGA bitstream. This normally takes about 30 minutes for the low-level architecture or 45 minutes for the high-level architecture.
> pytest -s -k test_3
  11. After that completes, you may want to open the project in Vivado. This is not required, but is sometimes useful to get a sense of how things went. To do that, launch Vivado and open the project file build/fpga/prj/prj.xpr. If you open the implemented design, you can report timing (to make sure there are no timing violations) and report utilization (to make sure the resource utilization looks as expected).
> vivado &
  12. At this point it is time to plug in the FPGA board:
    1. Connect power to the FPGA board (board-dependent; check the user manual for the board if you're not sure). There is often a power switch that has to be flipped, too.
    2. Connect USB JTAG and USB UART to the host computer. For ZCU106, the connector locations are shown here.
  13. Now program the FPGA board and run the emulation:
> pytest -s -k test_4
  14. If that looks good, there are various additional arguments that can be passed to test_4:
    1. --prbs_test_dur: Duration of the PRBS test, in seconds. Default value is 10 seconds, which is 50 Mb for the low-level architecture, or 800 Mb for the high-level architecture.
    2. --jitter_rms: RMS jitter of the ADC sampling times, in seconds. Default value is 0; our design can tolerate up to about 2.6e-12 with other settings at their defaults.
    3. --noise_rms: RMS noise added to voltages sampled by ADC, in volts. Default value is 0; our design can tolerate up to about 56e-3 with other settings at their defaults.
    4. --chan_tau: Time constant of the channel, in seconds, assuming a simple first-order exponential step response. Default value is 25e-12; our design can tolerate up to about 217e-12 with other settings at their defaults. The emulator can be configured at runtime with a non-exponential step response, but this currently has to be done in Python, rather than through the command line.
    5. --chan_delay: Time delay of the channel (i.e., time when the channel step response becomes non-zero). Default value is 31.25e-12, which is 0.5 UI. If you decrease chan_delay towards 0, the PI control codes should decrease; if you increase it towards 62.5e-12, then the codes should increase.
  15. Sometimes it is useful to look at waveforms captured from the FPGA's Integrated Logic Analyzer (ILA). To do that, launch Vivado, open build/fpga/prj/prj.xpr, and then open the Hardware Manager. Connect to the FPGA but do not reprogram it, since that will restart the emulator. You can then select signals for probing and set triggering options using the ILA window.
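
The figures quoted for --prbs_test_dur and --chan_delay follow from simple arithmetic. The sketch below reproduces them, assuming the emulation rates listed for each architecture (which presumably correspond to the recommended EMU_CLK_FREQ values) and a unit interval of 62.5 ps, as implied by the default chan_delay of 31.25e-12 being 0.5 UI. The function and constant names are illustrative and are not part of the DragonPHY code base.

```python
# Emulation rates quoted above for each architecture, in bits per second.
# These are assumed to hold at the recommended EMU_CLK_FREQ settings.
EMU_RATE_BPS = {"emu": 5e6, "emu_macro": 80e6}

# Unit interval implied by "31.25e-12 is 0.5 UI" (assumption: 62.5 ps).
UI_SECONDS = 62.5e-12


def prbs_bits(arch, prbs_test_dur=10.0):
    """Approximate number of bits covered by a PRBS test of the given duration."""
    return EMU_RATE_BPS[arch] * prbs_test_dur


def delay_in_ui(chan_delay):
    """Express a channel delay (in seconds) as a fraction of the unit interval."""
    return chan_delay / UI_SECONDS


print(prbs_bits("emu"))        # 50000000.0 (50 Mb)
print(prbs_bits("emu_macro"))  # 800000000.0 (800 Mb)
print(delay_in_ui(31.25e-12))  # 0.5
```

Because throughput scales with the emulator clock frequency, the bit counts for a given --prbs_test_dur would change accordingly if a different EMU_CLK_FREQ is used.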

Writing your own tests

If you want to change JTAG reads/writes, then you can copy the test_4 function definition from test*.py into your own file, and edit from there. You'll notice that the interaction between the emulator and host computer takes place by sending ASCII commands over the USB UART link, mostly to read and write JTAG registers as in CPU-based tests. In writing your own tests, you don't necessarily have to use pytest unless you want your test to be used in the regression suite.

One of the other things you might want to do in emulation is to set the channel dynamics to something other than an exponential step response. As a first pass, you can try editing the definition of chan_func, which is just a regular Python function. You can see that the coefficients needed to represent chan_func are computed after that point and uploaded to the emulator. If you want to experiment further with the channel dynamics, you may need to update some of the parameters in config/fpga/chan.yml (low-level architecture) or config/fpga/analog_slice_cfg.yml (high-level architecture). The properties func_* refer to the representation of the channel step response:

  1. func_order: 0 means piecewise-constant, 1 means piecewise-linear, 2 means piecewise-quadratic, etc.
  2. func_numel: number of piecewise polynomial segments in the function
  3. func_domain: domain of the step response function. If you need a longer step response, you can increase the second number (but you may want to increase func_numel as well in order to keep the step size constant)
  4. func_widths: Widths of the coefficients used in lookup tables for piecewise-polynomial coefficients. The first entry is for the offset, the second entry is for the slope, etc.
  5. func_exps: Similar to func_widths, but for the exponents of fixed-point formats. The resolution of the kth coefficient is 2**func_exps[k], while its range is:
[-(2**(func_widths[k]-1))*(2**func_exps[k]), (2**(func_widths[k]-1)-1)*(2**func_exps[k])]
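
To make the func_* properties more concrete, the sketch below samples an exponential step response on func_numel segments, builds piecewise-linear (func_order = 1) coefficients, and checks each coefficient against the fixed-point range given above. All numeric settings, the body of chan_func, the choice to express the slope per segment rather than per second, and the helper names are assumptions made for illustration; they are not taken from the configuration files or from the actual coefficient-generation code.

```python
import math

# Hypothetical settings, chosen only for illustration.
func_order = 1                 # piecewise-linear
func_numel = 512               # number of segments
func_domain = [0.0, 1.0e-9]    # seconds
func_widths = [18, 18]         # bit widths for offset, slope
func_exps = [-16, -16]         # fixed-point exponents for offset, slope


def chan_func(t, tau=25e-12, delay=31.25e-12):
    """Assumed first-order exponential step response with a pure delay."""
    return 0.0 if t < delay else 1.0 - math.exp(-(t - delay) / tau)


def coeff_range(k):
    """Representable range of the k-th coefficient, per the formula above."""
    scale = 2.0 ** func_exps[k]
    lo = -(2 ** (func_widths[k] - 1)) * scale
    hi = (2 ** (func_widths[k] - 1) - 1) * scale
    return lo, hi


def quantize(value, k):
    """Round to the k-th coefficient's resolution and check its range."""
    scale = 2.0 ** func_exps[k]
    q = round(value / scale) * scale
    lo, hi = coeff_range(k)
    if not lo <= q <= hi:
        raise ValueError(f"coefficient {k} out of range: {value}")
    return q


def build_table():
    """Return (offset, slope) per segment; slope is the rise over one segment."""
    assert func_order == 1, "this sketch only handles piecewise-linear tables"
    t0, t1 = func_domain
    step = (t1 - t0) / func_numel
    table = []
    for i in range(func_numel):
        left = chan_func(t0 + i * step)
        right = chan_func(t0 + (i + 1) * step)
        table.append((quantize(left, 0), quantize(right - left, 1)))
    return table


def evaluate(table, t):
    """Evaluate the piecewise-linear approximation at time t."""
    t0, t1 = func_domain
    step = (t1 - t0) / func_numel
    i = min(max(int((t - t0) / step), 0), func_numel - 1)
    frac = (t - t0) / step - i
    offset, slope = table[i]
    return offset + slope * frac
```

With these assumed settings, each coefficient has a resolution of 2**-16 and a range of [-2, 2 - 2**-16], which comfortably covers a step response that settles to 1. Doubling the second entry of func_domain while also doubling func_numel keeps the segment width unchanged.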

Creating your own test stimulus in Python is likely sufficient for many emulation use cases, but there may be times where you want to change something in the emulator hardware or firmware. To do that, it's recommended to copy the emu or emu_macro folder as a starting point (depending on whether you want to use the low-level or high-level architecture). You can then make these kinds of changes:

  1. Probe different signals using the ILA: edit simctrl.yaml in the digital_probes section. This requires rebuilding the emulator starting from the bitstream build (i.e., test_3, then test_4).
  2. Add functions to the ARM core firmware: edit main.c. You'll see that this code consists of the UART interpreter that the Python code on the host computer is interacting with. You might want to add some higher-level features to the firmware that can be invoked over UART as a method for speeding up emulation. This requires rebuilding the emulator starting from test_4.
  3. Add more signals that can be read/written by the ARM core: edit simctrl.yaml in the digital_ctrl_inputs or digital_ctrl_outputs sections. This requires rebuilding the emulator starting from test_3. If you want to test out the changes with the sanity check simulation (test_2), then you'll need to wire up the additional control signals in sim_ctrl.sv.

Ideas for future work

  1. There is a bunch of duplicated code in test_emu.py and test_emu_macro.py related to UART commands sent to the ARM core. This code should probably be pulled into a common controller.
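
One possible shape for such a controller is sketched below. It is a rough illustration only: the class names, method names, and the ASCII command strings (WRITE, READ) are hypothetical and do not reflect the actual firmware protocol implemented in main.c. The transport is injected, so the same class could wrap a serial port object or an in-memory test double.

```python
class EmuController:
    """Hypothetical wrapper for UART commands shared by both emulator tests."""

    def __init__(self, port):
        # port: any object providing write(bytes) and readline() -> bytes
        # (assumption: a serial port object or a test double).
        self.port = port

    def _send(self, line):
        self.port.write((line + "\n").encode("ascii"))

    def write_reg(self, addr, value):
        # Hypothetical command format; the real firmware may differ.
        self._send(f"WRITE {addr} {value}")

    def read_reg(self, addr):
        # Hypothetical command format; assumes a decimal reply on one line.
        self._send(f"READ {addr}")
        return int(self.port.readline().decode("ascii").strip())


class FakePort:
    """In-memory stand-in for the UART link, useful for unit tests."""

    def __init__(self):
        self.regs = {}
        self.replies = []

    def write(self, data):
        parts = data.decode("ascii").split()
        if parts[0] == "WRITE":
            self.regs[int(parts[1])] = int(parts[2])
        elif parts[0] == "READ":
            self.replies.append(self.regs.get(int(parts[1]), 0))

    def readline(self):
        return f"{self.replies.pop(0)}\n".encode("ascii")


ctl = EmuController(FakePort())
ctl.write_reg(4, 17)
print(ctl.read_reg(4))  # 17
```

Keeping the command formatting in one class would let test_emu.py and test_emu_macro.py share it, and a fake port like the one above would allow the host-side logic to be exercised without an FPGA attached.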