Verilator Simulations

Verilator is a cycle-accurate software Verilog simulator. Relative to FireSim, it runs much more slowly, but debugging is significantly easier because it can produce full waveforms.

This page describes the Verilator simulation framework and how to use it to reproduce our results.

Verilator Simulation Framework

The figure below depicts the Verilator simulation framework that we set up.

[Figure: Verilator simulation framework (images/Verilator-Sims.png)]

The simulated nanoPU runs in a Verilator process that is able to execute about 1K cycles per second. A timestamp module inserts timestamps into packets as they arrive at and depart from the nanoPU; we use these timestamps to measure the nanoPU's wire-to-wire latency and throughput. The Verilator process is attached to a Linux TAP interface, which functions as a virtual network for our simulations.

We use a Python unit-testing framework, which we call net-app-tester.py, to generate test messages and process the nanoPU's response messages by reading from and writing to the Linux TAP interface. It uses the Python Scapy library for network IO. The test program logs statistics such as latency and throughput and writes the results to a CSV file.
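As a rough illustration of this flow, the sketch below sends a single test message to the TAP interface with Scapy and waits for the looped-back response. This is not the actual net-app-tester.py; the interface name, addresses, and UDP encapsulation are assumptions made purely for illustration.

# Minimal sketch (not the actual net-app-tester.py): send one test message over
# the TAP interface with Scapy and wait for the loopback response. The interface
# name, MAC/IP addresses, and UDP encapsulation are assumptions for illustration.
from scapy.all import Ether, IP, UDP, Raw, srp1

TAP_IFACE = "tap0"                 # assumed name of the Linux TAP interface
NIC_MAC = "08:11:22:33:44:08"      # hypothetical MAC of the simulated nanoPU

def send_test_message(payload: bytes):
    pkt = Ether(dst=NIC_MAC) / IP(dst="10.0.0.2") / UDP(dport=1234) / Raw(payload)
    # srp1() transmits at layer 2 and returns the first response (or None on timeout)
    return srp1(pkt, iface=TAP_IFACE, timeout=10, verbose=False)

resp = send_test_message(b"\x00" * 64)
print("got response" if resp is not None else "no response")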

The general simulation flow is as follows:

  1. Start the Python net-app-tester, which will wait for an initial "boot complete" message sent by the nanoPU after it has booted and is ready to start processing packets.
  2. Start the Verilator simulation.
  3. Wait for the Python test to complete.
  4. Terminate the Verilator simulation.

The simulation results logged by the Python test program are written to the results/ directory, and the waveforms produced by each Verilator simulation are written to a .vcd file in the current directory. You can view the waveforms using gtkwave.

We have created a simple program called run_verilator_sims.py, which uses the Python subprocess module to manage both the Python unit-testing process and the Verilator process. This script makes it easier to run the simulations.
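A minimal sketch of this orchestration, following the four-step flow above, might look like the following. The script paths, arguments, and test names here are assumptions, not the real script's interface.

# Hedged sketch of how run_verilator_sims.py could coordinate the two processes
# with the subprocess module. Paths, flags, and test names are assumptions.
import subprocess

def run_sim(test_name, simulator_cmd):
    # 1. Start the Python net-app-tester first so it is already waiting for the
    #    "boot complete" message when the simulated nanoPU comes up.
    tester = subprocess.Popen(["./net-app-tester.py", test_name])
    # 2. Start the Verilator simulation.
    sim = subprocess.Popen(simulator_cmd)
    # 3. Wait for the Python test to complete.
    tester.wait()
    # 4. Terminate the Verilator simulation.
    sim.terminate()
    sim.wait()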

Description of Simulations

Loopback Latency / Throughput

This simulation measures the wire-to-wire latency and throughput of the nanoPU across various message sizes. We run the lnic-loopback-latency.c program on the simulated nanoPU when measuring latency and the lnic-loopback-throughput.c program when measuring throughput. Each of these programs simply retransmits any received message as fast as possible without doing any useful work. The main difference between the two programs is how message timestamps are copied into the transmitted messages; these timestamps are used to measure latency or throughput.

The corresponding programs for the traditional (IceNIC) system are called icenic-loopback-latency.c and icenic-loopback-throughput-batch.c.

The net-app-tester contains a unit test class called LoopbackTest, which is used to generate the test messages, process the responses, and log the measurements.
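For illustration, the wire-to-wire latency can be derived on the Python side from the two timestamps carried in each response. The byte offsets and little-endian 8-byte encoding below are assumptions, not the actual timestamp layout.

# Hedged sketch: extract the arrival and departure timestamps from a response
# payload and compute wire-to-wire latency. Offsets and encoding are assumptions.
import struct

def latency_cycles(response_payload: bytes) -> int:
    # assumed layout: first 8B word = arrival timestamp, second 8B word = departure timestamp
    t_arrival, t_departure = struct.unpack_from("<QQ", response_payload, 0)
    return t_departure - t_arrival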

Loopback-with-Increment Throughput

This simulation is very similar to the loopback test; however, this time the application increments each 8B word of the message by 1 before transmitting it back to the sender. The RISC-V program that we run on the nanoPU is called lnic-stream-throughput.c.

The corresponding RISC-V program for the traditional (IceNIC) system is called icenic-stream-batch.c.

The net-app-tester unit test class is called StreamTest.
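A hedged sketch of the response check such a test could perform is shown below; the word order and little-endian uint64 encoding are assumptions for illustration.

# Hypothetical StreamTest-style check: every 8B word in the response should
# equal the corresponding 8B word of the request plus 1.
import struct

def check_incremented(request_payload: bytes, response_payload: bytes) -> bool:
    fmt = "<%dQ" % (len(request_payload) // 8)
    sent = struct.unpack(fmt, request_payload)
    recv = struct.unpack(fmt, response_payload)
    return all(((s + 1) & 0xFFFFFFFFFFFFFFFF) == r for s, r in zip(sent, recv))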

Dot Product Microbenchmark

In this simulation, the program running on the nanoPU / traditional system computes the dot product between an in-memory vector and a vector contained within the arriving message, then generates a response message that contains the result. We test both a naive and an optimized implementation of this program on the nanoPU. The naive implementation is called lnic-dot-product-naive.c and the optimized version is called lnic-dot-product-opt.c.

The corresponding traditional implementation of this microbenchmark is called icenic-dot-product-batch.c.

The net-app-tester's DotProdTest unit test class generates the test messages, checks the response message, and logs the system throughput.
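As a hedged illustration of what such a check could look like, the sketch below packs a request vector, computes the expected dot product, and compares it against the result in the response. The in-memory vector, the 8B-word vector encoding, and the result layout are all assumptions.

# Hypothetical DotProdTest-style helpers: pack the message vector as 8B words,
# compute the expected dot product against an assumed in-memory vector, and
# compare it with the 8B result carried in the response.
import struct

IN_MEMORY_VECTOR = [1, 2, 3, 4]  # hypothetical vector preloaded on the nanoPU

def make_request(msg_vector):
    return struct.pack("<%dQ" % len(msg_vector), *msg_vector)

def check_response(msg_vector, response_payload: bytes) -> bool:
    expected = sum(a * b for a, b in zip(IN_MEMORY_VECTOR, msg_vector))
    (result,) = struct.unpack_from("<Q", response_payload, 0)
    return result == (expected & 0xFFFFFFFFFFFFFFFF)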

Running the Simulations

Setup

First ssh into your AWS manager instance if you have not already done so:

ssh -i firesim.pem -L 8888:localhost:8888 centos@YOUR_INSTANCE_IP

Note that we are forwarding port 8888 over the SSH connection so that we can access a Jupyter notebook server that we will run later.

Set up the environment:

$ cd ~/chipyard/sims/firesim
$ source sourceme-f1-manager.sh

The RISC-V test programs should already be compiled on the provided instance, but to double-check, run the following commands:

$ cd ~/chipyard/tests-lnic
$ make
$ cd ~/chipyard/tests-icenic
$ make

The nanoPU (i.e., L-NIC) and Traditional (i.e., IceNIC) Verilator simulation models should also already be compiled on the provided instance, so you should NOT need to run the commands below, unless you decide to make changes to the Chisel source code. Each model will take about 15 minutes to compile.

$ cd ~/chipyard/sims/verilator
$ make debug CONFIG=LNICSimNetworkQuadRocketConfig -j16
$ make debug CONFIG=IceNICSimNetworkRocketConfig TOP=TopIceNIC MODEL=TestHarnessIceNIC VLOG_MODEL=TestHarnessIceNIC -j16

Test

To make sure that everything is set up correctly, run a simple test that sends one packet into each simulation model (running a loopback application) and checks that a response is received.

$ cd ~/chipyard/software/net-app-tester/
$ sudo bash
# ./run_verilator_sims.py --test

The test should complete within a couple of minutes with no errors. Occasionally, we've seen the error "ioctl: device or resource busy", usually the first time the simulation is run. If you see this error, try re-running the simulation.

Run

Run all the simulations (about 10 minutes):

$ cd ~/chipyard/software/net-app-tester/
$ sudo bash
# ./run_verilator_sims.py --all

After it completes, the results are logged as CSV files in the results/ directory.

Note that you can also run each simulation separately if you prefer:

# ./run_verilator_sims.py --fig3
# ./run_verilator_sims.py --fig4
# ./run_verilator_sims.py --fig5
# ./run_verilator_sims.py --fig6

Be sure to exit the sudo bash shell before proceeding to the next section.

Analyze the Results

We will use a Jupyter notebook to view the simulation results. Jupyter notebooks, combined with matplotlib and pandas, are a great way to interact with and visualize data. Run the following command to start the Jupyter notebook server. This command runs in the foreground, so we recommend running it in a separate terminal tab:

$ cd ~
$ jupyter notebook

When you start the Jupyter notebook server, it should provide you with a link that looks like: http://localhost:8888/tree?token=<TOKEN-NUMBER>. Visit this link using your local web browser. Note that we are able to do this because we are forwarding port 8888 over the SSH connection to the manager.

Open the Verilator-Evals.ipynb notebook located in the chipyard/software/net-app-tester/ directory. Select Kernel > Restart & Run All. At this point, you should be able to view all the plots created from the Verilator simulation results and they should closely match the plots presented in the paper.
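If you prefer to inspect the raw CSV files directly, a cell along the following lines shows the kind of pandas/matplotlib code the notebook relies on. The file name and column names here are assumptions and may not match the notebook's actual schema.

# Hedged sketch: load one CSV result file with pandas and plot it with
# matplotlib. The file name and column names are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("results/loopback-latency.csv")
df.plot(x="msg_len_bytes", y="latency_cycles", marker="o", legend=False)
plt.xlabel("Message length (bytes)")
plt.ylabel("Latency (cycles)")
plt.show()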