Scenario Demo - nps-ros2/ns3_gazebo GitHub Wiki

We use lessons learned from the ROS2 ns3 Gazebo Demos 1-3 to build this demo.

Here we demonstrate defining ad-hoc Wifi communication using a CSV file and testing network flows using ns-3 and a test launcher script.

Here are the sections of this page:

Setup - Set up the environment to run the demos.
Demo 1: Proof of concept. Here we show how we put together the ns-3 network simulator program, the testbed GUI showing performance, and the program that starts the robots.
Demo 2: Scenario builder. Here we implement a parser for building the scenario from a CSV file.
Demo 3: C++. Because of the performance bottleneck, we reimplement the robot code in C++ and observe the same bottleneck.
Demo 4: Simtime mode. We test using simulation time instead of real time, but DDS still runs in real time when managing the DDS reliable protocol.
Demo 5: Latency. We perform more experiments to evaluate measured latency.

Setup

Setup for this demo includes the following installation:

The ROS2 environment, see https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-the-ROS2-Environment
The ns-3 (ns-3.29) Network Simulator, see https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-ns-3
Setup for Network Namespaces and network devices, see https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-Network-Namespaces-and-Network-Topology
Building the ns3 program and the ROS2 nodes
Setting up global variables
Defining your robot scenario

Once setup is complete we run the talker-listener demo between separate network namespaces and between Gazebo and the host.

Install the ROS2 environment

Install the ROS2 environment, see https://github.com/nps-ros2/nps-ros2-examples/wiki/Installing-the-ROS2-Environment

Install ns-3

Install the ns-3 network simulator, see https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-ns-3.

Install Network Namespaces and Network Topology

Install network namespaces and their network devices as described at https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-Network-Namespaces-and-Network-Topology. In this example, use --count 20 to support 1 Ground Station and 19 Robots. Specifically:

cd ~/gits/ns3_gazebo/scripts
sudo ./nns_setup.py setup -c 20

Build the ns-3 Mobility Program

Compile the ns3_mobility program that will provide a stationary antenna for nns1 and moving antenna locations for nodes nns2 and above:

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_mobility
mkdir build
cd build
cmake ..
make

Build ROS2 Nodes

Build the ROS2 GS and robot testbed nodes:

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_testbed_nodes
colcon build

Set Global Variables for ROS2 Nodes

Add this to .bashrc or run it directly:

source ~/gits/ns3_gazebo/ns3_testbed/ns3_testbed_nodes/install/local_setup.bash

Set Global Variables for ns-3

Define global variables so the ns3_mobility program can find ns-3 components when it runs. Add this to your .bashrc file:

# ns-3 compatibility
export LD_LIBRARY_PATH=~/repos/ns-3-allinone/ns-3.29/build/lib:$LD_LIBRARY_PATH
export PATH=$HOME/repos/ns-3-allinone/ns-3.29/build/src/fd-net-device:$HOME/repos/ns-3-allinone/ns-3.29/build/src/tap-bridge:$PATH

Define your Robot Scenario

Define your robot communication and QoS setup as described in https://github.com/nps-ros2/ns3_gazebo/wiki/Defining-your-Robot-Scenario.

Demo 1 proof-of-concept

The ns3 testbed demo consists of these parts, each run in a separate command window:

The ns-3 network simulator program supporting mobility and Wifi for 20 nodes.
The testbed GUI showing communication latency and loss.
The Ground Station and the robots, communicating as defined by settings in your spreadsheet.

Start the ns-3 Program:

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_mobility/build
./ns3_mobility -c 20

Start the Testbed GUI:

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_testbed_gui
./tg.py

Start the GS and Swarm Robots

A root shell is required to start robot nodes within their own network namespaces.
Use your own CSV swarm setup file or use the default CSV setup file at ~/gits/ns3_gazebo/ns3_testbed/csv_setup/example1.csv.

Start the root shell and the robots (R1, R2, R3, etc. depending on count) using the testbed_runner program:

sudo /bin/bash
cd ~/gits/ns3_gazebo/ns3_testbed/cpp_testbed_runner/build/cpp_testbed_runner
./testbed_runner.py -n -p -c 20

testbed_runner.py supports several options:

-c The count of ROS2 nodes to start.
-s The CSV scenario setup file.
-n Run the nodes in network namespaces instead of the system network space.
-p Send received subscription metadata to the pipe that the GUI is listening to.
-v Verbose, print out received subscription metadata and some other diagnostics.
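
For reference, the option list above could be declared with Python's argparse roughly as follows. This is only a sketch, not the actual testbed_runner.py source: the destination names (count, setup_file, use_nns, use_pipe, verbose) are hypothetical, and the default for -c is left unset because this page does not state it.

```python
import argparse

# Default CSV scenario file named earlier on this page.
DEFAULT_SETUP_FILE = "~/gits/ns3_gazebo/ns3_testbed/csv_setup/example1.csv"


def build_parser():
    """Build a parser for the five documented options (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="testbed_runner.py")
    parser.add_argument("-c", dest="count", type=int, default=None,
                        help="count of ROS2 nodes to start")
    parser.add_argument("-s", dest="setup_file", default=DEFAULT_SETUP_FILE,
                        help="CSV scenario setup file")
    parser.add_argument("-n", dest="use_nns", action="store_true",
                        help="run nodes in network namespaces")
    parser.add_argument("-p", dest="use_pipe", action="store_true",
                        help="send subscription metadata to the GUI pipe")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print subscription metadata and diagnostics")
    return parser


if __name__ == "__main__":
    # Same flags as the command shown above: ./testbed_runner.py -n -p -c 20
    args = build_parser().parse_args(["-n", "-p", "-c", "20"])
    print(args)
```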

Demo 2: Scenario builder

We run multiple nodes, for example one GS and nine robots, each in their own network namespace.
My 4-CPU system does not keep up, as shown by the CPU workload and by pauses in the ns-3 output.

Input Data

Robot data and QoS settings are configured for each ROS2 node. This example configures 39 robots:

example1

Traffic Flow Monitor

Received packet traffic is refreshed once per second. Index is the message number for the message. Size is the size of the message received, in bytes. Latency is the delay in microseconds. Nine robots are monitored:

ten_table

CPU Workload

This 4-CPU system does not always keep up, as seen by one CPU working at 100% throughout:

ten_workload

ns-3 Positions

The ten GS and robot x y z positions are refreshed at 10Hz. GS is stationary. Some Robots move out of range. This is seen in the output of the ns3 mobility program:

ten_ns3

Comments

Uses a Linux pipe to flow data from the GS to the GUI.
Uses custom codec to encode/decode data through pipe.
GUI consumes data from pipe once per second.
If a row in the GUI is blank, we zero the row.
Latency is in microseconds and is calculated as current_time - data_timestamp.
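
The latency calculation and the pipe transfer described in these comments can be sketched as follows. This is a minimal illustration rather than the testbed's actual codec: the comma-separated record layout and the names now_us, latency_us, encode_record and decode_record are assumptions made for this example, and an anonymous pipe stands in for the pipe the GUI listens on.

```python
import os
import time


def now_us():
    """Current wall-clock time in integer microseconds."""
    return time.time_ns() // 1000


def latency_us(data_timestamp_us, current_time_us=None):
    """Latency = current_time - data_timestamp, in microseconds."""
    if current_time_us is None:
        current_time_us = now_us()
    return current_time_us - data_timestamp_us


def encode_record(node, index, size, latency):
    """Hypothetical codec: one newline-terminated CSV line per message."""
    return f"{node},{index},{size},{latency}\n".encode("ascii")


def decode_record(line):
    """Inverse of encode_record; returns (node, index, size, latency)."""
    node, index, size, latency = line.decode("ascii").strip().split(",")
    return node, int(index), int(size), int(latency)


if __name__ == "__main__":
    # An anonymous pipe stands in for the pipe between the GS and the GUI.
    read_fd, write_fd = os.pipe()
    sent_at = now_us()
    os.write(write_fd, encode_record("R2", 1, 500, latency_us(sent_at)))
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        for line in reader:
            print(decode_record(line))
```

A line-oriented format keeps the reader simple: the GUI side can drain whatever lines have accumulated each time it wakes up.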

Demo 3: C++ Implementation

To ensure the lack of scalability is not a result of the Python implementation, we port the robot code from Python to C++. Results shown indicate that there is still a bottleneck on one CPU.

Setup

Setup is similar to the setup for Demo 1, except for the part about ROS2 Nodes.

Build

Instead of building Python nodes at ns3_testbed_nodes we build C++ nodes at cpp_testbed_runner:

cd ~/gits/ns3_gazebo/ns3_testbed/cpp_testbed_runner
colcon build

Set Global Variables

Add this to .bashrc or run it directly:

source ~/gits/ns3_gazebo/ns3_testbed/cpp_testbed_runner/install/local_setup.bash

Start the C++ Robots

Launch the robots using the C++ testbed_runner program from the directory where Colcon built it, specifying to use pipes and network namespaces and to start 30 of them:

sudo /bin/bash
cd ~/gits/ns3_gazebo/ns3_testbed/cpp_testbed_runner/build/cpp_testbed_runner
./testbed_runner -p -n -c 30

Results

My 4-CPU system still does not keep up. Ten nodes:

CPU Workload

System monitor:

ten_c_system_monitor

htop:

ten_c_htop

Demo 4: Simtime Mode

We may be able to simulate the steady-state portion using ns-3 in simulation time mode instead of real-time mode.

We still require network namespaces because DDS must run outside ns-3 (ns-3 does not simulate DDS). DDS may use wall time for watchdog and discovery so we cannot properly simulate DDS timing unless we port DDS to use ns-3 simulation time instead of wall time.

The approach is to export ns-3 simulation time into the robots. Robots act on this time, not on wall time:

Robots transmit at intervals based on simulation time.
Robots receive data and timestamp receipt based on simulation time.

We would use memory mapped IO (MMIO) for interprocess communication. ns-3 writes simulation time to MMIO memory on each update of its simulation timer. Robots read this MMIO memory to obtain simulation time instead of wall time.
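
The shared-time idea can be sketched in Python as a single 64-bit value in a memory-mapped region that one process writes and others read. This is an illustration under stated assumptions, not the ns-3 or robot code: the class name SharedSimtime and the helper make_backing_file are hypothetical, and a memory-mapped temporary file stands in for the shared memory. Only the method names t() and set_t() follow the interface described on this page.

```python
import mmap
import os
import struct
import tempfile

_U64 = struct.Struct("=Q")  # one native-endian unsigned 64-bit value (8 bytes)


class SharedSimtime:
    """Hypothetical Python analogue of a shared simulation-time cell."""

    def __init__(self, path):
        # Map exactly 8 bytes of the backing file; the mapping is shared,
        # so a write through one mapping is visible through the others.
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), _U64.size)

    def set_t(self, t_ns):
        """Writer side (the simulator): store simulation time in nanoseconds."""
        _U64.pack_into(self._map, 0, t_ns)

    def t(self):
        """Reader side (a robot): load simulation time in nanoseconds."""
        return _U64.unpack_from(self._map, 0)[0]

    def close(self):
        self._map.close()
        self._file.close()


def make_backing_file():
    """Create a zero-filled 8-byte temporary file to back the mapping."""
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(bytes(_U64.size))
    f.close()
    return f.name


if __name__ == "__main__":
    path = make_backing_file()
    writer, reader = SharedSimtime(path), SharedSimtime(path)
    writer.set_t(1_500_000_000)  # simulator reports 1.5 s of simulation time
    print(reader.t())            # robot reads the same value back
    writer.close()
    reader.close()
    os.unlink(path)
```

Here the writer and reader live in one process for brevity; in the testbed they would be separate processes mapping the same region.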

Here is a brief diagram of this approach:

ns3_sim_time

Note that DDS will still use wall time, so DDS will incorrectly retransmit packets whenever a wall-time timeout expires before the corresponding simulation-time timeout would have. We could port DDS to read simulation time instead of wall time so DDS can be modeled accurately, too.

Goals

We do not need to measure DDS configuration time. We can measure performance after network discovery has been set up.

Design

ns-3 design

Copy ~/repos/ns-3-allinone to ~/repos/ns-3-custom
Use class shared_simtime_t with interfaces uint64_t t() and set_t(uint64_t) so that processes can set or read simulation time via sizeof(uint64_t) bytes of shared memory named /testbed_shared_simtime. ns-3 writes the time in its default units of nanoseconds, and robots read it. See ns3_testbed2/ns3_simtime_support for the class, header, test program, and changes to ns-3.29.
Change file ns-3.29/src/core/model/realtime-simulator-impl.cc and .h so the scheduler also calls set_t(t) when it updates its simulation time. To simplify code management, we put this class right into these files. You may copy these changed files from the repository:
```
cd ~/repos/ns-3-custom/ns-3.29/src/core/model
cp ~/gits/ns3_gazebo/ns3_testbed2/ns3_simtime_support/changed-ns-3.29-files/realtime-simulator-impl.cc .
cp ~/gits/ns3_gazebo/ns3_testbed2/ns3_simtime_support/changed-ns-3.29-files/realtime-simulator-impl.h .
```
Rebuild ns-3, ref. https://github.com/nps-ros2/ns3_gazebo/wiki/Installing-ns-3:
```
cd ~/repos/ns-3-custom/ns-3.29
./waf configure --enable-sudo
./waf build
```

Change .bashrc so ns-3 accesses ns-3-custom:

export LD_LIBRARY_PATH=~/repos/ns-3-custom/ns-3.29/build/lib:$LD_LIBRARY_PATH
export PATH=$HOME/repos/ns-3-custom/ns-3.29/build/src/fd-net-device:$HOME/repos/ns-3-custom/ns-3.29/build/src/tap-bridge:$PATH

Question: Will simulation time update when there is no network traffic?

Robot design

Change robot to read custom timestamp instead of chrono. Specifically, replace function _now() with shared_simtime_t::t(). Units are in nanoseconds. This change is available in ns3_testbed_simtime/cpp_testbed_runner. Build it:

cd ~/gits/ns3_gazebo/ns3_testbed_simtime/cpp_testbed_runner
colcon build

Run

Start ns-3

Start ns-3, keeping the range length short:

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_mobility/build
./ns3_mobility -c 5 -l 2

Start the Testbed GUI

cd ~/gits/ns3_gazebo/ns3_testbed/ns3_testbed_gui
./tg.py

Start Wireshark

sudo wireshark

Define your Configuration

This configuration sets GS R1 to listen and R2..R30 to transmit 500 bytes of "odometry" at 1 Hz. Note that we do not use all 30 robots; the number actually used is set in the ns-3 ns3_mobility program.

Publish,,,,,,,
Node,Subscription,Frequency,Size,History,Depth,Reliability,Durability
R2-30,odometry,1,500,keep_last,0,reliable,volatile
Subscribe,,,,,,,
Node,Subscription,History,Depth,Reliability,Durability,,
R1,odometry,keep_last,0,reliable,volatile,,
R1,image,keep_last,0,reliable,volatile,,
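
To make the layout of this file concrete, here is a minimal Python sketch that reads the Publish and Subscribe sections and expands a node range such as R2-30 into R2 through R30. It is not the testbed's parser: the function names expand_nodes and parse_scenario are hypothetical, and the section handling is inferred only from the example above.

```python
import csv
import io

SCENARIO = """\
Publish,,,,,,,
Node,Subscription,Frequency,Size,History,Depth,Reliability,Durability
R2-30,odometry,1,500,keep_last,0,reliable,volatile
Subscribe,,,,,,,
Node,Subscription,History,Depth,Reliability,Durability,,
R1,odometry,keep_last,0,reliable,volatile,,
R1,image,keep_last,0,reliable,volatile,,
"""


def expand_nodes(spec):
    """Expand 'R2-30' into ['R2', ..., 'R30']; return ['R1'] for plain 'R1'."""
    if "-" not in spec:
        return [spec]
    first, last = spec.split("-", 1)
    prefix = first.rstrip("0123456789")
    start = int(first[len(prefix):])
    return [f"{prefix}{n}" for n in range(start, int(last) + 1)]


def parse_scenario(text):
    """Return (publishers, subscribers) as lists of dicts keyed by column header."""
    sections = {"Publish": [], "Subscribe": []}
    current, header = None, None
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0]:
            continue  # skip blank rows
        if row[0] in sections:
            current, header = sections[row[0]], None  # a new section starts
        elif current is not None and header is None:
            header = [name for name in row if name]  # drop trailing padding cells
        elif current is not None:
            for node in expand_nodes(row[0]):
                entry = dict(zip(header, row))
                entry["Node"] = node
                current.append(entry)
    return sections["Publish"], sections["Subscribe"]


if __name__ == "__main__":
    publishers, subscribers = parse_scenario(SCENARIO)
    print(len(publishers), "publisher entries,", len(subscribers), "subscriber entries")
```

Values are kept as strings; a real parser would also convert Frequency, Size and Depth to numbers and validate the QoS keywords.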

Start the Robots

sudo /bin/bash
cd ~/gits/ns3_gazebo/ns3_testbed_simtime/cpp_testbed_runner/build/cpp_testbed_runner
./testbed_runner -p -n -c 10

Results

Upper Limit

6 robots (5 publishers and 1 subscriber) transmitting 500 bytes and 2500 bytes at 10 Hz.
10 robots (9 publishers and 1 subscriber) transmitting 500 bytes at 1 Hz.

Latency

Here is latency for ten robots, specifically, 9 publishers and 1 subscriber, 500 bytes odometry at 1Hz, using simtime timestamps:

ten_timing.pdf

Here is latency using walltime instead of simtime timestamps:

ten_timing_walltime.pdf

Notes

Our simulation time implementation may not be necessary. We can watch the latency in the GUI. Shortly after all robots have registered, latency will either continue to increase or will slowly decrease to a fraction of a second.
There is significantly more latency jitter when running ns-3 and robots on simtime rather than running using walltime. Which is correct?
All messages from all robots are sent at once. It would be more realistic to stagger transmission times between robots. All at once is worst-case.
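
One way to stagger transmissions is to give each publisher a fixed offset within the transmit period. The sketch below shows the arithmetic only; the function names and the even spacing are assumptions for illustration, not the scheme used in the testbed.

```python
def staggered_offsets_ns(publisher_count, frequency_hz):
    """Spread publisher start offsets evenly across one transmit period (ns)."""
    period_ns = round(1_000_000_000 / frequency_hz)
    return [i * period_ns // publisher_count for i in range(publisher_count)]


def next_transmit_ns(now_ns, offset_ns, period_ns):
    """First scheduled transmit time at or after now_ns for a given offset."""
    if now_ns <= offset_ns:
        return offset_ns
    elapsed_periods = -(-(now_ns - offset_ns) // period_ns)  # ceiling division
    return offset_ns + elapsed_periods * period_ns


if __name__ == "__main__":
    # Four publishers at 1 Hz start a quarter of a second apart.
    for robot, offset in enumerate(staggered_offsets_ns(4, 1), start=2):
        print(f"R{robot} starts at {offset} ns")
```

Because the times are plain nanosecond integers, the same arithmetic works whether now_ns comes from wall time or from simulation time.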

Demo 5: Latency

Latency in Demo 4 seems high. This section examines latency with five and then with two nodes, with the distance between robots ranging from 0 to 2 meters.

Rather than testing with 1 transmitter and 4 receivers, I used 4 transmitters and 1 receiver to fit the data capture code. The first two results are for 5 robots; the third result is for 2 robots.

5 Robots

4 transmit 1 receive

All Transmit at the Same Time.

4 transmit 1 receive, transmitting at the same time, averages 1.5 ms or more of latency:

ten_5_4_1.pdf

Staggered Transmission

4 transmit 1 receive, staggered starting times averages about 1.5ms latency:

ten_5_4_1_staggered.pdf

2 Robots

1 transmit 1 receive

1 transmit 1 receive exhibited intermittent results. Sometimes delay was 0.7 ms, sometimes 1.0 ms:

ten_2_1_1.pdf

Conclusions

The latency we measured may be appropriate.
The inconsistency in delay in the 2 Robots scenario may be troubling.
We can try more experiments: take larger samples and test with all swarm sizes from 2 to 10.
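
For larger samples, a small summary script helps compare runs. The sketch below computes basic statistics and a coarse histogram over latency samples in microseconds. The function names are hypothetical, and the sample values are illustrative placeholders chosen to mimic the two delay levels mentioned above, not measurements.

```python
import statistics
from collections import Counter


def summarize(latencies_us):
    """Basic descriptive statistics for latency samples in microseconds."""
    return {
        "count": len(latencies_us),
        "mean": statistics.mean(latencies_us),
        "median": statistics.median(latencies_us),
        "stdev": statistics.stdev(latencies_us) if len(latencies_us) > 1 else 0.0,
        "min": min(latencies_us),
        "max": max(latencies_us),
    }


def histogram(latencies_us, bin_width_us=100):
    """Count samples per bin; each key is the bin's lower edge in microseconds."""
    return Counter((x // bin_width_us) * bin_width_us for x in latencies_us)


if __name__ == "__main__":
    # Placeholder values only, not measured data.
    samples = [700, 710, 1000, 990, 705]
    print(summarize(samples))
    for lower_edge, count in sorted(histogram(samples).items()):
        print(f"{lower_edge}-{lower_edge + 99} us: {count}")
```

Integer microseconds are used for binning so that bin edges are exact; two well-separated bins with high counts would confirm the kind of bimodal delay seen in the 2 Robots scenario.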

Extended run

Here is latency over 15 minutes for R1-R2 communication. Note popular latency regions and periodicity:

large_dataset.csv.pdf

Here is latency using the original setup:

z_10hz.csv.pdf