
SeQaM Demo

The platform allows you to experiment with how different conditions in the edge environment affect the response time (end-to-end latency) of a user application.

1. An example App

To understand how environment conditions affect an application's latency, we need to instrument the app using OpenTelemetry.

As an example, let's instrument a real-time video processing app. This app implements one of the autonomous driving tasks: recognizing traffic signs.

The picture below illustrates this application running.

2. Instrument the App

The app consists of a client part and a server part.
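
Both parts are assumed to obtain their tracer from an OpenTelemetry SDK configured roughly as follows. This is a minimal sketch: the OTLP exporter endpoint is an assumption based on the SigNoz deployment described later, and the service name matches the app name that appears in the GET /apps output further below.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans to the SigNoz OTLP collector; the endpoint is an assumption
# matching the Demo-Client machine used later in this walkthrough.
provider = TracerProvider(
    resource=Resource.create({'service.name': 'demo-dnn-partitioning'})
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint='http://demo-client:4317', insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)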

The code snippet below shows the span hierarchy for the client part. The 'e2e-latency' span comprises four sub-spans: '1-client-preprocessing', '2-client-send-data', '3-client-receive-data', and '4-client-render-result':

def run_client():
    # 'tracer' is the tracer obtained in the OpenTelemetry setup above; the
    # camera, preprocessing, network, and rendering helpers are application-specific.
    while True:
        with tracer.start_as_current_span('e2e-latency'):
            tracer_context = get_current_tracer_context()
            with tracer.start_as_current_span('1-client-preprocessing'):
                raw_frame = get_frame_from_camera()
                preprocessed_frame = preprocess_frame(raw_frame)
            with tracer.start_as_current_span('2-client-send-data'):
                response = send_frame_to_server(preprocessed_frame, tracer_context)
            with tracer.start_as_current_span('3-client-receive-data'):
                detected_traffic_signs = get_detected_traffic_signs(response)
            with tracer.start_as_current_span('4-client-render-result'):
                render(raw_frame, detected_traffic_signs)

The 'e2e-latency' span covers an entire iteration: the time to get a frame from the camera and prepare it for sending over the network ('1-client-preprocessing'); the time to deliver the frame together with the tracer context to the server part and to detect the traffic signs there ('2-client-send-data', with additional spans produced by the server within the transferred tracer context); the time to deserialize the detected signs on the client side ('3-client-receive-data'); and, finally, the time to render the frame with the detected traffic signs on the user display ('4-client-render-result').
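
How the tracer context travels to the server is application-specific. One common approach is to inject it into the HTTP request headers; the sketch below is a hypothetical implementation of the send_frame_to_server() helper under the assumption that the standard W3C trace-context propagator is used (the demo app may instead pass the context in the request body), and the server URL is made up for illustration.

import requests
from opentelemetry.propagate import inject

def send_frame_to_server(preprocessed_frame, tracer_context,
                         url='http://demo-server:8080/detect-traffic-signs'):
    # The explicit tracer_context argument is kept only for signature
    # compatibility with the snippet above; inject() reads the active context
    # itself and adds a 'traceparent' header so the server can continue the trace.
    headers = {}
    inject(headers)
    return requests.post(url, data=preprocessed_frame, headers=headers)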

The server part is implemented as a stateless application that processes incoming HTTP POST requests from clients. It is instrumented as follows:

@app.post('/detect-traffic-signs')
def detect_traffic_signs(request: Request) -> DetectedTrafficSigns:
    frame: Frame = request.frame
    tracer_context: Context = request.context
    with tracer.start_as_current_span("server-main", context=tracer_context):
        with tracer.start_as_current_span('server-read-data'):
            processed_frame = process_frame(frame)
        with tracer.start_as_current_span('3-server-processing'):
            return infer_traffic_signs(processed_frame)

The client part runs until the user decides to stop it, continuously asking the server part to process frames from the camera. It therefore produces a large number of the above-named spans with different durations.

3. Scenario Config

After installing the platform with the api/bin/install.sh script, you can find a ScenarioConfig.json file in the configuration dot-folder .seqam_fh_dortmund_project_emulate/ under your user home directory. The ScenarioConfig.json file describes the device types and names as well as the types and ports of the components they shall run. For example, the following file

{
  "distributed": {
    "ue": [
      {
        "name": "client",
        "component_type": "distributed_event_manager",
        "port": 9011
      },
      {
        "name": "load-client",
        "component_type": "network_event_manager",
        "port": 9012
      }
    ],
    "server": [
      {
        "name": "server1",
        "component_type": "distributed_event_manager",
        "port": 9001
      },
      {
        "name": "server2",
        "component_type": "distributed_event_manager",
        "port": 9002
      },
      {
        "name": "load-server",
        "component_type": "network_event_manager",
        "port": 9003
      }
    ],
    "router": [
      {
        "name": "router1",
        "description": "A Cisco router",
        "host": "172.22.228.111",
        "vendor": "cisco",
        "sampling_interval": 1000,
        "emulate": true
      },
      {
        "name": "demo",
        "description": "Another vendor router",
        "host": "10.0.0.1",
        "vendor": "mikrotik",
        "emulate": true
      }
    ]
  }
}

states that two user-equipment (ue) devices named "client" and "load-client" shall run the "distributed_event_manager" and "network_event_manager" components, respectively; two server devices named "server1" and "server2" shall run the "distributed_event_manager" component, while one server device, "load-server", shall run another "network_event_manager" instance; and, finally, there are two router devices, "router1" and "demo", that will be observed using the SNMP protocol. The "sampling_interval" parameter specifies the router sampling rate in milliseconds. If no "sampling_interval" is provided, the default of 5000 milliseconds is applied.

The IP addresses of all the devices, except for the routers, will be resolved automatically while the components are starting up.
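
For reference, the file can also be inspected programmatically. A small sketch (the path is the dot-folder named above) that prints each declared device with its component type and port:

import json
from pathlib import Path

# Walk ScenarioConfig.json and print every declared device; routers carry no
# component_type/port, so placeholders are printed for them.
config_path = Path.home() / '.seqam_fh_dortmund_project_emulate' / 'ScenarioConfig.json'
config = json.loads(config_path.read_text())

for device_type, devices in config['distributed'].items():
    for device in devices:
        print(device_type,
              device['name'],
              device.get('component_type', 'snmp-observed router'),
              device.get('port', '-'))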

The platform allows us to control all the declared devices and to observe their status using so-called commands.

4. Starting the platform and deploying the components

Let's open the env file in the configuration dot-folder ~/.seqam_fh_dortmund_project_emulate, update the SEQAM_CENTRAL_HOST variable with the domain name of the Demo-Client machine, where we are going to deploy the central component, and update the MECS_WARE_5G_URL, MECS_WARE_5G_LOGIN, and MECS_WARE_5G_PASSWORD variables with the 5G controlling software credentials.

Then we build the platform components by executing the following command in the platform source-code folder:

./bare-composes/generate-docker-composes.sh y

where the y parameter means that the script builds tarballs ready to be copied to the remote devices, so there is no need to rebuild the components on every device.

After execution, the script creates the following subfolders under the bare-composes/ folder:

seqam-central
seqam-distributed-event-manager
seqam-network-event-manager
seqam-net-spy
seqam-docker-compose-tree

The seqam-central folder contains the central platform component tarball, docker-compose files, and a copy of the configuration dot-folder ~/.seqam_fh_dortmund_project_emulate; the seqam-distributed-event-manager, seqam-network-event-manager, and seqam-net-spy folders contain the component tarballs. Additionally, all these folders contain a small load-image.sh script that facilitates the deployment of the tarballs on docker.

The seqam-docker-compose-tree folder is special, because it reflects the structure of the ScenarioConfig.json file and contains the docker compose files for all the declared components:

├── seqam-docker-compose-tree
│   ├── server
│   │   ├── load-server
│   │   │   └── docker-compose.yaml
│   │   ├── server1
│   │   │   └── docker-compose.yaml
│   │   └── server2
│   │       └── docker-compose.yaml
│   └── ue
│       ├── client
│       │   └── docker-compose.yaml
│       └── load-client
│           └── docker-compose.yaml

We ensure that a recent SigNoz version, with apply-me-on-new-signoz.diff applied, is running on the machine where we are going to deploy the central component:

ssh Demo-Client

docker ps --format {{.Names}}
signoz-otel-collector
signoz
signoz-clickhouse
signoz-zookeeper-1

We copy the content of the seqam-central folder to that machine:

scp -r seqam-central/ Demo-Client:

We ssh to the machine, deploy the central component tarball, and start it:

cd seqam-central/

./load-image.sh seqam.central.tar.gz

docker compose up -d

Now we can point our web browser to http://demo-client:8000/ and see a so-called platform chat window. We can type "hello" in the prompt and check the response:

SEQAM Chat v0.27.1-20250415 ;)
API docs


4/17/2025, 2:19:55 PM <<< hello
4/17/2025, 2:19:55 PM http://seqam-command-translator:8001/translate/ >>> <Response [200]>
4/17/2025, 2:19:55 PM EventOrchestratorModuleREST >>> Guten Tag!
4/17/2025, 2:19:55 PM http://seqam-event-orchestrator:8002/event/ >>> <Response [200]>



You can click on the API docs link in the upper left corner, under the "SEQAM Chat v0.27.1-20250415 ;)" caption. There you can try to execute the GET /config/ScenarioConfig.json API call. It should output something like the following:

{
  "distributed": {
    "ue": [
      {
        "name": "client",
        "component_type": "distributed_event_manager",
        "port": 9011
      },
      {
        "name": "load-client",
        "component_type": "network_event_manager",
        "port": 9012
      }
    ],
    "server": [
      {
        "name": "server1",
        "component_type": "distributed_event_manager",
        "port": 9001
      },
      {
        "name": "server2",
        "component_type": "distributed_event_manager",
        "port": 9002
      },
      {
        "name": "load-server",
        "component_type": "network_event_manager",
        "port": 9003
      }
    ],
    "router": [
      {
        "name": "router1",
        "description": "A Cisco router",
        "host": "172.22.228.111",
        "vendor": "cisco",
        "sampling_interval": 1000,
        "emulate": true,
        "interface_oids": {
          "bytes_recv": "1.3.6.1.2.1.2.2.1.10.",
          "bytes_sent": "1.3.6.1.2.1.2.2.1.16.",
          "bandwidth": "1.3.6.1.2.1.2.2.1.5.",
          "packets_recv": "1.3.6.1.2.1.2.2.1.11.",
          "packets_sent": "1.3.6.1.2.1.2.2.1.17."
        },
        "extends": "common",
        "cpus_oid": "1.3.6.1.4.1.9.9.109.1.1.1.1.5",
        "memory_oids": {
          "available": "1.3.6.1.4.1.9.9.48.1.1.1.5.1",
          "used": "1.3.6.1.4.1.9.9.48.1.1.1.6.1"
        },
        "base_oids": {
          "processor_memory_pull": {
            "free": "1.3.6.1.4.1.9.9.48.1.1.1.5.1",
            "used": "1.3.6.1.4.1.9.9.48.1.1.1.6.1"
          },
          "io_memory_pull": {
            "free": "1.3.6.1.4.1.9.9.48.1.1.1.5.2",
            "used": "1.3.6.1.4.1.9.9.48.1.1.1.6.2"
          },
          "cpu": {
            "last_5_seconds": "1.3.6.1.4.1.9.2.1.56.0"
          }
        }
      },
      {
        "name": "demo",
        "description": "Another vendor router",
        "host": "10.0.0.1",
        "vendor": "mikrotik",
        "emulate": true,
        "interface_oids": {
          "bytes_recv": "1.3.6.1.2.1.2.2.1.10.",
          "bytes_sent": "1.3.6.1.2.1.2.2.1.16.",
          "bandwidth": "1.3.6.1.2.1.2.2.1.5."
        },
        "extends": "common",
        "cpus_oid": "1.3.6.1.2.1.25.3.3.1.2",
        "memory_oids": {
          "used": "1.3.6.1.2.1.25.2.3.1.6.65536",
          "total": "1.3.6.1.2.1.25.2.3.1.5.65536"
        },
        "base_oids": {
          "cpu": {
            "frequency": "1.3.6.1.4.1.14988.1.1.3.14.0"
          }
        }
      }
    ]
  }
}

You can see that, compared with the initial ScenarioConfig.json file, the settings under the "router" section are extended with the OIDs for the respective router vendors.

Let's deploy "server1" and "load-client" on the same machine coppying the content of the seqam-docker-compose-tree/server/server1, seqam-docker-compose-tree/ue/load-client, seqam-distributed-event-manager and seqam-network-event-manager there:

scp -r seqam-docker-compose-tree/server/server1 seqam-docker-compose-tree/ue/load-client seqam-distributed-event-manager seqam-network-event-manager Demo-Client:

We ssh to the machine, deploy the tarballs, and run them:

cd seqam-distributed-event-manager/
./load-image.sh seqam-distributed-event-manager.tar.gz 

cd ../seqam-network-event-manager/
./load-image.sh seqam-network-event-manager.tar.gz 

cd ../server1/
docker compose up -d

cd ../load-client/
docker compose up -d

Now, let's hit the GET /config/ScenarioConfig.json API again. We see that server1 under the server section and load-client under the ue section are automatically extended with the resolved machine IP address and the component API routes:

{
  "distributed": {
    "ue": [
      {
        "name": "client",
        "component_type": "distributed_event_manager",
        "port": 9011
      },
      {
        "name": "load-client",
        "component_type": "network_event_manager",
        "port": 9012,
        "description": "load-client",
        "host": "172.30.0.1",
        "paths": [
          {
            "network_load": {
              "endpoint": "/event/network/load"
            }
          }
        ]
      }
    ],
    "server": [
      {
        "name": "server1",
        "component_type": "distributed_event_manager",
        "port": 9001,
        "description": "server1",
        "host": "172.22.174.144",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            },
            "watch_device": {
              "endpoint": "/event/watch"
            }
          }
        ]
      },
      {
        "name": "server2",
        "component_type": "distributed_event_manager",
        "port": 9002
      },
      {
        "name": "load-server",
        "component_type": "network_event_manager",
        "port": 9003
      }
    ],
    "router": [
        ...
    ]
  }
}

Using the GET /servers API call, we can see the devices that submit their metrics to the platform:

[
  "server1.server"
]

Let's check the latest metrics of the "server1" device using the GET /metrics API call, setting the host parameter to "server1.server" and the limit parameter to "1":

[
  {
    "host": "server1.server",
    "time": 1745310305.2320833,
    "cpu_state": {
      "cpu": {
        "cpu_load": [
          {
            "core": 0,
            "percentage": 1.154690865375918
          },
          {
            "core": 1,
            "percentage": 4.150003288754844
          },
          {
            "core": 2,
            "percentage": 3.151565752304053
          },
          {
            "core": 3,
            "percentage": 2.1531284018267205
          },
          {
            "core": 4,
            "percentage": 12.137503022440843
          },
          {
            "core": 5,
            "percentage": 8.143753155597844
          },
          {
            "core": 6,
            "percentage": 0.1562534219118561
          },
          {
            "core": 7,
            "percentage": 1.1546909583626475
          },
          {
            "core": 8,
            "percentage": 0.1562534219118561
          },
          {
            "core": 9,
            "percentage": 5.148440732218917
          },
          {
            "core": 10,
            "percentage": 17.12969033274789
          },
          {
            "core": 11,
            "percentage": 3.151565752304053
          },
          {
            "core": 12,
            "percentage": 3.151565752304053
          },
          {
            "core": 13,
            "percentage": 3.1515658452907824
          },
          {
            "core": 14,
            "percentage": 1.1546909583626475
          },
          {
            "core": 15,
            "percentage": 1.154690865375918
          },
          {
            "core": 16,
            "percentage": 1.154690865375918
          },
          {
            "core": 17,
            "percentage": 1.154690865375918
          },
          {
            "core": 18,
            "percentage": 5.148440732218917
          },
          {
            "core": 19,
            "percentage": 3.1515658452907824
          },
          {
            "core": 20,
            "percentage": 6.146878175682979
          },
          {
            "core": 21,
            "percentage": 0.1562534219118561
          },
          {
            "core": 22,
            "percentage": 6.146878175682979
          },
          {
            "core": 23,
            "percentage": 1.154690865375918
          }
        ],
        "t0": 1745310304.2305183,
        "t0_datetime": "2025-04-22T08:25:04.230518",
        "t1": 1745310305.2320833,
        "t1_datetime": "2025-04-22T08:25:05.232083",
        "t1_t0": 1.0015649795532227
      },
      "core_times": [
        {
          "idle": 8177791.84
        },
        {
          "idle": 8185296.13
        },
        {
          "idle": 8187008.98
        },
        {
          "idle": 8187666.6
        },
        {
          "idle": 8187574.52
        },
        {
          "idle": 8187819.54
        },
        {
          "idle": 8188892.78
        },
        {
          "idle": 8188368.06
        },
        {
          "idle": 8189801.18
        },
        {
          "idle": 8189787.98
        },
        {
          "idle": 8189749.19
        },
        {
          "idle": 8188952.53
        },
        {
          "idle": 8190525.99
        },
        {
          "idle": 8191822.37
        },
        {
          "idle": 8192427.52
        },
        {
          "idle": 8192425.49
        },
        {
          "idle": 8192903.23
        },
        {
          "idle": 8192862.74
        },
        {
          "idle": 8192958.13
        },
        {
          "idle": 8192465.5
        },
        {
          "idle": 8192170.16
        },
        {
          "idle": 8192541.44
        },
        {
          "idle": 8192655.08
        },
        {
          "idle": 8193354.75
        }
      ],
      "core_temperatures": [],
      "pressure_some": {
        "avg10": 0,
        "avg60": 0,
        "avg300": 0,
        "total": 9542365994
      },
      "pressure_full": {
        "avg10": 0,
        "avg60": 0,
        "avg300": 0,
        "total": 0
      }
    },
    "memory_state": {
      "total": 24598319104,
      "available": 20181188608,
      "pressure_some": {
        "avg10": 0,
        "avg60": 0,
        "avg300": 0,
        "total": 403704227
      },
      "pressure_full": {
        "avg10": 0,
        "avg60": 0,
        "avg300": 0,
        "total": 402950013
      }
    },
    "io_state": {
      "nic": "all",
      "bytes_sent": 1272593733759,
      "bytes_recv": 2327620710309,
      "packets_sent": 1031459376,
      "packets_recv": 1551386874,
      "pressure_some": {
        "avg10": 0.07,
        "avg60": 0.21,
        "avg300": 0.32,
        "total": 18363992158
      },
      "pressure_full": {
        "avg10": 0.07,
        "avg60": 0.21,
        "avg300": 0.32,
        "total": 18091391884
      }
    },
    "net_state": [
      {
        "nic": "lo",
        "bytes_sent": 300134629,
        "bytes_recv": 300134629,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 4095035,
        "packets_recv": 4095035
      },
      {
        "nic": "ens18",
        "bytes_sent": 38828273251,
        "bytes_recv": 1075684703243,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 329.4843637076909,
        "packets_sent": 27282053,
        "packets_recv": 126869696
      },
      {
        "nic": "docker0",
        "bytes_sent": 264553346,
        "bytes_recv": 914032,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 16539,
        "packets_recv": 10888
      },
      {
        "nic": "br-a196da213c81",
        "bytes_sent": 0,
        "bytes_recv": 0,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 0,
        "packets_recv": 0
      },
      {
        "nic": "br-6a79a1a1cef5",
        "bytes_sent": 40854481964,
        "bytes_recv": 580990046970,
        "bytes_sent_per_second": 96356.20451011552,
        "bytes_recv_per_second": 1330236.2075342524,
        "packets_sent": 228530977,
        "packets_recv": 506840024
      },
      {
        "nic": "veth2caf87e",
        "bytes_sent": 63096910,
        "bytes_recv": 122090954,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 531652,
        "packets_recv": 277840
      },
      {
        "nic": "br-e0632b092d71",
        "bytes_sent": 575350720643,
        "bytes_recv": 39442687444,
        "bytes_sent_per_second": 1318987.811044765,
        "bytes_recv_per_second": 91233.22187319382,
        "packets_sent": 270223122,
        "packets_recv": 202178095
      },
      {
        "nic": "veth0a0c455",
        "bytes_sent": 53132,
        "bytes_recv": 0,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 553,
        "packets_recv": 0
      },
      {
        "nic": "vethc744fda",
        "bytes_sent": 55962,
        "bytes_recv": 3307,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 585,
        "packets_recv": 34
      },
      {
        "nic": "veth83e9e48",
        "bytes_sent": 56744,
        "bytes_recv": 4758,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 594,
        "packets_recv": 50
      },
      {
        "nic": "veth9a0709b",
        "bytes_sent": 13064103,
        "bytes_recv": 2063614,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 11678,
        "packets_recv": 12605
      },
      {
        "nic": "vetha8989f7",
        "bytes_sent": 41246820666,
        "bytes_recv": 588427836427,
        "bytes_sent_per_second": 96356.20451011552,
        "bytes_recv_per_second": 1346716.4163444317,
        "packets_sent": 229476550,
        "packets_recv": 507931263
      },
      {
        "nic": "veth5a5c94d",
        "bytes_sent": 5102001,
        "bytes_recv": 4797202,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 80660,
        "packets_recv": 79792
      },
      {
        "nic": "veth9df885b",
        "bytes_sent": 329502690,
        "bytes_recv": 374305522,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 996297,
        "packets_recv": 926008
      },
      {
        "nic": "veth12f98e1",
        "bytes_sent": 575337732575,
        "bytes_recv": 42271120421,
        "bytes_sent_per_second": 1318987.811044765,
        "bytes_recv_per_second": 97844.87477159483,
        "packets_sent": 270212255,
        "packets_recv": 202165526
      },
      {
        "nic": "br-dcc1e4961589",
        "bytes_sent": 30551,
        "bytes_recv": 830,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 282,
        "packets_recv": 9
      },
      {
        "nic": "veth09d2a84",
        "bytes_sent": 54592,
        "bytes_recv": 956,
        "bytes_sent_per_second": 0,
        "bytes_recv_per_second": 0,
        "packets_sent": 544,
        "packets_recv": 9
      }
    ]
  }
]

From the "cpu_load" field, we can see that the device CPU is more or less idle now. You can see cpu preassure, memory and networking measures here as well.

5. Control the platform with "commands"

Now, let's return to the "Chat" window in the web browser and type the "help" command. It outputs a list of available "commands":

There are the following commands available: help, hello, ssh, start_module, exit, stop_module, cpu_load, memory_load, network_bandwidth, network_load, migrate, connect, disconnect, watch_device, nop. You can find detailed info about them typing help <command>

Let's try to apply some load to the CPU of the "server1" device. For this purpose, let's check the detailed documentation of the cpu_load command by typing help cpu_load in the "Chat" window:

cpu_load [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str time:str load:str [mode:stat|rand|inc|dec] [random_seed:int] [load_min:int] [load_max:int] [load_step:int] [time_step:int] cores:int

All parameters in square brackets are optional. Let's apply a 50% load to all CPU cores of "server1" by issuing the following command in the "Chat":

cpu_load src_device_type:server src_device_name:server1 time:1m load:50 cores:0

where time:1m means that the load will last for one minute, and cores:0 means that the load will be applied to all CPU cores.

While the command is being executed, let's check the output of the GET /metrics API call:

[
  {
    "host": "server1.server",
    "time": 1745311383.4414678,
    "cpu_state": {
      "cpu": {
        "cpu_load": [
          {
            "core": 0,
            "percentage": 53.09985399783833
          },
          {
            "core": 1,
            "percentage": 66.07223487022137
          },
          {
            "core": 2,
            "percentage": 57.09135587621341
          },
          {
            "core": 3,
            "percentage": 54.097729513899296
          },
          {
            "core": 4,
            "percentage": 60.08498214559312
          },
          {
            "core": 5,
            "percentage": 54.097729513899296
          },
          {
            "core": 6,
            "percentage": 56.09348036015245
          },
          {
            "core": 7,
            "percentage": 62.08073308478067
          },
          {
            "core": 8,
            "percentage": 57.09135587621341
          },
          {
            "core": 9,
            "percentage": 58.089231299339986
          },
          {
            "core": 10,
            "percentage": 55.095604937025875
          },
          {
            "core": 11,
            "percentage": 52.10197857471176
          },
          {
            "core": 12,
            "percentage": 52.10197857471176
          },
          {
            "core": 13,
            "percentage": 53.09985409077272
          },
          {
            "core": 14,
            "percentage": 53.09985399783833
          },
          {
            "core": 15,
            "percentage": 58.089231206405586
          },
          {
            "core": 16,
            "percentage": 49.10835230533203
          },
          {
            "core": 17,
            "percentage": 32.14446964750832
          },
          {
            "core": 18,
            "percentage": 54.097729513899296
          },
          {
            "core": 19,
            "percentage": 53.09985409077272
          },
          {
            "core": 20,
            "percentage": 53.09985399783833
          },
          {
            "core": 21,
            "percentage": 55.09560502996027
          },
          {
            "core": 22,
            "percentage": 56.09348036015245
          },
          {
            "core": 23,
            "percentage": 60.08498214559312
          }
        ],
        "t0": 1745311382.4393387,
        "t0_datetime": "2025-04-22T08:43:02.439339",
        "t1": 1745311383.4414678,
        "t1_datetime": "2025-04-22T08:43:03.441468",
        "t1_t0": 1.002129077911377
      },
      ...
]

You can see that the CPU load on all cores has increased to an average of about 50%, compared with the "idle" state that we observed before.

Now let's deploy the "server2", "load-server" and "net-spy" components on the Demo-Server machine:

cd bare-composes/

scp -r seqam-distributed-event-manager/ seqam-network-event-manager/ seqam-docker-compose-tree/server/server2/ seqam-docker-compose-tree/server/load-server/ seqam-net-spy/ Demo-Server:

ssh Demo-Server

cd seqam-distributed-event-manager/
./load-image.sh seqam-distributed-event-manager.tar.gz

cd ../seqam-network-event-manager/
./load-image.sh seqam-network-event-manager.tar.gz

cd ../seqam-net-spy/
./load-image.sh seqam-net-spy.tar.gz

cd ../server2/
docker compose up -d

cd ../load-server/
docker compose up -d

cd ../seqam-net-spy/
docker compose up -d

Please note that the server part of the example application is also already deployed on the same Demo-Server machine.

Let's check the output of the GET /servers API:

[
  "server1.server",
  "server2.server"
]

We see that it now lists two devices.

The change is also visible in the GET /config/ScenarioConfig.json API:

{
  "distributed": {
    ...
    "server": [
      {
        "name": "server1",
        "component_type": "distributed_event_manager",
        "port": 9001,
        "description": "server1",
        "host": "172.22.174.144",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            },
            "watch_device": {
              "endpoint": "/event/watch"
            }
          }
        ]
      },
      {
        "name": "server2",
        "component_type": "distributed_event_manager",
        "port": 9002,
        "description": "server2",
        "host": "172.22.174.142",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            },
            "watch_device": {
              "endpoint": "/event/watch"
            }
          }
        ]
      },
      {
        "name": "load-server",
        "component_type": "network_event_manager",
        "port": 9003,
        "description": "load-server",
        "host": "172.22.174.142",
        "paths": [
          {
            "network_load": {
              "endpoint": "/event/network/load"
            }
          }
        ]
      }
    ],
    ...
  }
}

where the "server2" and "load-server" are now enriched with their appropriate IP addresses and URI paths.

Now, let's put a network load between the load-client and load-server devices. For this purpose, we ask for help in the "Chat" window:

help network_load

that outputs

network_load [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str time:str load:str [mode:stat|rand|inc|dec] [random_seed:int] [load_min:int] [load_max:int] [load_step:int] [time_step:int] [interface:str] [dst_device_type:ue|server|router|5g_ue] [dst_device_name:str]

The load value is in megabits per second, and we have to specify the source and destination for the network traffic stress:

network_load src_device_type:ue src_device_name:load-client time:1m load:500 dst_device_type:server dst_device_name:load-server

While the above command is being executed, let's get the "server1" metrics using the GET /metrics API and compare the "bytes_sent_per_second" network readings for the ens18 interface in the "idle" and "stressed" states:

IDLE:
{
  "nic": "ens18",
  "bytes_sent": 38828273251,
  "bytes_recv": 1075684703243,
  "bytes_sent_per_second": 0,
  "bytes_recv_per_second": 329.4843637076909,
  "packets_sent": 27282053,
  "packets_recv": 126869696
}

UNDER STRESS:
{
  "nic": "ens18",
  "bytes_sent": 39083440208,
  "bytes_recv": 1075702505853,
  "bytes_sent_per_second": 62583735.639007784,
  "bytes_recv_per_second": 95971.2765071737,
  "packets_sent": 27297522,
  "packets_recv": 126909867
}

This means that slightly more than 500 Mbit of data is being sent every second from the Demo-Client machine, where load-client and server1 are deployed.
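
A quick sanity check of that reading, converting the per-second byte rate from the sample above into megabits per second:

# Convert the stressed bytes_sent_per_second value on ens18 into Mbit/s.
bytes_sent_per_second = 62_583_735.639007784
print(bytes_sent_per_second * 8 / 1_000_000)  # ≈ 500.7 Mbit/s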

Let's check the router metrics with the GET /routers/{host}/metrics API call, setting the host parameter to "router1" and the from_cache parameter to true:

{
  "time": 1745316706.7120934,
  "interfaces": [
    {
      "nic": "eth0",
      "bytes_sent": 42353596,
      "bytes_recv": 4232390,
      "bytes_sent_per_second": 99.78308149883595,
      "bytes_recv_per_second": 9.978308149883595,
      "packets_sent": 2867,
      "packets_recv": 6752,
      "bandwidth": 1000000000,
      "up_link_utilization": 0.00007982646519906876,
      "dn_link_utilization": 0.000007982646519906876
    },
    {
      "nic": "eth1",
      "bytes_sent": 42353593,
      "bytes_recv": 4232406,
      "bytes_sent_per_second": 99.78308149883595,
      "bytes_recv_per_second": 9.978308149883595,
      "packets_sent": 7245,
      "packets_recv": 590,
      "bandwidth": 1000000000,
      "up_link_utilization": 0.00007982646519906876,
      "dn_link_utilization": 0.000007982646519906876
    },
    {
      "nic": "eth2",
      "bytes_sent": 42353580,
      "bytes_recv": 4232392,
      "bytes_sent_per_second": 99.78308149883595,
      "bytes_recv_per_second": 9.978308149883595,
      "packets_sent": 2767,
      "packets_recv": 4290,
      "bandwidth": 1000000000,
      "up_link_utilization": 0.00007982646519906876,
      "dn_link_utilization": 0.000007982646519906876
    }
  ],
  "cpu": [
    {
      "core": 1,
      "percentage": 29
    },
    {
      "core": 2,
      "percentage": 15
    },
    {
      "core": 3,
      "percentage": 56
    },
    {
      "core": 4,
      "percentage": 85
    }
  ],
  "memory": {
    "total": 8415,
    "available": 3965
  },
  "base_metrics": {
    "processor_memory_pull": {
      "free": "3847",
      "used": "5859"
    },
    "io_memory_pull": {
      "free": "4884",
      "used": "1937"
    },
    "cpu": {
      "last_5_seconds": "399"
    }
  }
}
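
As a side note, the up_link_utilization and dn_link_utilization values are consistent with percentages derived from the per-second byte rates and the interface bandwidth. A quick check (this is an interpretation of the sample above, not documented behaviour):

# Reproduce up_link_utilization for eth0 from the router metrics above.
bytes_sent_per_second = 99.78308149883595
bandwidth_bps = 1_000_000_000
print(bytes_sent_per_second * 8 / bandwidth_bps * 100)  # ≈ 7.98e-05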

We can get the last several readings for a particular router interface using the GET /routers/{router_name}/load API call. Setting router_name to router1, nic to eth1, and limit to 2, we receive:

[
  {
    "host": "router/router1",
    "time": "2025-04-22T10:24:21",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 42428913,
    "cum_bytes_recv": 4239932,
    "time_diff": 1.0014314651489258
  },
  {
    "host": "router/router1",
    "time": "2025-04-22T10:24:20",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 42428813,
    "cum_bytes_recv": 4239922,
    "time_diff": 1.0012950897216797
  }
]

You can see that time_diff is approximately one second, as configured with "sampling_interval": 1000 in the ScenarioConfig.json for the router router1.

The same call for the router demo outputs:

[
  {
    "host": "router/demo",
    "time": "2025-04-22T10:28:41",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 42505907,
    "cum_bytes_recv": 4244662,
    "time_diff": 5.006640911102295
  },
  {
    "host": "router/demo",
    "time": "2025-04-22T10:28:36",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 42505407,
    "cum_bytes_recv": 4244612,
    "time_diff": 5.000458717346191
  }
]

where time_diff is approximately five seconds, which is the default value for the sampling_interval.

We can change the sampling interval with the watch_device command. The help watch_device outputs the following manual:

watch_device [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str interval:int [metrics:str]

So we change the sampling interval for router demo to, say, three seconds:

watch_device src_device_type:router src_device_name:demo interval:3000

After the command has executed, we can get the collected router load samples with the GET /routers/{router_name}/load call again and verify that the sampling interval has really changed:

[
  {
    "host": "router/demo",
    "time": "2025-04-22T11:56:49",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 43034328,
    "cum_bytes_recv": 4297412,
    "time_diff": 3.0013105869293213
  },
  {
    "host": "router/demo",
    "time": "2025-04-22T11:56:46",
    "nic": "eth1",
    "bytes_sent": 99,
    "bytes_recv": 9,
    "cum_bytes_sent": 43034028,
    "cum_bytes_recv": 4297382,
    "time_diff": 3.0004513263702393
  }
]
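
The same check can be scripted. A minimal sketch using the GET /routers/{router_name}/load call, with the central API base URL assumed as earlier:

import requests

# Fetch the two most recent load samples for the demo router's eth1 interface
# and print the effective sampling interval of each sample.
response = requests.get('http://demo-client:8000/routers/demo/load',
                        params={'nic': 'eth1', 'limit': 2})
for sample in response.json():
    print(sample['time'], sample['time_diff'])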

We can use the same watch_device command to update the sampling interval for other devices, such as servers (src_device_type:server), user equipment (src_device_type:ue), or 5G user equipment devices (src_device_type:5g_ue).

6. Use an ExperimentConfig.json

You can automate the triggering of all these commands using the ExperimentConfig.json file. Example content of the file is as follows:

{
  "experiment_name": "Short Integration Test",
  "eventList": [
    {
      "command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:3000 metrics:status",
      "executionTime": 0
    },
    {
      "command": "watch_device src_device_type:server src_device_name:server2 interval:100",
      "executionTime": 0
    },
    {
      "command": "watch_device src_device_type:ue src_device_name:client interval:3000",
      "executionTime": 0
    },
    {
      "command": "nop time:10s comment:'Collect 5G UE status every 3 seconds and server status every 1/10 second'",
      "executionTime": 0
    },
    {
      "command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:8000",
      "executionTime": 10000
    },
    {
      "command": "watch_device src_device_type:server src_device_name:server2 interval:500",
      "executionTime": 10000
    },
    {
      "command": "cpu_load src_device_type:server src_device_name:server2 cores:0 load:50 time:20s comment:'Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second'",
      "executionTime": 10000
    },
    {
      "command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:10000",
      "executionTime": 30000
    },
    {
      "command": "watch_device src_device_type:server src_device_name:server2 interval:1000",
      "executionTime": 30000
    },
    {
      "command": "memory_load src_device_type:server src_device_name:server2 workers:5 load:20m time:10s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
      "executionTime": 30000
    },
    {
      "command": "network_load src_device_type:ue src_device_name:load-client dst_device_type:server dst_device_name:load-server load:500 time:20s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
      "executionTime": 30000
    },
    {
      "command": "nop time:10s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
      "executionTime": 50000
    },
    {
      "command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:100000",
      "executionTime": 60000
    },
    {
      "command": "watch_device src_device_type:server src_device_name:server2 interval:100000",
      "executionTime": 60000
    },
    {
      "command": "watch_device src_device_type:ue src_device_name:client interval:100000",
      "executionTime": 60000
    },
    {
      "command": "exit",
      "executionTime": 60001
    }
  ]
}

where experiment_name is the name of the experiment that will appear in the list of conducted experiments, and eventList is the sequence of commands that will be applied during the experiment, each at time executionTime (in milliseconds) after the start of the experiment.
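
To get a quick overview of such a file, it can be loaded and printed in execution order. A small sketch; the file location is an assumption, so adjust the path to wherever your ExperimentConfig.json lives:

import json
from pathlib import Path

# Print the scheduled commands of an experiment ordered by their offset
# (executionTime is in milliseconds since the start of the experiment).
config_path = Path.home() / '.seqam_fh_dortmund_project_emulate' / 'ExperimentConfig.json'
experiment = json.loads(config_path.read_text())

print(experiment['experiment_name'])
for event in sorted(experiment['eventList'], key=lambda e: e['executionTime']):
    print(f"{event['executionTime']:>6} ms  {event['command']}")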

The first command

{
  "command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:3000 metrics:status",
  "executionTime": 0
}

means that we watch the status (metrics:status) of a 5G user equipment device named MECS-IPC every three seconds (interval:3000) from the very beginning of the experiment ("executionTime": 0).

The second command

{
  "command": "watch_device src_device_type:server src_device_name:server2 interval:100",
  "executionTime": 0
}

means that, simultaneously with the first watch ("executionTime": 0), we watch the server2 device every 1/10 of a second (interval:100).

The nop command is used to label a span of time with a human-readable comment:

{
  "command": "nop time:10s comment:'Collect 5G UE status every 3 seconds and server status every 1/10 second'",
  "executionTime": 0
}

Actually, any command, including the stress commands, can be enriched with a comment, for example:

{
  "command": "cpu_load src_device_type:server src_device_name:server2 cores:0 load:50 time:20s comment:'Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second'",
  "executionTime": 10000
}

Let's start the client part of our example app together with the client ue component on a laptop. The client ue component registers itself with the platform API and is enriched with its IP address and URI paths in the output of GET /config/ScenarioConfig.json:

{
  "distributed": {
    "ue": [
      {
        "name": "client",
        "component_type": "distributed_event_manager",
        "port": 9011,
        "description": "client",
        "host": "172.22.229.149",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            },
            "watch_device": {
              "endpoint": "/event/watch"
            }
          }
        ]
      },
      ...

Now we begin the experiment by issuing the following command in the "Chat" window:

start_module module:experiment_dispatcher

This command produces a lot of output in the "Chat" window. Wait about a minute until you find a message

4/22/2025, 3:02:02 PM ExperimentDispatcher >>> Experiment Short Integration Test-22.04.2025T13:00:59 is finished

indicating that the experiment is finished.

7. Investigating the experiment results

Now let's open the API page again and execute the GET /experiments API call. It outputs the list of conducted experiments, which consists of only one experiment so far:

[
  "Short Integration Test-22.04.2025T13:00:59"
]

Also, we need to check the list of running instrumented apps using the GET /apps API call. It outputs the following:

[
  "demo-dnn-partitioning",
  "experiment_dispatcher_fh_dortmund_project_emulate"
]

where demo-dnn-partitioning is our instrumented example app, and experiment_dispatcher_fh_dortmund_project_emulate is the platform component that conducts the experiments themselves.

Let's execute another API call, GET /experiments/{exp_name}/apps/{app_name}, substituting exp_name with "Short Integration Test-22.04.2025T13:00:59" and app_name with demo-dnn-partitioning. It outputs experiment statistics grouped by so-called experiment segments.

An experiment segment is a portion of time during which one or more stress or nop commands run in parallel.

For example, the first segment consists of only one command and looks like this:

{
  "commands": [
      "nop comment:Collect 5G UE status every 3 seconds and server status every 1/10 second time:10s"
  ],
  "indexes": [
      0
  ],
  "start_time": 1745407481748,
  "end_time": 1745407491748,
  "span_statistics": [
      {
          "name": "e2e-latency",
          "min_duration": 97.0,
          "avg_duration": 196.0,
          "max_duration": 261.0,
          "count": 51
      },
      {
          "name": "4-client-render-result",
          "min_duration": 3.0,
          "avg_duration": 5.0,
          "max_duration": 12.0,
          "count": 51
      },
      {
          "name": "1-client-preprocessing",
          "min_duration": 45.0,
          "avg_duration": 140.0,
          "max_duration": 207.0,
          "count": 51
      },
      {
          "name": "3-client-receive-data",
          "min_duration": 0.0,
          "avg_duration": 0.0,
          "max_duration": 0.0,
          "count": 51
      },
      {
          "name": "2-client-send-data",
          "min_duration": 45.0,
          "avg_duration": 50.0,
          "max_duration": 56.0,
          "count": 50
      }
  ],
  "metrics": [...],
  "router_metrics": [...],
  "net_spy_metrics": [...],
  "five_g_ue_status": [...],
  "five_g_ue_rtt": [...],
  "start_time_human_readable": "2025-04-23T11:24:41.748000Z",
  "end_time_human_readable": "2025-04-23T11:24:51.748000Z",
  "duration": 10000
}

The segment lasts for 10000 milliseconds (see the "duration": 10000 field), or 10 seconds, as expected. During this segment, the platform aggregated the span statistics in the "span_statistics" field; collected CPU, memory, and network metrics in the "metrics" field; router metrics in the "router_metrics" field; link utilization metrics in the "net_spy_metrics" field; and 5G user equipment status and round-trip time in the "five_g_ue_status" and "five_g_ue_rtt" fields.
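
To compare segments programmatically, the same response can be processed with a few lines of code. A sketch, assuming the call returns the list of segment objects shown here and that the central API base URL is the one used throughout this walkthrough:

import requests

# Print the commands active in each experiment segment together with the
# average end-to-end latency observed during that segment.
exp_name = 'Short Integration Test-22.04.2025T13:00:59'
app_name = 'demo-dnn-partitioning'
url = f'http://demo-client:8000/experiments/{exp_name}/apps/{app_name}'
for segment in requests.get(url).json():
    e2e = next(s for s in segment['span_statistics'] if s['name'] == 'e2e-latency')
    print(segment['commands'], e2e['avg_duration'], 'ms')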

We can verify that during this segment the CPU, memory, and network metrics were really collected every 1/10 of a second by expanding the "metrics" field and examining the "t1_t0" value:

{
  "commands": [
      "nop comment:Collect 5G UE status every 3 seconds and server status every 1/10 second time:10s"
  ],
  "indexes": [...],
  "start_time": 1745407481748,
  "end_time": 1745407491748,
  "span_statistics": [...],
  "metrics": [
      {
          "host": "server2.server",
          "time": 1745407482.2817087,
          "cpu_state": {
              "cpu": {
                  "cpu_load": [...],
                  "t0": 1745407482.18046,
                  "t0_datetime": "2025-04-23T11:24:42.180460",
                  "t1": 1745407482.2817087,
                  "t1_datetime": "2025-04-23T11:24:42.281709",
                  "t1_t0": 0.10124874114990234

The second, twenty-second-long segment is as follows:

{
    "commands": [
        "cpu_load comment:Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second src_device_type:server src_device_name:server2 time:20s load:50 cores:0"
    ],
    "start_time": 1745407491875,
    "end_time": 1745407511743,
    "span_statistics": [
        {
            "name": "e2e-latency",
            "min_duration": 102.0,
            "avg_duration": 207.0,
            "max_duration": 282.0,
            "count": 95
        },
        {
            "name": "4-client-render-result",
            "min_duration": 3.0,
            "avg_duration": 5.0,
            "max_duration": 8.0,
            "count": 95
        },
        {
            "name": "1-client-preprocessing",
            "min_duration": 38.0,
            "avg_duration": 135.0,
            "max_duration": 206.0,
            "count": 95
        },
        {
            "name": "3-client-receive-data",
            "min_duration": 0.0,
            "avg_duration": 0.0,
            "max_duration": 0.0,
            "count": 95
        },
        {
            "name": "2-client-send-data",
            "min_duration": 52.0,
            "avg_duration": 65.0,
            "max_duration": 80.0,
            "count": 96
        }
    ],
    "metrics": [...],
    "router_metrics": [...],
    "net_spy_metrics": [...],
    "five_g_ue_status": [...],
    "five_g_ue_rtt": [...],
    "start_time_human_readable": "2025-04-23T11:24:51.875000Z",
    "end_time_human_readable": "2025-04-23T11:25:11.743000Z",
    "duration": 19868
}

We can see that the CPU load increased the latency slightly: the average e2e-latency grew from 196 ms to 207 ms.

The third, ten-second segment comprises both a network and a memory load applied simultaneously:

{
    "commands": [
        "network_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:ue src_device_name:load-client time:20s load:500 dst_device_type:server dst_device_name:load-server",
        "memory_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:server src_device_name:server2 time:10s load:20m workers:5"
    ],
    "start_time": 1745407511938,
    "end_time": 1745407521786,
    "span_statistics": [
        {
            "name": "e2e-latency",
            "min_duration": 173.0,
            "avg_duration": 199.0,
            "max_duration": 252.0,
            "count": 49
        },
        {
            "name": "4-client-render-result",
            "min_duration": 4.0,
            "avg_duration": 6.0,
            "max_duration": 10.0,
            "count": 49
        },
        {
            "name": "1-client-preprocessing",
            "min_duration": 107.0,
            "avg_duration": 138.0,
            "max_duration": 195.0,
            "count": 49
        },
        {
            "name": "3-client-receive-data",
            "min_duration": 0.0,
            "avg_duration": 0.0,
            "max_duration": 0.0,
            "count": 49
        },
        {
            "name": "2-client-send-data",
            "min_duration": 47.0,
            "avg_duration": 53.0,
            "max_duration": 66.0,
            "count": 50
        }
    ],
    "metrics": [...],
    "router_metrics": [...],
    "net_spy_metrics": [...],
    "five_g_ue_status": [...],
    "five_g_ue_rtt": [...],
    "start_time_human_readable": "2025-04-23T11:25:11.938000Z",
    "end_time_human_readable": "2025-04-23T11:25:21.786000Z",
    "duration": 9848
}

After that segment, the ten-second memory load (see memory_load ... time:10s ...) is completed, but the twenty-second network load (see network_load ... time:20s ...) continues for another ten seconds:

{
    "commands": [
        "network_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:ue src_device_name:load-client time:20s load:500 dst_device_type:server dst_device_name:load-server"
    ],
    "indexes": [
        2
    ],
    "start_time": 1745407521786,
    "end_time": 1745407531710,
    "span_statistics": [...],
    "metrics": [...],
    "router_metrics": [...],
    "net_spy_metrics": [...],
    "five_g_ue_status": [...],
    "five_g_ue_rtt": [...],
    "start_time_human_readable": "2025-04-23T11:25:21.786000Z",
    "end_time_human_readable": "2025-04-23T11:25:31.710000Z",
    "duration": 9924
}

Conclusion

We demonstrated how the platform can be used to investigate the effect of different environment conditions on the performance of an example client-server app.

We did this using the so-called "Chat" window and the platform API.

This allows us to develop and test algorithms that control the quality of service of the app.
