The platform allows you to experiment with how different conditions in the edge environment affect the response time (end-to-end latency) of a user application.
To understand how environment conditions affect application latency, we need to instrument the app using OpenTelemetry.
As an example, let's instrument a real-time video processing app. This app implements one of the autonomous-driving tasks: recognizing traffic signs.
The picture below illustrates this application running.
The app consists of a client part and a server part.
The code snippet below shows the span hierarchy for the client part. The 'e2e-latency' span comprises four sub-spans: '1-client-preprocessing', '2-client-send-data', '3-client-receive-data', and '4-client-render-result':
def run_client():
    while True:
        with tracer.start_as_current_span('e2e-latency'):
            tracer_context = get_current_tracer_context()
            with tracer.start_as_current_span('1-client-preprocessing'):
                raw_frame = get_frame_from_camera()
                preprocessed_frame = preprocess_frame(raw_frame)
            with tracer.start_as_current_span('2-client-send-data'):
                response = send_frame_to_server(preprocessed_frame, tracer_context)
            with tracer.start_as_current_span('3-client-receive-data'):
                detected_traffic_signs = get_detected_traffic_signs(response)
            with tracer.start_as_current_span('4-client-render-result'):
                render(raw_frame, detected_traffic_signs)
The 'e2e-latency' span covers the whole time required to get a frame from the camera and prepare it for sending over the network ('1-client-preprocessing'), the time required to deliver the frame with the tracer context to the server part and for the server part to detect the traffic signs ('2-client-send-data', with additional spans produced by the server within the transferred tracer context), the time to deserialize the detected signs on the client part ('3-client-receive-data'), and, finally, the time to render the frame with the detected traffic signs on the user display ('4-client-render-result').
The server part is implemented as a stateless application that processes incoming HTTP POST requests from clients and is instrumented as follows:
@app.post('/detect-traffic-signs')
def detect_traffic_signs(request: Request) -> DetectedTrafficSigns:
    frame: Frame = request.frame
    tracer_context: Context = request.context
    with tracer.start_as_current_span("server-main", context=tracer_context):
        with tracer.start_as_current_span('server-read-data'):
            processed_frame = process_frame(frame)
        with tracer.start_as_current_span('3-server-processing'):
            return infer_traffic_signs(processed_frame)
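The snippets above do not show how the tracer context travels from the client to the server. Below is a minimal sketch of one way this could be done with the standard OpenTelemetry propagation API; the helper names (get_current_tracer_context, server_side) and the idea of carrying the context inside the request are illustrative assumptions, not the demo app's actual code:
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer('demo-dnn-partitioning')

def get_current_tracer_context() -> dict:
    # Serialize the current span context into a plain dict, e.g. to ship it
    # to the server together with the preprocessed frame.
    carrier: dict = {}
    inject(carrier)
    return carrier

def server_side(frame, carrier: dict):
    # Restore the client's context so the spans created here become part of
    # the client's 'e2e-latency' trace.
    tracer_context = extract(carrier)
    with tracer.start_as_current_span('server-main', context=tracer_context):
        ...  # process the frame as in the server snippet above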
The client part runs until the user decides to stop it and continuously asks the server part to process frames from the camera. It produces a large number of the above-named spans with varying durations.
After you have installed our platform with the api/bin/install.sh script, you can find a ScenarioConfig.json file in the configuration dot-folder .seqam_fh_dortmund_project_emulate/ under your user home directory. The ScenarioConfig.json file describes the device types and names, as well as the types and ports of the components that they shall run. For example, the following file
{
"distributed": {
"ue": [
{
"name": "client",
"component_type": "distributed_event_manager",
"port": 9011
},
{
"name": "load-client",
"component_type": "network_event_manager",
"port": 9012
}
],
"server": [
{
"name": "server1",
"component_type": "distributed_event_manager",
"port": 9001
},
{
"name": "server2",
"component_type": "distributed_event_manager",
"port": 9002
},
{
"name": "load-server",
"component_type": "network_event_manager",
"port": 9003
}
],
"router": [
{
"name": "router1",
"description": "A Cisco router",
"host": "172.22.228.111",
"vendor": "cisco",
"sampling_interval": 1000,
"emulate": true
},
{
"name": "demo",
"description": "Another vendor router",
"host": "10.0.0.1",
"vendor": "mikrotik",
"emulate": true
}
]
}
}
states that two user-equipment (ue) devices named "client" and "load-client" shall run the "distributed_event_manager" and "network_event_manager" components, respectively; two server devices named "server1" and "server2" shall run the "distributed_event_manager" component, and one server device "load-server" shall run another "network_event_manager" instance; and, finally, there are two router devices, "router1" and "demo", that will be observed using the SNMP protocol. The "sampling_interval" parameter specifies the router sampling rate in milliseconds. If no "sampling_interval" is provided, the default of 5000 milliseconds is applied.
The IP addresses of all the devices, except for routers, are resolved automatically while the components start up.
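As a hedged illustration of these rules, the short sketch below (not part of the platform) reads ScenarioConfig.json from the configuration dot-folder and prints the effective sampling interval per router, falling back to the 5000 ms default described above:
import json
from pathlib import Path

# Read the scenario description from the configuration dot-folder.
config_path = Path.home() / '.seqam_fh_dortmund_project_emulate' / 'ScenarioConfig.json'
scenario = json.loads(config_path.read_text())

for router in scenario['distributed'].get('router', []):
    # Routers without an explicit "sampling_interval" fall back to 5000 ms.
    interval = router.get('sampling_interval', 5000)
    print(f"{router['name']} ({router['vendor']}): sampled every {interval} ms via SNMP")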
The platform allows us to control all the declared devices and to observe their status using so-called commands.
Let's open the env file in the configuration dot-folder ~/.seqam_fh_dortmund_project_emulate and update the SEQAM_CENTRAL_HOST variable with the domain name of the machine Demo-Client, where we are going to deploy the central component, and update the MECS_WARE_5G_URL, MECS_WARE_5G_LOGIN, and MECS_WARE_5G_PASSWORD variables with the 5G controlling software credentials.
Then we build the platform components by executing the following command in the platform source-code folder:
./bare-composes/generate-docker-composes.sh y
where the y parameter means that we want to build tarballs that are ready to be copied to the remote devices, without the need to rebuild them again and again on every device.
After execution, the script creates the following subfolders under the bare-composes/ folder:
seqam-central
seqam-distributed-event-manager
seqam-network-event-manager
seqam-net-spy
seqam-docker-compose-tree
The seqam-central folder contains the central platform component tarball, docker-compose files, and a copy of the configuration dot-folder ~/.seqam_fh_dortmund_project_emulate; the seqam-distributed-event-manager, seqam-network-event-manager, and seqam-net-spy folders contain the component tarballs. Additionally, all these folders contain a small load-image.sh script that facilitates the deployment of the tarballs on docker.
The seqam-docker-compose-tree folder is special, because it reflects the structure of the ScenarioConfig.json file and contains the docker compose files for all the declared components:
├── seqam-docker-compose-tree
│ ├── server
│ │ ├── load-server
│ │ │ └── docker-compose.yaml
│ │ ├── server1
│ │ │ └── docker-compose.yaml
│ │ └── server2
│ │ └── docker-compose.yaml
│ └── ue
│ ├── client
│ │ └── docker-compose.yaml
│ └── load-client
│ └── docker-compose.yaml
We ensure that a recent SigNoz version, with the apply-me-on-new-signoz.diff patch applied, is running on the machine where we are going to deploy the central component:
ssh Demo-Client
docker ps --format {{.Names}}
signoz-otel-collector
signoz
signoz-clickhouse
signoz-zookeeper-1
We copy the content of the seqam-central folder to that machine:
scp -r seqam-central/ Demo-Client:
We ssh to the machine, deploy the central component tarball, and start it:
cd seqam-central/
./load-image.sh seqam.central.tar.gz
docker compose up -d
Now we can point our web browser to http://demo-client:8000/ and see a so-called platform chat window. We can type "hello" in the prompt and check the response:
SEQAM Chat v0.27.1-20250415 ;)
API docs
4/17/2025, 2:19:55 PM <<< hello
4/17/2025, 2:19:55 PM http://seqam-command-translator:8001/translate/ >>> <Response [200]>
4/17/2025, 2:19:55 PM EventOrchestratorModuleREST >>> Guten Tag!
4/17/2025, 2:19:55 PM http://seqam-event-orchestrator:8002/event/ >>> <Response [200]>
You can click on the API docs link in the upper left corner under the "SEQAM Chat v0.27.1-20250415 ;)" caption. Here you can try to execute the GET /config/ScenarioConfig.json API call. It should output something like the following:
{
"distributed": {
"ue": [
{
"name": "client",
"component_type": "distributed_event_manager",
"port": 9011
},
{
"name": "load-client",
"component_type": "network_event_manager",
"port": 9012
}
],
"server": [
{
"name": "server1",
"component_type": "distributed_event_manager",
"port": 9001
},
{
"name": "server2",
"component_type": "distributed_event_manager",
"port": 9002
},
{
"name": "load-server",
"component_type": "network_event_manager",
"port": 9003
}
],
"router": [
{
"name": "router1",
"description": "A Cisco router",
"host": "172.22.228.111",
"vendor": "cisco",
"sampling_interval": 1000,
"emulate": true,
"interface_oids": {
"bytes_recv": "1.3.6.1.2.1.2.2.1.10.",
"bytes_sent": "1.3.6.1.2.1.2.2.1.16.",
"bandwidth": "1.3.6.1.2.1.2.2.1.5.",
"packets_recv": "1.3.6.1.2.1.2.2.1.11.",
"packets_sent": "1.3.6.1.2.1.2.2.1.17."
},
"extends": "common",
"cpus_oid": "1.3.6.1.4.1.9.9.109.1.1.1.1.5",
"memory_oids": {
"available": "1.3.6.1.4.1.9.9.48.1.1.1.5.1",
"used": "1.3.6.1.4.1.9.9.48.1.1.1.6.1"
},
"base_oids": {
"processor_memory_pull": {
"free": "1.3.6.1.4.1.9.9.48.1.1.1.5.1",
"used": "1.3.6.1.4.1.9.9.48.1.1.1.6.1"
},
"io_memory_pull": {
"free": "1.3.6.1.4.1.9.9.48.1.1.1.5.2",
"used": "1.3.6.1.4.1.9.9.48.1.1.1.6.2"
},
"cpu": {
"last_5_seconds": "1.3.6.1.4.1.9.2.1.56.0"
}
}
},
{
"name": "demo",
"description": "Another vendor router",
"host": "10.0.0.1",
"vendor": "mikrotik",
"emulate": true,
"interface_oids": {
"bytes_recv": "1.3.6.1.2.1.2.2.1.10.",
"bytes_sent": "1.3.6.1.2.1.2.2.1.16.",
"bandwidth": "1.3.6.1.2.1.2.2.1.5."
},
"extends": "common",
"cpus_oid": "1.3.6.1.2.1.25.3.3.1.2",
"memory_oids": {
"used": "1.3.6.1.2.1.25.2.3.1.6.65536",
"total": "1.3.6.1.2.1.25.2.3.1.5.65536"
},
"base_oids": {
"cpu": {
"frequency": "1.3.6.1.4.1.14988.1.1.3.14.0"
}
}
}
]
}
}
You see that, in comparison with the initial ScenarioConfig.json file, the settings under the "router" section are extended with the OIDs for the respective router vendors.
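The same call can also be scripted; a minimal sketch, assuming the central component is reachable at http://demo-client:8000 as in this demo and that the requests library is available:
import requests

response = requests.get('http://demo-client:8000/config/ScenarioConfig.json', timeout=10)
response.raise_for_status()
scenario = response.json()

# Print which interface OIDs were added per router by the vendor-specific extension.
for router in scenario['distributed']['router']:
    print(router['name'], router['vendor'], sorted(router.get('interface_oids', {})))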
Let's deploy "server1" and "load-client" on the same machine by copying the content of seqam-docker-compose-tree/server/server1, seqam-docker-compose-tree/ue/load-client, seqam-distributed-event-manager, and seqam-network-event-manager there:
scp -r seqam-docker-compose-tree/server/server1 seqam-docker-compose-tree/ue/load-client seqam-distributed-event-manager seqam-network-event-manager Demo-Client:
We ssh to the machine, deploy the tarballs, and run them:
cd seqam-distributed-event-manager/
./load-image.sh seqam-distributed-event-manager.tar.gz
cd ../seqam-network-event-manager/
./load-image.sh seqam-network-event-manager.tar.gz
cd ../server1/
docker compose up -d
cd ../load-client/
docker compose up -d
Now, let's hit the GET /config/ScenarioConfig.json API again. We see that server1 under the server section and load-client under the ue section are automatically extended with the resolved machine IP addresses and the component API routes:
{
"distributed": {
"ue": [
{
"name": "client",
"component_type": "distributed_event_manager",
"port": 9011
},
{
"name": "load-client",
"component_type": "network_event_manager",
"port": 9012,
"description": "load-client",
"host": "172.30.0.1",
"paths": [
{
"network_load": {
"endpoint": "/event/network/load"
}
}
]
}
],
"server": [
{
"name": "server1",
"component_type": "distributed_event_manager",
"port": 9001,
"description": "server1",
"host": "172.22.174.144",
"paths": [
{
"event": {
"endpoint": "/event/"
},
"cpu_load": {
"endpoint": "/event/stress/cpu_load"
},
"memory_load": {
"endpoint": "/event/stress/memory_load"
},
"watch_device": {
"endpoint": "/event/watch"
}
}
]
},
{
"name": "server2",
"component_type": "distributed_event_manager",
"port": 9002
},
{
"name": "load-server",
"component_type": "network_event_manager",
"port": 9003
}
],
"router": [
...
]
}
}
Using the GET /servers API call, we can see the devices that submit their metrics to the platform:
[
"server1.server"
]
Let's check the latest metrics of the "server1" device using the GET /metrics API call, setting the host parameter to "server1.server" and the limit parameter to "1":
[
{
"host": "server1.server",
"time": 1745310305.2320833,
"cpu_state": {
"cpu": {
"cpu_load": [
{
"core": 0,
"percentage": 1.154690865375918
},
{
"core": 1,
"percentage": 4.150003288754844
},
{
"core": 2,
"percentage": 3.151565752304053
},
{
"core": 3,
"percentage": 2.1531284018267205
},
{
"core": 4,
"percentage": 12.137503022440843
},
{
"core": 5,
"percentage": 8.143753155597844
},
{
"core": 6,
"percentage": 0.1562534219118561
},
{
"core": 7,
"percentage": 1.1546909583626475
},
{
"core": 8,
"percentage": 0.1562534219118561
},
{
"core": 9,
"percentage": 5.148440732218917
},
{
"core": 10,
"percentage": 17.12969033274789
},
{
"core": 11,
"percentage": 3.151565752304053
},
{
"core": 12,
"percentage": 3.151565752304053
},
{
"core": 13,
"percentage": 3.1515658452907824
},
{
"core": 14,
"percentage": 1.1546909583626475
},
{
"core": 15,
"percentage": 1.154690865375918
},
{
"core": 16,
"percentage": 1.154690865375918
},
{
"core": 17,
"percentage": 1.154690865375918
},
{
"core": 18,
"percentage": 5.148440732218917
},
{
"core": 19,
"percentage": 3.1515658452907824
},
{
"core": 20,
"percentage": 6.146878175682979
},
{
"core": 21,
"percentage": 0.1562534219118561
},
{
"core": 22,
"percentage": 6.146878175682979
},
{
"core": 23,
"percentage": 1.154690865375918
}
],
"t0": 1745310304.2305183,
"t0_datetime": "2025-04-22T08:25:04.230518",
"t1": 1745310305.2320833,
"t1_datetime": "2025-04-22T08:25:05.232083",
"t1_t0": 1.0015649795532227
},
"core_times": [
{
"idle": 8177791.84
},
{
"idle": 8185296.13
},
{
"idle": 8187008.98
},
{
"idle": 8187666.6
},
{
"idle": 8187574.52
},
{
"idle": 8187819.54
},
{
"idle": 8188892.78
},
{
"idle": 8188368.06
},
{
"idle": 8189801.18
},
{
"idle": 8189787.98
},
{
"idle": 8189749.19
},
{
"idle": 8188952.53
},
{
"idle": 8190525.99
},
{
"idle": 8191822.37
},
{
"idle": 8192427.52
},
{
"idle": 8192425.49
},
{
"idle": 8192903.23
},
{
"idle": 8192862.74
},
{
"idle": 8192958.13
},
{
"idle": 8192465.5
},
{
"idle": 8192170.16
},
{
"idle": 8192541.44
},
{
"idle": 8192655.08
},
{
"idle": 8193354.75
}
],
"core_temperatures": [],
"pressure_some": {
"avg10": 0,
"avg60": 0,
"avg300": 0,
"total": 9542365994
},
"pressure_full": {
"avg10": 0,
"avg60": 0,
"avg300": 0,
"total": 0
}
},
"memory_state": {
"total": 24598319104,
"available": 20181188608,
"pressure_some": {
"avg10": 0,
"avg60": 0,
"avg300": 0,
"total": 403704227
},
"pressure_full": {
"avg10": 0,
"avg60": 0,
"avg300": 0,
"total": 402950013
}
},
"io_state": {
"nic": "all",
"bytes_sent": 1272593733759,
"bytes_recv": 2327620710309,
"packets_sent": 1031459376,
"packets_recv": 1551386874,
"pressure_some": {
"avg10": 0.07,
"avg60": 0.21,
"avg300": 0.32,
"total": 18363992158
},
"pressure_full": {
"avg10": 0.07,
"avg60": 0.21,
"avg300": 0.32,
"total": 18091391884
}
},
"net_state": [
{
"nic": "lo",
"bytes_sent": 300134629,
"bytes_recv": 300134629,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 4095035,
"packets_recv": 4095035
},
{
"nic": "ens18",
"bytes_sent": 38828273251,
"bytes_recv": 1075684703243,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 329.4843637076909,
"packets_sent": 27282053,
"packets_recv": 126869696
},
{
"nic": "docker0",
"bytes_sent": 264553346,
"bytes_recv": 914032,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 16539,
"packets_recv": 10888
},
{
"nic": "br-a196da213c81",
"bytes_sent": 0,
"bytes_recv": 0,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 0,
"packets_recv": 0
},
{
"nic": "br-6a79a1a1cef5",
"bytes_sent": 40854481964,
"bytes_recv": 580990046970,
"bytes_sent_per_second": 96356.20451011552,
"bytes_recv_per_second": 1330236.2075342524,
"packets_sent": 228530977,
"packets_recv": 506840024
},
{
"nic": "veth2caf87e",
"bytes_sent": 63096910,
"bytes_recv": 122090954,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 531652,
"packets_recv": 277840
},
{
"nic": "br-e0632b092d71",
"bytes_sent": 575350720643,
"bytes_recv": 39442687444,
"bytes_sent_per_second": 1318987.811044765,
"bytes_recv_per_second": 91233.22187319382,
"packets_sent": 270223122,
"packets_recv": 202178095
},
{
"nic": "veth0a0c455",
"bytes_sent": 53132,
"bytes_recv": 0,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 553,
"packets_recv": 0
},
{
"nic": "vethc744fda",
"bytes_sent": 55962,
"bytes_recv": 3307,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 585,
"packets_recv": 34
},
{
"nic": "veth83e9e48",
"bytes_sent": 56744,
"bytes_recv": 4758,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 594,
"packets_recv": 50
},
{
"nic": "veth9a0709b",
"bytes_sent": 13064103,
"bytes_recv": 2063614,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 11678,
"packets_recv": 12605
},
{
"nic": "vetha8989f7",
"bytes_sent": 41246820666,
"bytes_recv": 588427836427,
"bytes_sent_per_second": 96356.20451011552,
"bytes_recv_per_second": 1346716.4163444317,
"packets_sent": 229476550,
"packets_recv": 507931263
},
{
"nic": "veth5a5c94d",
"bytes_sent": 5102001,
"bytes_recv": 4797202,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 80660,
"packets_recv": 79792
},
{
"nic": "veth9df885b",
"bytes_sent": 329502690,
"bytes_recv": 374305522,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 996297,
"packets_recv": 926008
},
{
"nic": "veth12f98e1",
"bytes_sent": 575337732575,
"bytes_recv": 42271120421,
"bytes_sent_per_second": 1318987.811044765,
"bytes_recv_per_second": 97844.87477159483,
"packets_sent": 270212255,
"packets_recv": 202165526
},
{
"nic": "br-dcc1e4961589",
"bytes_sent": 30551,
"bytes_recv": 830,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 282,
"packets_recv": 9
},
{
"nic": "veth09d2a84",
"bytes_sent": 54592,
"bytes_recv": 956,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 0,
"packets_sent": 544,
"packets_recv": 9
}
]
}
]
From the "cpu_load" field, we can see that the device CPU is more or less idle now. You can also see CPU pressure, memory, and networking measurements here.
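The same check can be done programmatically; a minimal sketch, assuming the central API at http://demo-client:8000 serves GET /metrics with the host and limit query parameters used above and returns the JSON structure shown:
import requests

response = requests.get(
    'http://demo-client:8000/metrics',
    params={'host': 'server1.server', 'limit': 1},
    timeout=10,
)
sample = response.json()[0]

# Average the per-core CPU load of the latest sample.
loads = [core['percentage'] for core in sample['cpu_state']['cpu']['cpu_load']]
print(f"average CPU load over {len(loads)} cores: {sum(loads) / len(loads):.1f}%")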
Now, let's return to the "Chat" window in the web browser and type the "help" command. It outputs a list of the available commands:
There are the following commands available: help, hello, ssh, start_module, exit, stop_module, cpu_load, memory_load, network_bandwidth, network_load, migrate, connect, disconnect, watch_device, nop. You can find detailed info about them typing help <command>
Let's try to apply some load on the CPU of the "server1" device. For this purpose, let's check the detailed documentation on the cpu_load command by typing help cpu_load in the "Chat" window:
cpu_load [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str time:str load:str [mode:stat|rand|inc|dec] [random_seed:int] [load_min:int] [load_max:int] [load_step:int] [time_step:int] cores:int
All parameters in square brackets are optional. Let's apply a 50% load on all the CPU cores of "server1" by entering the following command in the "Chat":
cpu_load src_device_type:server src_device_name:server1 time:1m load:50 cores:0
where time:1m means that the load will last for one minute, and cores:0 means that the load will be applied to all the CPU cores.
While the command is being executed, let's check the output of the GET /metrics API call:
[
{
"host": "server1.server",
"time": 1745311383.4414678,
"cpu_state": {
"cpu": {
"cpu_load": [
{
"core": 0,
"percentage": 53.09985399783833
},
{
"core": 1,
"percentage": 66.07223487022137
},
{
"core": 2,
"percentage": 57.09135587621341
},
{
"core": 3,
"percentage": 54.097729513899296
},
{
"core": 4,
"percentage": 60.08498214559312
},
{
"core": 5,
"percentage": 54.097729513899296
},
{
"core": 6,
"percentage": 56.09348036015245
},
{
"core": 7,
"percentage": 62.08073308478067
},
{
"core": 8,
"percentage": 57.09135587621341
},
{
"core": 9,
"percentage": 58.089231299339986
},
{
"core": 10,
"percentage": 55.095604937025875
},
{
"core": 11,
"percentage": 52.10197857471176
},
{
"core": 12,
"percentage": 52.10197857471176
},
{
"core": 13,
"percentage": 53.09985409077272
},
{
"core": 14,
"percentage": 53.09985399783833
},
{
"core": 15,
"percentage": 58.089231206405586
},
{
"core": 16,
"percentage": 49.10835230533203
},
{
"core": 17,
"percentage": 32.14446964750832
},
{
"core": 18,
"percentage": 54.097729513899296
},
{
"core": 19,
"percentage": 53.09985409077272
},
{
"core": 20,
"percentage": 53.09985399783833
},
{
"core": 21,
"percentage": 55.09560502996027
},
{
"core": 22,
"percentage": 56.09348036015245
},
{
"core": 23,
"percentage": 60.08498214559312
}
],
"t0": 1745311382.4393387,
"t0_datetime": "2025-04-22T08:43:02.439339",
"t1": 1745311383.4414678,
"t1_datetime": "2025-04-22T08:43:03.441468",
"t1_t0": 1.002129077911377
},
...
]
You can see that the CPU load on all the cores has increased to an average of around 50%, in comparison with the "idle" state that we observed before.
Now let's deploy the "server2", "load-server", and "net-spy" components on the Demo-Server machine:
cd bare-composes/
scp -r seqam-distributed-event-manager/ seqam-network-event-manager/ seqam-docker-compose-tree/server/server2/ seqam-docker-compose-tree/server/load-server/ seqam-net-spy/ Demo-Server:
ssh Demo-Server
cd seqam-distributed-event-manager/
./load-image.sh seqam-distributed-event-manager.tar.gz
cd ../seqam-network-event-manager/
./load-image.sh seqam-network-event-manager.tar.gz
cd ../seqam-net-spy/
./load-image.sh seqam-net-spy.tar.gz
cd ../server2/
docker compose up -d
cd ../load-server/
docker compose up -d
cd ../seqam-net-spy/
docker compose up -d
Please note that the server part of the example application is also already deployed on the same Demo-Server machine.
Let's check the output of the GET /servers API:
[
"server1.server",
"server2.server"
]
We see that it now lists the two devices.
The change is also visible in the GET /config/ScenarioConfig.json API:
{
"distributed": {
...
"server": [
{
"name": "server1",
"component_type": "distributed_event_manager",
"port": 9001,
"description": "server1",
"host": "172.22.174.144",
"paths": [
{
"event": {
"endpoint": "/event/"
},
"cpu_load": {
"endpoint": "/event/stress/cpu_load"
},
"memory_load": {
"endpoint": "/event/stress/memory_load"
},
"watch_device": {
"endpoint": "/event/watch"
}
}
]
},
{
"name": "server2",
"component_type": "distributed_event_manager",
"port": 9002,
"description": "server2",
"host": "172.22.174.142",
"paths": [
{
"event": {
"endpoint": "/event/"
},
"cpu_load": {
"endpoint": "/event/stress/cpu_load"
},
"memory_load": {
"endpoint": "/event/stress/memory_load"
},
"watch_device": {
"endpoint": "/event/watch"
}
}
]
},
{
"name": "load-server",
"component_type": "network_event_manager",
"port": 9003,
"description": "load-server",
"host": "172.22.174.142",
"paths": [
{
"network_load": {
"endpoint": "/event/network/load"
}
}
]
}
],
...
}
}
where "server2" and "load-server" are now enriched with their respective IP addresses and URI paths.
Now, let's put a network load between the load-client and load-server devices. For this purpose, we ask for help in the "Chat" window:
help network_load
which outputs
network_load [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str time:str load:str [mode:stat|rand|inc|dec] [random_seed:int] [load_min:int] [load_max:int] [load_step:int] [time_step:int] [interface:str] [dst_device_type:ue|server|router|5g_ue] [dst_device_name:str]
The load value is in Megabits per second, and we should specify the source and destination for the network traffic stress to be applied:
network_load src_device_type:ue src_device_name:load-client time:1m load:500 dst_device_type:server dst_device_name:load-server
While the above command is being executed, let's get the "server1" metrics using the GET /metrics API and compare the "bytes_sent_per_second" network readings for the ens18 interface in the "idle" and "stressed" states:
IDLE:
{
"nic": "ens18",
"bytes_sent": 38828273251,
"bytes_recv": 1075684703243,
"bytes_sent_per_second": 0,
"bytes_recv_per_second": 329.4843637076909,
"packets_sent": 27282053,
"packets_recv": 126869696
}
UNDER STRESS:
{
"nic": "ens18",
"bytes_sent": 39083440208,
"bytes_recv": 1075702505853,
"bytes_sent_per_second": 62583735.639007784,
"bytes_recv_per_second": 95971.2765071737,
"packets_sent": 27297522,
"packets_recv": 126909867
}
This means that a lot of data (a bit more than 500 Mbit) is being sent every second from the Demo-Client machine, where the load-client and server1 are deployed.
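The conversion from the reported bytes_sent_per_second to Megabits per second is a simple multiplication; a quick check with the value observed above:
# Convert the "under stress" reading of bytes_sent_per_second into Mbit/s
# to verify that it roughly matches the requested load of 500.
bytes_sent_per_second = 62583735.639007784
mbits_per_second = bytes_sent_per_second * 8 / 1_000_000
print(f'{mbits_per_second:.1f} Mbit/s')  # ~500.7 Mbit/s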
Let's check the router metrics with the GET /routers/{host}/metrics API call, setting the host parameter to "router1" and the from_cache parameter to true:
{
"time": 1745316706.7120934,
"interfaces": [
{
"nic": "eth0",
"bytes_sent": 42353596,
"bytes_recv": 4232390,
"bytes_sent_per_second": 99.78308149883595,
"bytes_recv_per_second": 9.978308149883595,
"packets_sent": 2867,
"packets_recv": 6752,
"bandwidth": 1000000000,
"up_link_utilization": 0.00007982646519906876,
"dn_link_utilization": 0.000007982646519906876
},
{
"nic": "eth1",
"bytes_sent": 42353593,
"bytes_recv": 4232406,
"bytes_sent_per_second": 99.78308149883595,
"bytes_recv_per_second": 9.978308149883595,
"packets_sent": 7245,
"packets_recv": 590,
"bandwidth": 1000000000,
"up_link_utilization": 0.00007982646519906876,
"dn_link_utilization": 0.000007982646519906876
},
{
"nic": "eth2",
"bytes_sent": 42353580,
"bytes_recv": 4232392,
"bytes_sent_per_second": 99.78308149883595,
"bytes_recv_per_second": 9.978308149883595,
"packets_sent": 2767,
"packets_recv": 4290,
"bandwidth": 1000000000,
"up_link_utilization": 0.00007982646519906876,
"dn_link_utilization": 0.000007982646519906876
}
],
"cpu": [
{
"core": 1,
"percentage": 29
},
{
"core": 2,
"percentage": 15
},
{
"core": 3,
"percentage": 56
},
{
"core": 4,
"percentage": 85
}
],
"memory": {
"total": 8415,
"available": 3965
},
"base_metrics": {
"processor_memory_pull": {
"free": "3847",
"used": "5859"
},
"io_memory_pull": {
"free": "4884",
"used": "1937"
},
"cpu": {
"last_5_seconds": "399"
}
}
}
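The reported link utilization looks consistent with the bytes per second multiplied by eight, divided by the interface bandwidth, and expressed as a percentage; a small sketch using the eth0 values above (the formula is an assumption inferred from the numbers, not taken from the platform code):
# Recompute up_link_utilization for eth0 from the reading above, assuming
# utilization = bytes_sent_per_second * 8 / bandwidth, expressed in percent.
bytes_sent_per_second = 99.78308149883595
bandwidth = 1_000_000_000  # bits per second
up_link_utilization = bytes_sent_per_second * 8 / bandwidth * 100
print(up_link_utilization)  # ~0.0000798, matching the reported value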
We can get the several most recent readings for a particular router interface using the GET /routers/{router_name}/load API call. So we set router_name to router1, nic to eth1, and limit to 2, and receive:
[
{
"host": "router/router1",
"time": "2025-04-22T10:24:21",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 42428913,
"cum_bytes_recv": 4239932,
"time_diff": 1.0014314651489258
},
{
"host": "router/router1",
"time": "2025-04-22T10:24:20",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 42428813,
"cum_bytes_recv": 4239922,
"time_diff": 1.0012950897216797
}
]
You see that time_diff is approximately one second, as configured with "sampling_interval": 1000 in the ScenarioConfig.json for the router router1.
The same call for the router demo outputs:
[
{
"host": "router/demo",
"time": "2025-04-22T10:28:41",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 42505907,
"cum_bytes_recv": 4244662,
"time_diff": 5.006640911102295
},
{
"host": "router/demo",
"time": "2025-04-22T10:28:36",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 42505407,
"cum_bytes_recv": 4244612,
"time_diff": 5.000458717346191
}
]
where time_diff is approximately five seconds, which is the default value for the sampling_interval.
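The comparison of the two routers can also be scripted; a minimal sketch, assuming the central API at http://demo-client:8000 serves GET /routers/{router_name}/load with the nic and limit query parameters used above:
import requests

for router_name in ('router1', 'demo'):
    samples = requests.get(
        f'http://demo-client:8000/routers/{router_name}/load',
        params={'nic': 'eth1', 'limit': 2},
        timeout=10,
    ).json()
    # time_diff should follow the configured (or default) sampling_interval.
    print(router_name, [round(sample['time_diff'], 2) for sample in samples])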
We can change the sampling interval with the watch_device command. Typing help watch_device outputs the following manual:
watch_device [comment:str] src_device_type:ue|server|router|5g_ue src_device_name:str interval:int [metrics:str]
So we change the sampling interval for the router demo to, say, three seconds:
watch_device src_device_type:router src_device_name:demo interval:3000
After the command has executed, we can get the collected router load samples with the GET /routers/{router_name}/load call again and verify that the sampling interval has really changed:
[
{
"host": "router/demo",
"time": "2025-04-22T11:56:49",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 43034328,
"cum_bytes_recv": 4297412,
"time_diff": 3.0013105869293213
},
{
"host": "router/demo",
"time": "2025-04-22T11:56:46",
"nic": "eth1",
"bytes_sent": 99,
"bytes_recv": 9,
"cum_bytes_sent": 43034028,
"cum_bytes_recv": 4297382,
"time_diff": 3.0004513263702393
}
]
We can use the same watch_device command to update the sampling interval for other devices, such as servers (src_device_type:server), user equipment (src_device_type:ue), or 5G user equipment devices (src_device_type:5g_ue).
The user can automate the triggering of all these commands using the ExperimentConfig.json file. An example of the file content is as follows:
{
"experiment_name": "Short Integration Test",
"eventList": [
{
"command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:3000 metrics:status",
"executionTime": 0
},
{
"command": "watch_device src_device_type:server src_device_name:server2 interval:100",
"executionTime": 0
},
{
"command": "watch_device src_device_type:ue src_device_name:client interval:3000",
"executionTime": 0
},
{
"command": "nop time:10s comment:'Collect 5G UE status every 3 seconds and server status every 1/10 second'",
"executionTime": 0
},
{
"command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:8000",
"executionTime": 10000
},
{
"command": "watch_device src_device_type:server src_device_name:server2 interval:500",
"executionTime": 10000
},
{
"command": "cpu_load src_device_type:server src_device_name:server2 cores:0 load:50 time:20s comment:'Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second'",
"executionTime": 10000
},
{
"command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:10000",
"executionTime": 30000
},
{
"command": "watch_device src_device_type:server src_device_name:server2 interval:1000",
"executionTime": 30000
},
{
"command": "memory_load src_device_type:server src_device_name:server2 workers:5 load:20m time:10s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
"executionTime": 30000
},
{
"command": "network_load src_device_type:ue src_device_name:load-client dst_device_type:server dst_device_name:load-server load:500 time:20s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
"executionTime": 30000
},
{
"command": "nop time:10s comment:'Collect 5G UE status and RTT every 10 seconds, and server status every second'",
"executionTime": 50000
},
{
"command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:100000",
"executionTime": 60000
},
{
"command": "watch_device src_device_type:server src_device_name:server2 interval:100000",
"executionTime": 60000
},
{
"command": "watch_device src_device_type:ue src_device_name:client interval:100000",
"executionTime": 60000
},
{
"command": "exit",
"executionTime": 60001
}
]
}
where experiment_name is the name of the experiment that will appear in the list of conducted experiments, and eventList is the sequence of commands that will be applied during the experiment, each at the time executionTime (in milliseconds) after the start of the experiment.
The first command
{
"command": "watch_device src_device_type:5g_ue src_device_name:MECS-IPC interval:3000 metrics:status",
"executionTime": 0
}
means that we watch the status (metrics:status) of the 5G user equipment device named MECS-IPC every three seconds (interval:3000) from the very beginning of the experiment ("executionTime": 0).
The second command
{
"command": "watch_device src_device_type:server src_device_name:server2 interval:100",
"executionTime": 0
}
means that, simultaneously with the first watch ("executionTime": 0), we watch the server2 device every 1/10 second (interval:100).
The nop command is used to label a span of time with a human-readable comment:
{
"command": "nop time:10s comment:'Collect 5G UE status every 3 seconds and server status every 1/10 second'",
"executionTime": 0
}
Actually, any command, including the stress ones, can be enriched with a comment, for example:
{
"command": "cpu_load src_device_type:server src_device_name:server2 cores:0 load:50 time:20s comment:'Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second'",
"executionTime": 10000
}
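For illustration only, the sketch below shows how such an event list could be replayed: sleep until each executionTime offset and then hand the command over for execution. It is not the platform's experiment dispatcher; the location of ExperimentConfig.json next to ScenarioConfig.json and the submit_command helper are assumptions:
import json
import time
from pathlib import Path

def submit_command(command: str) -> None:
    # Hypothetical stand-in for however the platform actually executes a command.
    print(f'executing: {command}')

experiment = json.loads(
    (Path.home() / '.seqam_fh_dortmund_project_emulate' / 'ExperimentConfig.json').read_text()
)

start = time.monotonic()
for event in sorted(experiment['eventList'], key=lambda e: e['executionTime']):
    # executionTime is the offset in milliseconds from the experiment start.
    delay = event['executionTime'] / 1000 - (time.monotonic() - start)
    if delay > 0:
        time.sleep(delay)
    submit_command(event['command'])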
Let's start the client part of our example app together with the client ue component on a laptop.
The client ue component gets registered in the platform API and is enriched with its IP address and URI paths in the output of GET /config/ScenarioConfig.json:
{
"distributed": {
"ue": [
{
"name": "client",
"component_type": "distributed_event_manager",
"port": 9011,
"description": "client",
"host": "172.22.229.149",
"paths": [
{
"event": {
"endpoint": "/event/"
},
"cpu_load": {
"endpoint": "/event/stress/cpu_load"
},
"memory_load": {
"endpoint": "/event/stress/memory_load"
},
"watch_device": {
"endpoint": "/event/watch"
}
}
]
},
...
Now we begin the experiment by entering the following command in the "Chat" window:
start_module module:experiment_dispatcher
This command produces a lot of output in the "Chat" window. Wait a minute until you find the message
4/22/2025, 3:02:02 PM ExperimentDispatcher >>> Experiment Short Integration Test-22.04.2025T13:00:59 is finished
indicating that the experiment is finished.
Now let's open the API page again and execute the GET /experiments API call. This call outputs the list of conducted experiments, which so far consists of only one experiment:
[
"Short Integration Test-22.04.2025T13:00:59"
]
Let's also check the list of running instrumented apps using the GET /apps API call. It outputs the following:
[
"demo-dnn-partitioning",
"experiment_dispatcher_fh_dortmund_project_emulate"
]
where demo-dnn-partitioning is our instrumented example app, and experiment_dispatcher_fh_dortmund_project_emulate is the platform component that conducts the experiments.
Let's execute another API call, GET /experiments/{exp_name}/apps/{app_name}, substituting exp_name with "Short Integration Test-22.04.2025T13:00:59" and app_name with demo-dnn-partitioning. It outputs experiment statistics grouped by so-called experiment segments.
An experiment segment is a portion of time during which one or more stress or nop commands run in parallel.
For example, the first segment consists of only one command and looks like this:
{
"commands": [
"nop comment:Collect 5G UE status every 3 seconds and server status every 1/10 second time:10s"
],
"indexes": [
0
],
"start_time": 1745407481748,
"end_time": 1745407491748,
"span_statistics": [
{
"name": "e2e-latency",
"min_duration": 97.0,
"avg_duration": 196.0,
"max_duration": 261.0,
"count": 51
},
{
"name": "4-client-render-result",
"min_duration": 3.0,
"avg_duration": 5.0,
"max_duration": 12.0,
"count": 51
},
{
"name": "1-client-preprocessing",
"min_duration": 45.0,
"avg_duration": 140.0,
"max_duration": 207.0,
"count": 51
},
{
"name": "3-client-receive-data",
"min_duration": 0.0,
"avg_duration": 0.0,
"max_duration": 0.0,
"count": 51
},
{
"name": "2-client-send-data",
"min_duration": 45.0,
"avg_duration": 50.0,
"max_duration": 56.0,
"count": 50
}
],
"metrics": [...],
"router_metrics": [...],
"net_spy_metrics": [...],
"five_g_ue_status": [...],
"five_g_ue_rtt": [...],
"start_time_human_readable": "2025-04-23T11:24:41.748000Z",
"end_time_human_readable": "2025-04-23T11:24:51.748000Z",
"duration": 10000
}
The segment lasts 10000 milliseconds (see the "duration": 10000 field), or 10 seconds, as expected. During this segment the platform aggregated the span statistics in the "span_statistics" field; collected CPU, memory, and network metrics in the "metrics" field; router metrics in the "router_metrics" field; link utilization metrics in the "net_spy_metrics" field; and the 5G user equipment status and round-trip time in the "five_g_ue_status" and "five_g_ue_rtt" fields.
We can verify that during this segment CPU, memory, and network metrics were really collected every 1/10 second by expanding the "metrics" field and examining the "t1_t0" value:
{
"commands": [
"nop comment:Collect 5G UE status every 3 seconds and server status every 1/10 second time:10s"
],
"indexes": [...],
"start_time": 1745407481748,
"end_time": 1745407491748,
"span_statistics": [...],
"metrics": [
{
"host": "server2.server",
"time": 1745407482.2817087,
"cpu_state": {
"cpu": {
"cpu_load": [...],
"t0": 1745407482.18046,
"t0_datetime": "2025-04-23T11:24:42.180460",
"t1": 1745407482.2817087,
"t1_datetime": "2025-04-23T11:24:42.281709",
"t1_t0": 0.10124874114990234
The second segment, which is twenty seconds long, is as follows:
{
"commands": [
"cpu_load comment:Collect 5G UE status and RTT every 8 seconds, server status every 1/2 second src_device_type:server src_device_name:server2 time:20s load:50 cores:0"
],
"start_time": 1745407491875,
"end_time": 1745407511743,
"span_statistics": [
{
"name": "e2e-latency",
"min_duration": 102.0,
"avg_duration": 207.0,
"max_duration": 282.0,
"count": 95
},
{
"name": "4-client-render-result",
"min_duration": 3.0,
"avg_duration": 5.0,
"max_duration": 8.0,
"count": 95
},
{
"name": "1-client-preprocessing",
"min_duration": 38.0,
"avg_duration": 135.0,
"max_duration": 206.0,
"count": 95
},
{
"name": "3-client-receive-data",
"min_duration": 0.0,
"avg_duration": 0.0,
"max_duration": 0.0,
"count": 95
},
{
"name": "2-client-send-data",
"min_duration": 52.0,
"avg_duration": 65.0,
"max_duration": 80.0,
"count": 96
}
],
"metrics": [...],
"router_metrics": [...],
"net_spy_metrics": [...],
"five_g_ue_status": [...],
"five_g_ue_rtt": [...],
"start_time_human_readable": "2025-04-23T11:24:51.875000Z",
"end_time_human_readable": "2025-04-23T11:25:11.743000Z",
"duration": 19868
}
We can see that the CPU load increased the latency slightly: the average e2e-latency grew from 196 ms in the first segment to 207 ms here.
The third segment, ten seconds long, comprises both a network load and a memory load applied simultaneously:
{
"commands": [
"network_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:ue src_device_name:load-client time:20s load:500 dst_device_type:server dst_device_name:load-server",
"memory_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:server src_device_name:server2 time:10s load:20m workers:5"
],
"start_time": 1745407511938,
"end_time": 1745407521786,
"span_statistics": [
{
"name": "e2e-latency",
"min_duration": 173.0,
"avg_duration": 199.0,
"max_duration": 252.0,
"count": 49
},
{
"name": "4-client-render-result",
"min_duration": 4.0,
"avg_duration": 6.0,
"max_duration": 10.0,
"count": 49
},
{
"name": "1-client-preprocessing",
"min_duration": 107.0,
"avg_duration": 138.0,
"max_duration": 195.0,
"count": 49
},
{
"name": "3-client-receive-data",
"min_duration": 0.0,
"avg_duration": 0.0,
"max_duration": 0.0,
"count": 49
},
{
"name": "2-client-send-data",
"min_duration": 47.0,
"avg_duration": 53.0,
"max_duration": 66.0,
"count": 50
}
],
"metrics": [...],
"router_metrics": [...],
"net_spy_metrics": [...],
"five_g_ue_status": [...],
"five_g_ue_rtt": [...],
"start_time_human_readable": "2025-04-23T11:25:11.938000Z",
"end_time_human_readable": "2025-04-23T11:25:21.786000Z",
"duration": 9848
}
After that segment, the ten-second memory load (see memory_load ... time:10s ...) has completed, but the twenty-second network load (see network_load ... time:20s ...) is still ongoing for another ten seconds:
{
"commands": [
"network_load comment:Collect 5G UE status and RTT every 10 seconds, and server status every second src_device_type:ue src_device_name:load-client time:20s load:500 dst_device_type:server dst_device_name:load-server"
],
"indexes": [
2
],
"start_time": 1745407521786,
"end_time": 1745407531710,
"span_statistics": [...],
"metrics": [...],
"router_metrics": [...],
"net_spy_metrics": [...],
"five_g_ue_status": [...],
"five_g_ue_rtt": [...],
"start_time_human_readable": "2025-04-23T11:25:21.786000Z",
"end_time_human_readable": "2025-04-23T11:25:31.710000Z",
"duration": 9924
}
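To compare the segments side by side, the statistics can also be pulled programmatically; a minimal sketch, assuming the GET /experiments/{exp_name}/apps/{app_name} call returns a list of segment objects shaped like the excerpts shown here:
import requests

exp_name = 'Short Integration Test-22.04.2025T13:00:59'
app_name = 'demo-dnn-partitioning'
segments = requests.get(
    f'http://demo-client:8000/experiments/{exp_name}/apps/{app_name}',
    timeout=30,
).json()

# Print the average e2e-latency per segment to compare idle and stressed phases.
for segment in segments:
    e2e = next(s for s in segment['span_statistics'] if s['name'] == 'e2e-latency')
    print(segment['commands'], '->', e2e['avg_duration'], 'ms average e2e-latency')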
We have demonstrated how the platform can be used to investigate the effect of different conditions on the performance of an example client-server app.
We did this using the so-called "Chat" window and the platform API.
This allows us to develop and test algorithms that control the quality of the application's service.