06 Demo - Smart-Edge-Lab/SeQaM GitHub Wiki
DNN Partitioning
This demo application is intended for testing the SeQaM platform. It showcases a Distributed Neural Network and illustrates how to utilize SeQaM for performance evaluation within the edge node.
The demo partitions a warning-sign-detection DNN, dividing the network's layers between a distributed client and a distributed server. A scheduler manages the partitioning, offering 25 split points, numbered 0 to 24. When the partition point is set to 0, the entire network executes on the server. For higher values, the initial layers run on the server, while the remaining layers are processed on the client. This setup distributes the computational load, allowing the system to optimize resource usage across both machines.
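As a rough illustration, the split-point convention described above can be sketched in a few lines of Python. This is illustrative only; the layer list and function name are assumptions, not the demo's actual code:

```python
def partition(layers, split_point):
    """Divide a DNN's layers for a given split point (0-24).

    Sketch of the convention described above: split point 0 keeps the
    whole network on the server; higher values leave the first
    `split_point` layers on the server and hand the rest to the client.
    (Illustrative only, not the demo's real scheduler logic.)
    """
    if split_point == 0:
        return layers, []          # everything runs on the server
    return layers[:split_point], layers[split_point:]

# 25 dummy "layers" standing in for the warning-sign-detection DNN
layers = list(range(25))
server_part, client_part = partition(layers, 10)
```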
The image below illustrates a simple setup where a client interacts with an edge server. This configuration allows testing of the DNN partitioning app and its performance in an edge environment.
Configuration Files
This section provides details on the configuration files used in the DNN partitioning demo application. These files are essential for setting up the various components of the platform: ModuleConfig.json, ScenarioConfig.json, and env.
ModuleConfig.json
{
  "modules": [
    {
      "console": {
        "name": "Console",
        "description": "Console module to input commands",
        "port": 0,
        "host": "0.0.0.0",
        "paths": []
      }
    },
    {
      "command_translator": {
        "name": "Command Translator",
        "description": "Get raw commands and forward them to orchestrators in json format",
        "port": $COMMAND_TRANSLATOR_PORT | 8001,
        "host": "$COMMAND_TRANSLATOR_HOST | 172.22.174.157",
        "paths": [
          {
            "translate": {
              "endpoint": "/translate/"
            }
          }
        ]
      }
    },
    {
      "event_orchestrator": {
        "name": "Event Orchestrator",
        "description": "Get event requests",
        "port": $EVENT_ORCHESTRATOR_PORT | 8002,
        "host": "$EVENT_ORCHESTRATOR_HOST | 172.22.174.157",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            }
          }
        ]
      }
    },
    {
      "experiment_dispatcher": {
        "name": "Experiment Dispatcher",
        "description": "Executes the configured experiment",
        "port": $EXPERIMENT_DISPATCHER_PORT | 8004,
        "host": "$EXPERIMENT_DISPATCHER_HOST | 172.22.174.157",
        "paths": [
          {
            "start": {
              "endpoint": "/experiment/init/"
            },
            "stop": {
              "endpoint": "/experiment/init/"
            }
          }
        ]
      }
    }
  ]
}
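The `"$VARIABLE | default"` entries above suggest an environment lookup with a literal fallback. A minimal sketch of how such a convention could be resolved (illustrative only; SeQaM's actual configuration loader may behave differently):

```python
import os
import re

def resolve(value):
    """Resolve a "$NAME | default" placeholder against the environment.

    Illustrative sketch of the fallback convention used in
    ModuleConfig.json; the platform's real loader may differ.
    """
    m = re.match(r"^\$(\w+)\s*\|\s*(.+)$", str(value).strip())
    if not m:
        return value               # plain value, nothing to resolve
    name, default = m.groups()
    return os.environ.get(name, default)

print(resolve("$COMMAND_TRANSLATOR_PORT | 8001"))  # falls back to "8001" if unset
```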
ScenarioConfig.json
{
  "distributed": {
    "ue": [
      {
        "name": "ue001",
        "description": "client allocating the DNN client",
        "port": 9001,
        "host": "192.168.6.3",
        "ssh_port": 22,
        "ssh_user": "ubuntu",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            }
          }
        ]
      },
      {
        "name": "ue002",
        "description": "load client",
        "port": 8887,
        "host": "192.168.6.2",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            }
          }
        ]
      }
    ],
    "server": [
      {
        "name": "svr001",
        "description": "server allocating the DNN server",
        "port": 9001,
        "host": "172.22.232.67",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            }
          }
        ]
      },
      {
        "name": "svr002",
        "description": "load server",
        "port": 9002,
        "host": "172.22.232.66",
        "paths": [
          {
            "event": {
              "endpoint": "/event/"
            },
            "cpu_load": {
              "endpoint": "/event/stress/cpu_load"
            },
            "memory_load": {
              "endpoint": "/event/stress/memory_load"
            }
          }
        ]
      }
    ],
    "router": [
      {
        "name": "ntw_agent",
        "description": "deprecated section",
        "port": 8887,
        "host": "127.0.0.1",
        "paths": [
          {
            "network_bandwidth": {
              "endpoint": "/event/network/bandwidth"
            },
            "network_load": {
              "endpoint": "/event/network/load"
            }
          }
        ]
      }
    ]
  }
}
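For orientation, the entries above combine host, port, and endpoint into the URLs SeQaM targets for events. The helper below is a sketch, not part of the platform:

```python
def device_urls(scenario):
    """Flatten a ScenarioConfig "distributed" section into
    (device, path, URL) tuples.

    Sketch only; SeQaM's own tooling may resolve endpoints differently.
    """
    out = []
    for devices in scenario.values():      # "ue", "server", "router" groups
        for dev in devices:
            for paths in dev["paths"]:
                for name, spec in paths.items():
                    out.append((dev["name"], name,
                                f"http://{dev['host']}:{dev['port']}{spec['endpoint']}"))
    return out

# Mirrors the ue001 entry above
sample = {"ue": [{"name": "ue001", "host": "192.168.6.3", "port": 9001,
                  "paths": [{"event": {"endpoint": "/event/"}}]}]}
print(device_urls(sample))
```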
env
SEQAM_CENTRAL_HOST=192.168.6.4
DATABASE_ENDPOINT="$SEQAM_CENTRAL_HOST"
OTLP_URL="$DATABASE_ENDPOINT":4317
API_HOST="$SEQAM_CENTRAL_HOST"
API_PORT=8000
COMMAND_TRANSLATOR_HOST="$SEQAM_CENTRAL_HOST"
COMMAND_TRANSLATOR_PORT=8001
EVENT_ORCHESTRATOR_HOST="$SEQAM_CENTRAL_HOST"
EVENT_ORCHESTRATOR_PORT=8002
EXPERIMENT_DISPATCHER_HOST="$SEQAM_CENTRAL_HOST"
EXPERIMENT_DISPATCHER_PORT=8004
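The `$VARIABLE` references above all resolve against SEQAM_CENTRAL_HOST; for example, OTLP_URL expands to 192.168.6.4:4317. A small sketch of that expansion (illustrative only; a real shell or docker-compose handles quoting and expansion with more nuance):

```python
def expand(env_lines):
    """Sequentially expand "$NAME" references in a simple env file.

    Sketch only; real shells and docker-compose have richer
    expansion rules than this.
    """
    env = {}
    for line in env_lines:
        key, _, value = line.partition("=")
        for name, val in env.items():
            value = value.replace(f"${name}", val)
        env[key] = value.replace('"', "")
    return env

env = expand([
    "SEQAM_CENTRAL_HOST=192.168.6.4",
    'DATABASE_ENDPOINT="$SEQAM_CENTRAL_HOST"',
    'OTLP_URL="$DATABASE_ENDPOINT":4317',
])
print(env["OTLP_URL"])  # -> 192.168.6.4:4317
```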
Installation
First, install the Central Components on the EMULATE SeQaM platform server. During the setup, configure the env and ScenarioConfig.json files. Then:
- Install and deploy the Distributed components on the distributed server.
- Install and deploy the Network components on the load client and load server.
Follow the steps below to install and run the DNN Partitioning demo app. Some of these steps must be performed on both the distributed client and the distributed server.
1. Prepare the demo
First, from the source repository, go to the demo/dnn-partition-app folder and run the install-demo.sh script to download the pre-trained model used by the application.
./install-demo.sh
Copy the folder to the machines where you will run the client and the server.
2. Edit the Environment in docker-compose.yaml
On the distributed client (192.168.6.3) and distributed server (172.22.232.67) machines, open the docker-compose.yaml file in the root directory of the application. Update the following environment variables:
- otlp_host: Set this to the IP address of the central collector.
- server_host: Set this to the IP address of the distributed server.
- scheduler_host: Set this to the IP address of the distributed client.
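An environment section matching the addresses in ScenarioConfig.json above might look like this (illustrative excerpt; the service names and exact layout of docker-compose.yaml are assumptions):

```yaml
services:
  client:
    environment:
      - otlp_host=192.168.6.4       # central collector (SEQAM_CENTRAL_HOST)
      - server_host=172.22.232.67   # distributed server (svr001)
      - scheduler_host=192.168.6.3  # distributed client (ue001)
```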
3. Start the Scheduler on the Client
On the distributed client, navigate to the scheduler directory within dnn-partition-app:
cd scheduler
python3 scheduler.py
4. Start the Server
On the distributed server, move to the root of the dnn-partition-app directory and run:
docker compose up server
5. Start the Client
After the server is running, go back to the distributed client and, from the root of the dnn-partition-app directory, run:
docker compose up client
6. Set the Partition Point
Once both the server and the client are running, an input field appears in the scheduler.py console on the client. In this field, enter a value between 0 and 24 to set the fixed partition point for the DNN.
This partition point determines where the DNN is split between the client and the server for processing.
Once all steps are complete, the DNN partitioning demo app will be running on the SeQaM platform. You can now monitor performance using SeQaM’s monitoring and event management tools. From the central server, access the console through the web interface on port 8000; experiments can be run there to simulate real-world loads such as:
- CPU Load: Stress the server by running CPU load experiments to evaluate its performance under heavy computational demand.
- Network Load: Use the load client and load server to simulate stress on the network connection between the edge server and the client.
The ExperimentConfig.json used in this demo is shown below.
{
  "experiment_name": "seqam_demo_1",
  "eventList": [
    {
      "command": "network_load src_device_type:router src_device_name:router1 interface:eth2 load:5 mode:inc time:100s load_min:0 load_max:50 load_step:5 time_step:10",
      "executionTime": 1000
    },
    {
      "command": "cpu_load src_device_type:ue src_device_name:ue001 cores:0 load:98 time:100s",
      "executionTime": 110000
    },
    {
      "command": "cpu_load src_device_type:ue src_device_name:ue001 cores:6 load:100 time:100s mode:rand random_seed:2",
      "executionTime": 220000
    },
    {
      "command": "network_load src_device_type:router src_device_name:router1 interface:eth2 load:5 time:100s mode:rand random_seed:2",
      "executionTime": 220005
    },
    {
      "command": "exit",
      "executionTime": 330000
    }
  ]
}
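Each entry's command follows a simple `action key:value ...` grammar. A minimal parser sketch (illustrative only; the platform's real command translator may differ):

```python
def parse_command(cmd):
    """Split a SeQaM event command into its action and key:value arguments.

    Sketch of the grammar seen in ExperimentConfig.json; the platform's
    real parser may differ.
    """
    action, *pairs = cmd.split()
    args = dict(pair.split(":", 1) for pair in pairs)
    return action, args

action, args = parse_command(
    "cpu_load src_device_type:ue src_device_name:ue001 cores:0 load:98 time:100s")
print(action, args)
```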
For reference, find the screenshot of the experiment running on the platform: