android device - YingkunZhou/EdgeTransformerBench GitHub Wiki

https://apkpure.com/termux/com.termux/

pkg install root-repo x11-repo
pkg upgrade
pkg install openssh
passwd
sshd
ip a
pkg search <name>
pkg install <name>

devices

  • SmartPack-Kernel-Manager common CPU config
    • 2 big cores (7, 8): [2002MHz, 2730MHz]: performance
    • 2 middle cores (5, 6): [2002MHz, 2504MHz]: performance
    • 4 little cores (1~4): [442MHz, 2002MHz]: performance
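The governor and frequency range set in SmartPack-Kernel-Manager can also be checked from a Termux shell. The following is a minimal sketch, assuming the device exposes the standard Linux cpufreq files (`scaling_governor`, `scaling_min_freq`, `scaling_max_freq`, values in kHz) under `/sys/devices/system/cpu`. The function name `cpu_policy_summary` is made up for illustration, and reading these files may require root on some devices.

```shell
# Print "<cpu> <governor> <min_kHz> <max_kHz>" for every CPU that exposes cpufreq.
# $1: sysfs CPU root (assumed default: the standard Linux location).
cpu_policy_summary() {
    local root="${1:-/sys/devices/system/cpu}"
    local dir
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without a readable cpufreq policy (or an unmatched glob).
        [ -r "$dir/scaling_governor" ] || continue
        printf '%s %s %s %s\n' \
            "$(basename "$(dirname "$dir")")" \
            "$(cat "$dir/scaling_governor")" \
            "$(cat "$dir/scaling_min_freq")" \
            "$(cat "$dir/scaling_max_freq")"
    done
}
```

Calling `cpu_policy_summary` with no argument lists the live policies. Linux numbers CPUs from 0, so the cores listed as 1 to 8 above would presumably show up as `cpu0` to `cpu7`.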


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Print the first line of a sysfs file, without the trailing newline.
// Returns 0 on success, -1 if the file cannot be opened.
static int print_first_line(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    // fgetc returns an int so that EOF can be told apart from real data.
    int ch;
    while ((ch = fgetc(fp)) != EOF && ch != '\n')
        putchar(ch);
    fclose(fp);
    return 0;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <number of samples>\n", argv[0]);
        return 1;
    }
    int end = atoi(argv[1]);

    // Sample the Mali GPU clock and utilization every 200 ms.
    while (end > 0) {
        if (print_first_line("/sys/devices/platform/18500000.mali/clock") != 0)
            return 1;
        printf("KHz @ ");

        if (print_first_line("/sys/devices/platform/18500000.mali/utilization") != 0)
            return 1;
        printf("%%\n");

        usleep(200 * 1000);
        end--;
    }
    return 0;
}

sudo cat /sys/devices/platform/18500000.mali/dvfs
sudo cat /sys/devices/platform/18500000.mali/dvfs_governor
sudo cat /sys/devices/platform/18500000.mali/dvfs_table
> 800000 702000 572000 455000 377000 260000 156000
su
echo 800000 > /sys/devices/platform/18500000.mali/dvfs_max_lock
echo 800000 > /sys/devices/platform/18500000.mali/dvfs_min_lock
exit
# back to the default dynamic frequency (also needs root: run inside `su`)
echo -1 > /sys/devices/platform/18500000.mali/dvfs_min_lock
echo -1 > /sys/devices/platform/18500000.mali/dvfs_max_lock
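
The lock and unlock steps above can be wrapped in two small helpers. This is only a sketch: it assumes `dvfs_table` holds a whitespace-separated list of frequencies as in the output shown above, and the names `MALI_DIR`, `mali_lock_freq` and `mali_unlock` are hypothetical. Both functions must run as root.

```shell
# Sysfs directory taken from the commands above; pass another one as the last argument.
MALI_DIR=/sys/devices/platform/18500000.mali

# Lock the GPU to one frequency, but only if dvfs_table lists it.
# $1: frequency as printed in dvfs_table; $2: optional sysfs directory.
mali_lock_freq() {
    local freq="$1" dir="${2:-$MALI_DIR}" f
    for f in $(cat "$dir/dvfs_table"); do
        if [ "$f" = "$freq" ]; then
            echo "$freq" > "$dir/dvfs_max_lock"
            echo "$freq" > "$dir/dvfs_min_lock"
            return 0
        fi
    done
    echo "frequency $freq not in dvfs_table" >&2
    return 1
}

# Restore dynamic frequency scaling by writing -1 to both locks.
# $1: optional sysfs directory.
mali_unlock() {
    local dir="${1:-$MALI_DIR}"
    echo -1 > "$dir/dvfs_min_lock"
    echo -1 > "$dir/dvfs_max_lock"
}
```

A benchmark run would then be bracketed by `mali_lock_freq 800000` and `mali_unlock`. Rejecting values that are not in the table guards against typos, since how the driver reacts to an unlisted value is not documented here.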

How to enable OpenCL?

export LD_LIBRARY_PATH=/vendor/lib64

SmartPack-Kernel-Manager can be used to set the GPU frequency and watch GPU utilization.

sudo ls /sys/devices/platform/soc/5000000.qcom,kgsl-3d0/kgsl/kgsl-3d0

Snapdragon 800 series

  • As far as is currently known, the NNAPI backend of PyTorch's speed_benchmark_torch can use the Hexagon DSP to run the uint8-quantized version of MobileNetV2.

Finally, I would like to share the insights we gained by running these models on various mobile devices. The following table summarizes which hardware is selected when the models are executed on the mobile devices we experimented with (8 in total). This shows that the GPU is used in all mobile devices to accelerate the models with Float32. On the other hand, the hardware used depends on the mobile device for the models with Int8. Strictly speaking, the hardware is selected for each operation, but all operations can be executed on the listed hardware for MobileNetV2.

sudo ls /sys/devices/platform/soc/3d00000.qcom,kgsl-3d0/kgsl/kgsl-3d0/
sudo ls /sys/devices/platform/soc/3d00000.qcom,kgsl-3d0/devfreq/3d00000.qcom,kgsl-3d0/
sudo cat /sys/devices/platform/soc/3d00000.qcom,kgsl-3d0/devfreq/3d00000.qcom,kgsl-3d0/name
sudo ls /sys/class/kgsl/kgsl-3d0/
sudo cat /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage
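
A single read of `gpu_busy_percentage` is noisy, so averaging a few samples during a benchmark run can be more useful. The sketch below assumes the file holds an integer optionally followed by a percent sign (for example `37 %`); that format and the function name `gpu_busy_avg` are assumptions and not taken from the notes above. Reading the file may require root.

```shell
# Average the Adreno GPU busy percentage over several samples.
# $1: number of samples; $2: seconds between samples; $3: optional file path.
gpu_busy_avg() {
    local samples="$1" interval="$2"
    local file="${3:-/sys/class/kgsl/kgsl-3d0/gpu_busy_percentage}"
    local i=0 sum=0 val _rest
    [ "$samples" -gt 0 ] || return 1
    while [ "$i" -lt "$samples" ]; do
        val=0
        # Take the first whitespace-separated token, tolerating a missing newline.
        read -r val _rest < "$file" || true
        val="${val%\%}"          # drop a trailing percent sign, if any
        sum=$((sum + ${val:-0}))
        i=$((i + 1))
        sleep "$interval"
    done
    echo $((sum / samples))
}
```

For instance, `gpu_busy_avg 25 0.2` would sample for roughly five seconds and print the integer average.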

What are Qualcomm QNN HTP/DSP Delegates? | AI Benchmark Forum (ai-benchmark.net)

It is only available to members of "verified companies". It's not clear yet which companies will be verified, but it's obvious that free and hobbyist developers are ruled out. Since most Android apps originate from those, this will be very detrimental for mobile AI. Moreover, this means that you buy a powerful NPU for a lot of money and then are not allowed to use it (particularly since Qualcomm has obviously stopped supporting NNAPI on the NPU in the 8gen2 chipset, see https://browser.geekbench.com/search?k=ml_inference&q=kalama , where NNAPI has roughly the same performance as the CPU and is much worse than on the 8gen1 chipset).

Qualcomm! 🖕️


performance at first glance

  • TensorFlow Lite integrated benchmark tool
wget https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/android_aarch64_benchmark_model
chmod +x android_aarch64_benchmark_model
wget https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/android_aarch64_benchmark_model_plus_flex
chmod +x android_aarch64_benchmark_model_plus_flex
wget https://github.com/ARM-software/ML-zoo/blob/master/models/image_classification/mobilenet_v2_1.0_224/tflite_uint8/mobilenet_v2_1.0_224_quantized_1_default_1.tflite\?raw\=true -O mobilenet_v2_1.0_224_quantized_1_default_1.tflite
# for arm mali gpu
cp /vendor/lib64/egl/libGLES_mali.so libOpenCL.so
## for samsung Exynos
# cp /vendor/lib64/libion_exynos.so .
LD_LIBRARY_PATH=$PWD ./android_aarch64_benchmark_model --graph=mobilenet_v2_1.0_224_quantized_1_default_1.tflite --num_runs=200 --use_gpu=true
# for qualcomm adreno gpu
LD_LIBRARY_PATH=/vendor/lib64 ./android_aarch64_benchmark_model --graph=mobilenet_v2_1.0_224_quantized_1_default_1.tflite --num_runs=200 --use_gpu=true
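
The two GPU variants above differ only in where `libOpenCL.so` comes from, so the choice can be scripted. This is a rough sketch under two assumptions that the notes do not state: that Adreno devices ship `libOpenCL.so` directly in `/vendor/lib64`, and that checking for it first is a good enough way to tell the two cases apart. The function name `opencl_libdir` is hypothetical.

```shell
# Echo a directory suitable for LD_LIBRARY_PATH so the benchmark can find OpenCL.
# $1: vendor library dir (default /vendor/lib64); $2: work dir (default $PWD).
opencl_libdir() {
    local vendor="${1:-/vendor/lib64}" work="${2:-$PWD}"
    if [ -e "$vendor/libOpenCL.so" ]; then
        # Assumed Adreno case: the vendor dir already provides libOpenCL.so.
        echo "$vendor"
    elif [ -e "$vendor/egl/libGLES_mali.so" ]; then
        # Mali case from the commands above: expose the driver as libOpenCL.so.
        cp "$vendor/egl/libGLES_mali.so" "$work/libOpenCL.so"
        echo "$work"
    else
        echo "no OpenCL library found under $vendor" >&2
        return 1
    fi
}
```

It would be used as `LD_LIBRARY_PATH=$(opencl_libdir) ./android_aarch64_benchmark_model --graph=mobilenet_v2_1.0_224_quantized_1_default_1.tflite --num_runs=200 --use_gpu=true`. On Samsung Exynos devices the extra `libion_exynos.so` copy mentioned above may still be needed by hand.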
  • PyTorch Mobile integrated benchmark tool

See https://github.com/YingkunZhou/EdgeTransformerPerf/wiki/pytorch

Utils
