# nvidia tensorrt
- The latest commit requires torch 2.3~2.4, and a CUDA-enabled build at that; NVIDIA officially provides wheels only up to 2.2, so where am I supposed to find one?
- The 2.2 torch-tensorrt installs with pip --no-deps, but importing it hits unresolved symbols, and Google turns up no fix. The 2.x releases also explicitly target CUDA 12.x, not the CUDA 11.4 that is all I have.

So the 1.4 version is the only option.
Installation — Torch-TensorRT documentation (pytorch.org)
Both the GitHub repo and the PyTorch docs page give build instructions; pick whichever looks reliable. The main work is editing the WORKSPACE file so the CUDA/TensorRT/libtorch dependency paths point at the local installation.
TensorRT/docsrc/user_guide/saving_models.rst at main · pytorch/TensorRT (github.com)
In Torch-TensorRT 1.x versions, the primary way to compile and run inference with Torch-TensorRT is using the TorchScript IR. For `ir=ts`, this behavior stays the same in 2.x versions as well.
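A minimal sketch of that 1.x TorchScript path (the model and input shape are placeholders, not anything prescribed by the docs):

```python
import torch
import torch_tensorrt

# placeholder model; any eval-mode module on the GPU works the same way
model = torch.hub.load("pytorch/vision", "resnet50", weights=None).eval().cuda()

trt_ts = torch_tensorrt.compile(
    model,
    ir="ts",  # TorchScript frontend, the 1.x default
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},  # allow FP16 kernels
)
torch.jit.save(trt_ts, "resnet50_trt.ts")  # a plain TorchScript artifact
```

The saved module reloads with `torch.jit.load()` as long as the torch-tensorrt runtime library is available.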
## Dynamo IR
The output type of `ir=dynamo` compilation of Torch-TensorRT is a `torch.export.ExportedProgram` object by default. In addition, there is a new `output_format` parameter in the `CompilationSettings` object provided before compilation. `output_format` can take the following options:

- `exported_program` (or `ep`): the default; returns an `ExportedProgram`
- `torchscript` (or `ts`): returns a TorchScript module
- `graph_module` (or `fx`): returns a `torch.fx.GraphModule`, which can be traced into TorchScript to save to disk
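This 2.x path is out of reach on the CUDA 11.4 setup above, but for reference, a sketch of the dynamo frontend with `output_format` as described in the quoted docs (model and filenames are placeholders):

```python
import torch
import torch_tensorrt

model = torch.hub.load("pytorch/vision", "resnet50", weights=None).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# ir="dynamo" returns an ExportedProgram by default; requesting
# output_format="torchscript" yields a TorchScript module instead
trt_ts = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    output_format="torchscript",
)
torch.jit.save(trt_ts, "resnet50_trt_dynamo.ts")
```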
Build onnxruntime with the CUDA and TensorRT execution providers enabled:

```sh
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++
./build.sh --config Release --build_shared_lib --parallel --skip_tests \
  --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda \
  --use_tensorrt --tensorrt_home /usr/local/cuda \
  --cmake_extra_defines onnxruntime_DISABLE_FLOAT8_TYPES=ON
```
The build fails with:

```
onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cu(78): error: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
```

What on earth? Presumably `T`'s default constructor counts as dynamic initialization of the `__shared__` array on this CUDA version. Dropping the `__shared__` qualifier, so that each thread gets its own local copy of `quant_map`, is a blunt workaround that at least gets the build through:
```diff
diff --git a/onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cu b/onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cu
index 098e361..98290db 100644
--- a/onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cu
+++ b/onnxruntime/contrib_ops/cuda/quantization/matmul_bnb4.cu
@@ -75,7 +75,7 @@ __global__ void kgemm_4bit_inference_naive(
   uint8_t local_B_4bit[num_values_8bit];
   T local_B[num_values_4bit / 4];
   T local_A[num_values_4bit / 4];
-  __shared__ T quant_map[16];
+  T quant_map[16];
   T local_absmax = T(0.0f);
   for (int i = threadIdx.x; i < 16; i++) quant_map[i] = T(datatype[i]);
```
## DLA int8
Model | Top-1 | Top-1 //20 est. | Top-1 //50 est. | #params | GMACs |
---|---|---|---|---|---|
efficientformerv2_s0 | - | 28.1 | 25.7 | 3.5M | 0.40G |
efficientformerv2_s1 | - | 0.0 | 0.0 | 6.1M | 0.65G |
efficientformerv2_s2 | - | 73.3 | 70.3 | 12.6M | 1.25G |
SwiftFormer_XS | - | 65.8 | 64.4 | 3.5M | 0.4G |
SwiftFormer_S | - | 11.0 | 4.6 | 6.1M | 1.0G |
SwiftFormer_L1 | - | 40.2 | 42.5 | 12.1M | 1.6G |
EMO_1M | - | 51.5 | 46.1 | 1.3M | 0.26G |
EMO_2M | - | 44.3 | 32.9 | 2.3M | 0.44G |
EMO_5M | - | 64.1 | 56.5 | 5.1M | 0.90G |
EMO_6M | - | 63.2 | 60.0 | 6.1M | 0.96G |
edgenext_xx_small | - | 68.4 | 67.3 | 1.3M | 0.26G |
edgenext_x_small | - | 75.1 | 73.7 | 2.3M | 0.54G |
edgenext_small/usi | - | 80.3 | 80.2 | 5.6M | 1.26G |
mobilevitv2_050* | - | 6.7 | 6.9 | 1.4M | 0.5G |
mobilevitv2_075* | - | 56.8 | 49.0 | 2.9M | 1.0G |
mobilevitv2_100* | - | 56.4 | 49.2 | 4.9M | 1.8G |
mobilevitv2_125* | - | 63.2 | 60.3 | 7.5M | 2.8G |
mobilevitv2_150* | - | 47.5 | 36.5 | 10.6M | 4.0G |
mobilevitv2_175* | - | 68.6 | 64.2 | 14.3M | 5.5G |
mobilevitv2_200* | - | 73.0 | 71.9 | 18.4M | 7.2G |
mobilevit_xx_small | - | 19.2 | 20.7 | 1.3M | 0.36G |
mobilevit_x_small | - | 56.8 | 57.7 | 2.3M | 0.89G |
mobilevit_small | - | 64.6 | 66.7 | 5.6M | 2.0G |
LeViT_128S | - | 76.2 | 75.9 | 7.8M | 0.30G |
LeViT_128 | - | 78.4 | 77.4 | 9.2M | 0.41G |
LeViT_192 | - | 79.8 | 80.0 | 11M | 0.66G |
LeViT_256 | - | 80.6 | 81.6 | 19M | 1.12G |
resnet50 | - | 5.8 | 5.8 | 25.6M | 4.1G |
mobilenetv3_large_100 | - | 65.2 | 61.0 | 5.5M | 0.29G |
tf_efficientnetv2_b0 | - | 75.8 | 75.8 | 7.1M | 0.72G |
tf_efficientnetv2_b1 | - | 75.7 | 76.5 | 8.1M | 1.2G |
tf_efficientnetv2_b2 | - | 76.6 | 76.3 | 10.1M | 1.7G |
tf_efficientnetv2_b3 | - | 79.8 | 80.4 | 14.4M | 3.0G |
- accuracy obtained via the onnxruntime TensorRT backend (see the session sketch at the end of this page)
## GPU int8
Model | Top-1 | Top-1 //20 est. | Top-1 //50 est. | #params | GMACs |
---|---|---|---|---|---|
efficientformerv2_s0 | - | 30.8 | 28.6 | 3.5M | 0.40G |
efficientformerv2_s1 | - | 0.0 | 0.0 | 6.1M | 0.65G |
efficientformerv2_s2 | - | 74.0 | 71.5 | 12.6M | 1.25G |
SwiftFormer_XS | - | 66.2 | 65.2 | 3.5M | 0.4G |
SwiftFormer_S | - | 25.6 | 20.5 | 6.1M | 1.0G |
SwiftFormer_L1 | - | 49.2 | 46.4 | 12.1M | 1.6G |
EMO_1M | - | 61.4 | 56.8 | 1.3M | 0.26G |
EMO_2M | - | 63.4 | 59.3 | 2.3M | 0.44G |
EMO_5M | - | 71.6 | 71.5 | 5.1M | 0.90G |
EMO_6M | - | 72.7 | 70.9 | 6.1M | 0.96G |
edgenext_xx_small | - | 71.1 | 70.5 | 1.3M | 0.26G |
edgenext_x_small | - | 74.5 | 74.7 | 2.3M | 0.54G |
edgenext_small/usi | - | 80.6 | 79.7 | 5.6M | 1.26G |
mobilevitv2_050* | - | 11.5 | 9.0 | 1.4M | 0.5G |
mobilevitv2_075* | - | 61.4 | 54.7 | 2.9M | 1.0G |
mobilevitv2_100* | - | 58.2 | 51.1 | 4.9M | 1.8G |
mobilevitv2_125* | - | 65.8 | 59.2 | 7.5M | 2.8G |
mobilevitv2_150* | - | 42.5 | 33.3 | 10.6M | 4.0G |
mobilevitv2_175* | - | 70.0 | 63.3 | 14.3M | 5.5G |
mobilevitv2_200* | - | 74.1 | 73.1 | 18.4M | 7.2G |
mobilevit_xx_small | - | 27.9 | 24.6 | 1.3M | 0.36G |
mobilevit_x_small | - | 54.2 | 56.3 | 2.3M | 0.89G |
mobilevit_small | - | 71.2 | 72.2 | 5.6M | 2.0G |
LeViT_128S | - | 76.1 | 75.7 | 7.8M | 0.30G |
LeViT_128 | - | 78.5 | 77.4 | 9.2M | 0.41G |
LeViT_192 | - | 79.9 | 79.7 | 11M | 0.66G |
LeViT_256 | - | 80.5 | 81.3 | 19M | 1.12G |
resnet50 | - | 77.7 | 79.6 | 25.6M | 4.1G |
mobilenetv3_large_100 | - | 67.6 | 65.1 | 5.5M | 0.29G |
tf_efficientnetv2_b0 | - | 76.1 | 75.0 | 7.1M | 0.72G |
tf_efficientnetv2_b1 | - | 75.8 | 77.1 | 8.1M | 1.2G |
tf_efficientnetv2_b2 | - | 77.0 | 76.8 | 10.1M | 1.7G |
tf_efficientnetv2_b3 | - | 79.9 | 81.3 | 14.4M | 3.0G |
- accuracy obtained via the onnxruntime TensorRT backend, as sketched below
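A minimal sketch of such a session, assuming placeholder model and calibration-table filenames; the option keys are standard onnxruntime TensorRT-EP provider options:

```python
import numpy as np
import onnxruntime as ort

# TensorRT EP options; trt_dla_enable/trt_dla_core switch between the
# "GPU int8" and "DLA int8" configurations above
trt_options = {
    "trt_int8_enable": True,
    "trt_int8_calibration_table_name": "calibration.flatbuffers",  # placeholder
    "trt_dla_enable": False,  # set True (plus trt_dla_core) for the DLA runs
    "trt_engine_cache_enable": True,  # avoid rebuilding engines between runs
}

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT rejects
        "CPUExecutionProvider",
    ],
)

# a random tensor stands in for one preprocessed 224x224 image;
# the real evaluation loops over the ImageNet validation subset
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = sess.run(None, {sess.get_inputs()[0].name: x})[0]
print(logits.argmax(axis=1))
```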