TensorRT - AshokBhat/notes GitHub Wiki

About

  • NVIDIA's inference engine
  • A high-performance neural network inference optimizer and runtime engine for production deployment.

Contents

  • A C++ library with C++ and Python APIs
  • Facilitates high-performance inference on NVIDIA graphics processing units (GPUs).
  • Designed to work in a complementary fashion with training frameworks such as TensorFlow and MXNet: models are trained in the framework, then optimized and deployed with TensorRT.

Usage

  • As part of TensorFlow, which has integrated TensorRT (TF-TRT).
  • As a library within a user application.
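A minimal sketch of the standalone-library workflow, using the TensorRT Python bindings to build an optimized engine from an ONNX model exported by a training framework. `model.onnx` and `model.plan` are placeholder file names; running this requires an NVIDIA GPU with TensorRT installed.

```python
import tensorrt as trt

# Build an optimized inference engine from an ONNX model.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder for a model exported from a training framework.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)

# Persist the serialized plan for later deserialization by the TensorRT runtime.
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```

The serialized plan is then loaded at deployment time with `trt.Runtime`, so the expensive optimization step runs once rather than on every application start.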

TensorFlow with TensorRT (TF-TRT)

Optimizations performed by TF-TRT:

  • Layers with unused outputs are eliminated.
  • Convolution, bias, and ReLU layers are fused to form a single layer, wherever possible.
  • Horizontal layer fusion (layer aggregation): layers that take the same source tensor and apply the same operation with similar parameters are combined into a single wider layer, whose output is then split back to the respective consumers. This improves performance by launching one kernel instead of several.
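The two fusion patterns above can be sketched with plain NumPy, using toy 1x1 convolutions expressed as matrix multiplies. All names and shapes here are illustrative, not TensorRT API; the point is that the fused form computes the same result with fewer passes over memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1x1 convolution over channels: input (channels_in, pixels).
x = rng.standard_normal((3, 8))
w = rng.standard_normal((4, 3))
b = rng.standard_normal((4, 1))

# Vertical fusion: conv, bias, and ReLU as three separate graph nodes...
conv_out = w @ x
bias_out = conv_out + b
relu_out = np.maximum(bias_out, 0.0)

# ...versus one fused node computing the same result in a single pass.
fused_out = np.maximum(w @ x + b, 0.0)
assert np.allclose(relu_out, fused_out)

# Horizontal fusion: two layers reading the same source tensor and
# applying the same operation with different weights...
w1 = rng.standard_normal((4, 3))
w2 = rng.standard_normal((4, 3))
out1 = w1 @ x
out2 = w2 @ x

# ...versus one aggregated layer whose output is split per consumer.
stacked = np.vstack([w1, w2]) @ x
assert np.allclose(out1, stacked[:4])
assert np.allclose(out2, stacked[4:])
```

In a real engine the win comes from kernel launch overhead and memory traffic, not arithmetic: the fused versions read and write each tensor once.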

See also