TensorRT - AshokBhat/notes GitHub Wiki

About

  • NVIDIA's inference engine
  • A high-performance neural network inference optimizer and runtime engine for production deployment.

Contents

  • A C++ library with C++ and Python APIs
  • Facilitates high-performance inference on NVIDIA graphics processing units (GPUs).
  • Designed to work in a complementary fashion with training frameworks such as TensorFlow and MXNet: models are trained in the framework, then optimized and deployed with TensorRT.

Usage

  • As part of TensorFlow, which has integrated TensorRT (TF-TRT).
  • As a library within a user application.
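A minimal sketch of the standalone-library workflow, using the TensorRT Python bindings to build an optimized engine from an ONNX model exported by a training framework. `model.onnx` and `model.plan` are placeholder file names; running this requires an NVIDIA GPU with TensorRT installed.

```python
import tensorrt as trt

# Build an optimized inference engine from an ONNX model.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder for a model exported from a training framework.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)

# Persist the serialized plan for later deserialization by the TensorRT runtime.
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```

The serialized plan is then loaded at deployment time with `trt.Runtime`, so the expensive optimization step runs once rather than on every application start.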

TensorFlow with TensorRT (TF-TRT)

Optimizations performed by TF-TRT:

  • Layers with unused outputs are eliminated.
  • Convolution, bias, and ReLU layers are fused to form a single layer, wherever possible.
  • Horizontal layer fusion (layer aggregation): layers that take the same source tensor and apply the same operation with similar parameters are combined into a single wider layer, whose output is then split back to the respective consumers. This improves performance by launching one kernel instead of several.
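The two fusion patterns above can be sketched with plain NumPy, using toy 1x1 convolutions expressed as matrix multiplies. All names and shapes here are illustrative, not TensorRT API; the point is that the fused form computes the same result with fewer passes over memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1x1 convolution over channels: input (channels_in, pixels).
x = rng.standard_normal((3, 8))
w = rng.standard_normal((4, 3))
b = rng.standard_normal((4, 1))

# Vertical fusion: conv, bias, and ReLU as three separate graph nodes...
conv_out = w @ x
bias_out = conv_out + b
relu_out = np.maximum(bias_out, 0.0)

# ...versus one fused node computing the same result in a single pass.
fused_out = np.maximum(w @ x + b, 0.0)
assert np.allclose(relu_out, fused_out)

# Horizontal fusion: two layers reading the same source tensor and
# applying the same operation with different weights...
w1 = rng.standard_normal((4, 3))
w2 = rng.standard_normal((4, 3))
out1 = w1 @ x
out2 = w2 @ x

# ...versus one aggregated layer whose output is split per consumer.
stacked = np.vstack([w1, w2]) @ x
assert np.allclose(out1, stacked[:4])
assert np.allclose(out2, stacked[4:])
```

In a real engine the win comes from kernel launch overhead and memory traffic, not arithmetic: the fused versions read and write each tensor once.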

See also