Triton Inference Server

About

  • Open-source inference serving software from NVIDIA
  • Used to deploy trained AI models in production (minimal client sketch below)
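
A minimal sketch of querying a model already deployed on Triton, using the real `tritonclient` Python package over the HTTP endpoint. The model name `resnet50` and the tensor names `input__0`/`output__0` are illustrative assumptions, not part of this wiki.

```python
# Minimal sketch: send one inference request to a running Triton server.
# Assumes a server at localhost:8000 serving a hypothetical "resnet50"
# model whose I/O tensors are named input__0 / output__0.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one FP32, image-shaped input tensor.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Ask for the output tensor and run the request.
result = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output__0")],
)
print(result.as_numpy("output__0").shape)
```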

Hardware support

  • Per the README, Triton is optimized for the best inferencing performance on GPUs, but it also works on CPU-only systems (per-model config sketch below)
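
CPU vs GPU placement is chosen per model via `instance_group` in that model's `config.pbtxt`. The sketch below generates such a config from Python; the model name `densenet_onnx`, the backend choice, and the instance count are illustrative assumptions.

```python
# Hypothetical sketch: write a config.pbtxt that pins a model to CPU.
# Swapping KIND_CPU for KIND_GPU targets GPUs instead.
from pathlib import Path

config = """\
name: "densenet_onnx"
backend: "onnxruntime"
instance_group [
  {
    kind: KIND_CPU   # run on CPU; use KIND_GPU on GPU-equipped systems
    count: 2         # two parallel execution instances of this model
  }
]
"""

model_dir = Path("model_repository/densenet_onnx")
model_dir.mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(config)
```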

Backends

TensorRT

PyTorch

ONNX Runtime

TensorFlow

  • On Jetson, Docker images are not used
  • On non-Jetson systems, Docker images are used (the NGC TensorFlow image by default); see the SavedModel sketch below
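
To serve a TensorFlow model, Triton's TensorFlow backend loads it from the model repository layout `<repo>/<model_name>/<version>/model.savedmodel`. A hedged sketch of exporting a model into that layout; the model architecture and repository path are illustrative assumptions.

```python
# Hypothetical sketch: export a TensorFlow model into the directory layout
# the TensorFlow backend reads: <repo>/<model_name>/<version>/model.savedmodel
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# "1" is the model version directory Triton will serve.
tf.saved_model.save(model, "model_repository/simple_tf/1/model.savedmodel")
```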

TFLite backend (armnn_tflite)

OpenVINO

Block diagram

  • [Image: Triton Inference Server block diagram]
