AWS Inferentia

AWS Neuron SDK

  • SDK for deploying ML inference on Amazon EC2 Inf1 instances
  • Consists of a compiler, a runtime, and profiling tools (see the compilation sketch below)
  • Pre-installed in AWS Deep Learning AMIs, AWS Deep Learning Containers, and Amazon SageMaker
  • Can also be installed in a custom environment without a framework
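A minimal compilation sketch, assuming the torch-neuron package from the Neuron SDK (the 1.x line that targets Inf1) and a stock torchvision ResNet-50; the model choice and output file name are illustrative, not part of the SDK:

```python
import torch
import torch_neuron  # registers the torch.neuron namespace
from torchvision import models

# Load a standard pretrained model and switch it to inference mode.
model = models.resnet50(pretrained=True)
model.eval()

# Compile for Inferentia by tracing with a representative input.
image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)
model_neuron = torch.neuron.trace(model, example_inputs=[image])

# Save the compiled TorchScript artifact for deployment on Inf1.
model_neuron.save("resnet50_neuron.pt")
```

The compiled artifact is ordinary TorchScript, so the deployment side only needs torch plus the Neuron runtime (see the inference sketch in the workflow section below).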

AWS EC2 Inf1

  • Up to 16 AWS Inferentia chips
  • 2nd generation Intel Xeon Scalable processors
  • Up to 100 Gbps networking (a launch sketch follows this list)
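As an illustration of getting onto this hardware, here is a minimal boto3 sketch that launches an Inf1 instance. The AMI ID and key-pair name are placeholders; inf1.24xlarge is used because it is the size that carries 16 Inferentia chips.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one inf1.24xlarge, the Inf1 size with 16 Inferentia chips.
# ImageId and KeyName are placeholders: substitute a current AWS Deep
# Learning AMI (Neuron SDK pre-installed) and your own key pair.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="inf1.24xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",            # placeholder key pair
)

print(response["Instances"][0]["InstanceId"])
```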

Users

  • Airbnb - chatbot using PyTorch-based BERT NLP models - 2x improvement in throughput
  • Autodesk - AI-powered virtual assistant - 4.9x higher throughput than G4dn for NLU models
  • Snap - recommendation models
  • Sprinklr - natural language processing (NLP) and computer vision models

Amazon users

Amazon Advertising

  • Text ad processing - PyTorch-based BERT models, migrated from GPUs
  • Image ad processing models

Amazon Alexa

  • Text-to-speech - lower inference latency and cost per inference
  • Web-based question answering (WBQA) workloads
    • TensorFlow-based model
    • Migrated from GPU-based P3 instances
    • Reduced inference costs by 60%
    • Reduced end-to-end latency by more than 40%

Amazon Rekognition

  • Object classification models
  • 8x lower latency and 2x higher throughput compared to GPUs

Machine learning workflow

  • Build your model in one of the popular machine learning frameworks
  • Train your model on GPU instances such as P3 or P3dn
  • Deploy the trained model on Inf1 instances using the AWS Neuron SDK (see the inference sketch after this list)
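A matching deployment sketch, assuming an Inf1 instance with the Neuron runtime installed and the resnet50_neuron.pt artifact produced by the compilation sketch above:

```python
import torch
import torch_neuron  # registers the Neuron ops with TorchScript

# Load the Neuron-compiled TorchScript artifact from build time.
model_neuron = torch.jit.load("resnet50_neuron.pt")

# A plain TorchScript call; the compiled subgraph executes on the
# Inferentia chip(s) of the Inf1 instance.
image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)
output = model_neuron(image)
print(output.shape)
```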

See also

  • Groq | Habana Labs | Graphcore
  • Google TPU
  • AWS EC2 | AWS Elastic Inference
  • AWS Graviton | AWS Inferentia