AWS Inferentia
- Machine learning inference chip from AWS, used to run inference on Amazon EC2 Inf1 instances
- Programmed via the AWS Neuron SDK, which consists of a compiler, runtime, and profiling tools
- Neuron SDK comes pre-installed in AWS Deep Learning AMIs, AWS Deep Learning Containers, and Amazon SageMaker
- Can also be installed in a custom environment without a framework (see the sketch below)
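A minimal sketch of the compile step using the Neuron SDK's PyTorch integration (torch-neuron). The package names, pip index URL, and ResNet-50 stand-in follow the AWS Neuron documentation, but exact package versions and APIs vary by release, so treat this as illustrative rather than definitive:

```python
# Assumed install, per the AWS Neuron pip repository (versions vary by release):
#   pip install torch-neuron neuron-cc[tensorflow] torchvision \
#       --extra-index-url=https://pip.repos.neuron.amazonaws.com
import torch
import torch_neuron  # registers torch.neuron and the Neuron compiler hooks
from torchvision import models

# Any traceable PyTorch model works; ResNet-50 is just a stand-in example
model = models.resnet50(pretrained=True)
model.eval()

# torch.neuron.trace invokes the Neuron compiler (neuron-cc) on the traced
# graph and returns a TorchScript module containing Neuron-compiled ops
example = torch.zeros([1, 3, 224, 224], dtype=torch.float32)
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact; it can be copied to an Inf1 instance for serving
model_neuron.save("resnet50_neuron.pt")
```

Note that compilation runs on the host CPU, so this step does not itself require Inferentia hardware; only serving the compiled model does.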
Amazon EC2 Inf1
- Up to 16 AWS Inferentia chips
- 2nd generation Intel Xeon Scalable processors
- Up to 100 Gbps networking
Users
- Airbnb - PyTorch BERT NLP models for its chatbot - 2x improvement in throughput
- Autodesk - AI-powered virtual assistant - 4.9x higher throughput than G4dn for NLU models
- Snap - Recommendation models
- Sprinklr - Natural language processing (NLP) and computer vision
Amazon users
Amazon Advertising
- Text ad processing - PyTorch-based BERT models, migrated from GPUs
- Image ad processing models
Amazon Alexa
- Text-to-speech - lowered inference latency and cost per inference
- Web-based question answering (WBQA) workloads
- TensorFlow-based model, migrated from GPU-based P3 instances
- Reduced inference costs by 60% and end-to-end latency by more than 40%
Amazon Rekognition
- Object classification models
- 8x lower latency and 2x higher throughput compared to GPUs
Machine learning workflow
- Build your model in one of the popular machine learning frameworks
- Use GPU instances such as P3 or P3dn to train your model
- Deploy your model on Inf1 instances using the AWS Neuron SDK (see the serving sketch after this list)
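To make the deployment step concrete, here is a hedged sketch of serving the compiled artifact on an Inf1 instance. It assumes the `resnet50_neuron.pt` file produced in the compile sketch above and a torch-neuron runtime installation, following the AWS Neuron tutorials:

```python
import torch
import torch_neuron  # the import registers the Neuron runtime ops with TorchScript

# Load the artifact produced by torch.neuron.trace (filename assumed from the
# compile sketch above); on Inf1 the graph executes on Inferentia NeuronCores
model = torch.jit.load("resnet50_neuron.pt")

image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)
with torch.no_grad():
    output = model(image)  # inference runs on Inferentia, not the host CPU
print(output.shape)  # torch.Size([1, 1000]) for the ResNet-50 example
```

The design mirrors the workflow above: train on GPU instances, compile once into a self-contained TorchScript artifact, and ship only that artifact to the Inf1 fleet.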
See also
- [[Groq]] | [[Habana-Labs]] | Graphcore
- Google TPU
- [[AWS EC2]] | [[AWS-Elastic-Inference]]
- [[AWS Graviton]] | [[AWS-Inferentia]]