AWS Inferentia - AshokBhat/notes GitHub Wiki

AWS Inferentia

Machine learning inference chip designed by AWS
Salient points: high throughput, low latency, hundreds of TOPS (tera-operations per second)
Each chip supports up to 128 TOPS
Multiple data types: INT8, FP16, BFloat16
Multiple ML frameworks: TensorFlow, MXNet, PyTorch, ONNX
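To make the three data types concrete, here is a minimal standard-library sketch that rounds a value to BF16 and FP16 and applies symmetric per-tensor INT8 quantization. It illustrates the number formats in general, assuming round-to-nearest-even for BF16 and a single max-abs scale for INT8; it does not describe how Inferentia or the Neuron compiler converts values internally, and all helper names are hypothetical.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a value to bfloat16 and return it as a Python float.

    BF16 keeps FP32's sign and 8-bit exponent but only the top 7 mantissa
    bits, i.e. the upper 16 bits of the FP32 bit pattern. Rounding here is
    round-to-nearest-even (an assumption of this sketch); NaN handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                     # lowest bit that survives truncation
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000  # round, then drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_fp16(x: float) -> float:
    """Round to IEEE-754 half precision (5-bit exponent, 10-bit mantissa)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def int8_scale(values):
    """Hypothetical symmetric per-tensor scale: largest magnitude maps to 127."""
    return max(abs(v) for v in values) / 127

def quantize_int8(values, scale):
    """Map floats to integers in [-128, 127] (symmetric, zero point 0)."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q, scale):
    """Recover approximate floats from INT8 codes."""
    return [v * scale for v in q]

for x in (3.14159265, 0.1):
    print(x, to_bfloat16(x), to_fp16(x))

weights = [0.5, -1.0, 2.0]
s = int8_scale(weights)
q = quantize_int8(weights, s)
print(q, dequantize_int8(q, s))
```

BF16 keeps the FP32 exponent range with fewer mantissa bits, FP16 has more mantissa bits but a narrower range (its largest finite value is 65504), and INT8 only works with a scale factor chosen to cover the values being quantized.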

AWS Neuron SDK

SDK to deploy ML inference on Amazon EC2 Inf1 instances
Consists of a compiler, a runtime, and profiling tools
Pre-installed in AWS Deep Learning AMIs, AWS Deep Learning Containers and Amazon SageMaker
Can also be installed in your custom environment without a framework

AWS EC2 Inf1

Up to 16 AWS Inferentia chips
2nd-generation Intel Xeon Scalable processors
Up to 100 Gbps networking
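Combining the per-chip figure (up to 128 TOPS) with the chip count (up to 16 per Inf1 instance) gives the theoretical peak for the largest instance. The sketch below assumes throughput scales linearly with chip count, which real models rarely achieve; `peak_tops` is a hypothetical helper.

```python
# Figures from these notes: up to 128 TOPS per Inferentia chip,
# up to 16 Inferentia chips per Inf1 instance.
TOPS_PER_CHIP = 128
MAX_CHIPS_PER_INF1 = 16

def peak_tops(num_chips: int) -> int:
    """Theoretical peak TOPS for an Inf1 instance with num_chips chips.

    Assumes perfectly linear scaling across chips (an upper bound only).
    """
    if not 1 <= num_chips <= MAX_CHIPS_PER_INF1:
        raise ValueError(f"Inf1 instances have 1 to {MAX_CHIPS_PER_INF1} chips")
    return num_chips * TOPS_PER_CHIP

print(peak_tops(MAX_CHIPS_PER_INF1))  # 2048
```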

Machine learning workflow

Build your model in one of the popular machine learning frameworks
Train your model on GPU instances such as P3 or P3dn
Deploy your model on Inf1 instances by using AWS Neuron SDK
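The deploy step can be sketched with the PyTorch integration of the Neuron SDK for Inf1 (the `torch-neuron` package), assuming a PyTorch model that has already been trained. `torch.neuron.trace` runs the Neuron compiler on an example input and returns a TorchScript module, which `torch.jit.load` reads back. The wrapper names `compile_for_inf1` and `load_for_inference` are hypothetical, and the imports are deferred so the sketch can be read without the SDK installed. TensorFlow and MXNet go through their own Neuron integrations, not shown here.

```python
def compile_for_inf1(model, example_input, output_path):
    """Compile a trained PyTorch model for Inferentia and save it.

    Assumes torch and the torch-neuron package (Inf1 generation) are installed.
    """
    import torch
    import torch_neuron  # noqa: F401  importing registers torch.neuron

    model.eval()  # inference mode: fixes dropout and batch-norm behaviour
    # Trace with a representative input; operators the Neuron compiler
    # supports are compiled for Inferentia.
    model_neuron = torch.neuron.trace(model, example_inputs=[example_input])
    model_neuron.save(output_path)  # serialized TorchScript artifact
    return output_path


def load_for_inference(model_path):
    """Load a Neuron-compiled model on an Inf1 instance."""
    import torch
    import torch_neuron  # noqa: F401  needed so the Neuron ops deserialize

    return torch.jit.load(model_path)
```

Usage, on an Inf1 instance: call `compile_for_inf1(net, x, "net_neuron.pt")` and then `load_for_inference("net_neuron.pt")(x)`, where `net` is your trained model and `x` an input batch. Since the model is compiled ahead of time, the example input should typically have the shape used at inference.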

See also

Groq | Habana Labs | Graphcore
Google TPU
AWS EC2 | AWS Elastic Inference
AWS Graviton | AWS Inferentia