# Models in production
When designing ML systems, people without production experience tend to focus more on building the model and less on deploying and maintaining it:
- During development, training is the bottleneck; in production, inference is the bottleneck.
- Research prioritizes high throughput (how many requests are processed per second), while production prioritizes low latency (the time it takes to answer a single request) — see the sketch after this list.
- A latency of more than 100 ms can already hurt the user experience, and with it the perceived performance of the model in production.
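
A minimal sketch of the latency vs. throughput distinction, using NumPy and a placeholder `predict` function (a single matrix multiply standing in for a real trained model — an assumption, not part of the original notes):

```python
import time
import numpy as np

# Placeholder "model": one dense layer as a matrix multiply.
# In practice this would be a trained model loaded from disk.
WEIGHTS = np.random.rand(512, 10)

def predict(batch: np.ndarray) -> np.ndarray:
    return batch @ WEIGHTS

# Latency: time to answer ONE request (what production cares about).
x = np.random.rand(1, 512)
start = time.perf_counter()
predict(x)
latency_ms = (time.perf_counter() - start) * 1000
print(f"single-request latency: {latency_ms:.2f} ms")

# Throughput: samples processed per second when requests are batched
# together (what research/training pipelines tend to optimize).
batch = np.random.rand(256, 512)
start = time.perf_counter()
predict(batch)
elapsed = time.perf_counter() - start
print(f"batched throughput: {256 / elapsed:.0f} samples/s")
```

Batching improves throughput but makes each individual request wait for the batch to fill, which is why optimizing for one metric does not automatically optimize the other.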