Models in production

When designing ML systems, people without production experience tend to focus on building the model and pay less attention to its deployment and maintenance:

  • During development, training is the bottleneck; in production, inference is the bottleneck.
  • Research prioritizes high throughput (many requests processed per unit of time), while production prioritizes low latency (the time it takes to return a response to a single request).
  • A latency of more than 100 ms can noticeably degrade the perceived performance of the model in production; see the latency-measurement sketch after this list.
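
Because production cares about latency rather than raw throughput, inference time is usually tracked in percentiles (p95, p99) instead of averages, and compared against a latency budget. Below is a minimal sketch of such a measurement; `dummy_predict` and the 100 ms budget are illustrative assumptions, not something defined in this wiki.

```python
import time

import numpy as np


def dummy_predict(x):
    """Stand-in for a trained model's inference call (hypothetical)."""
    return float(np.sum(x) > 0)


def measure_latency_ms(predict_fn, inputs, warmup=10):
    """Return per-request latencies in milliseconds."""
    # Warm-up requests so one-off costs (imports, caches) don't skew numbers.
    for x in inputs[:warmup]:
        predict_fn(x)

    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        latencies.append((time.perf_counter() - start) * 1000.0)  # ms
    return latencies


if __name__ == "__main__":
    samples = [np.random.randn(128) for _ in range(1000)]
    lat = measure_latency_ms(dummy_predict, samples)
    p50, p95, p99 = (np.percentile(lat, q) for q in (50, 95, 99))
    print(f"p50={p50:.2f} ms  p95={p95:.2f} ms  p99={p99:.2f} ms")
    # In production these percentiles would be compared against a latency
    # budget, e.g. "p99 must stay under 100 ms".
```

Percentiles matter because a good average can hide a slow tail: a p99 of 300 ms means 1 in 100 users waits that long, even if the mean looks fine.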