Models in production

When designing ML systems, people without production experience tend to focus on building the model and pay less attention to its deployment and maintenance:

  • During development, training is the bottleneck; in production, inference is the bottleneck.
  • Research prioritizes high throughput (many requests processed per unit of time), while production prioritizes low latency (the time it takes to return a response to a single request).
  • A latency of more than 100 ms can noticeably degrade the perceived performance of the model in production; see the latency-measurement sketch after this list.
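
Because production cares about latency rather than raw throughput, inference time is usually tracked in percentiles (p95, p99) instead of averages, and compared against a latency budget. Below is a minimal sketch of such a measurement; `dummy_predict` and the 100 ms budget are illustrative assumptions, not something defined in this wiki.

```python
import time

import numpy as np


def dummy_predict(x):
    """Stand-in for a trained model's inference call (hypothetical)."""
    return float(np.sum(x) > 0)


def measure_latency_ms(predict_fn, inputs, warmup=10):
    """Return per-request latencies in milliseconds."""
    # Warm-up requests so one-off costs (imports, caches) don't skew numbers.
    for x in inputs[:warmup]:
        predict_fn(x)

    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        latencies.append((time.perf_counter() - start) * 1000.0)  # ms
    return latencies


if __name__ == "__main__":
    samples = [np.random.randn(128) for _ in range(1000)]
    lat = measure_latency_ms(dummy_predict, samples)
    p50, p95, p99 = (np.percentile(lat, q) for q in (50, 95, 99))
    print(f"p50={p50:.2f} ms  p95={p95:.2f} ms  p99={p99:.2f} ms")
    # In production these percentiles would be compared against a latency
    # budget, e.g. "p99 must stay under 100 ms".
```

Percentiles matter because a good average can hide a slow tail: a p99 of 300 ms means 1 in 100 users waits that long, even if the mean looks fine.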