Distillation
Classic reference
Distilling the Knowledge in a Neural Network - Geoffrey Hinton, Oriol Vinyals, and Jeff Dean (2015)
Distillation in Computer Vision
Knowledge Distillation - Knowledge distillation is a model-compression procedure in which a small (student) model is trained to match a large pre-trained (teacher) model. Knowledge is transferred from teacher to student by minimizing a loss function aimed at matching the teacher's softened logits as well as the ground-truth labels.
Distilling Vision Transformers - A distillation technique that is specific to transformer-based vision models.
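The distillation loss described above can be sketched in a few lines. This is a minimal NumPy illustration (not from any of the linked sources): the soft term is the cross-entropy between the teacher's and student's temperature-softened distributions, scaled by T² as suggested in Hinton et al. (2015), and the hard term is the usual cross-entropy with the ground-truth labels. The names `alpha` and `T` and the weighting scheme are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T produces a softer distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) term and a hard (label) term.

    Soft term: cross-entropy between the teacher's and student's
    softened distributions, scaled by T**2 so its gradient magnitude
    stays comparable across temperatures (Hinton et al., 2015).
    Hard term: standard cross-entropy with the ground-truth labels.
    `alpha` (an illustrative hyperparameter) balances the two terms.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    soft = -(p_teacher * np.log(p_student_soft + 1e-12)).sum(axis=-1).mean()

    p_student = softmax(student_logits)  # T=1 for the hard-label term
    hard = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()

    return alpha * (T ** 2) * soft + (1 - alpha) * hard
```

In a real training loop this scalar would be computed with an autodiff framework so gradients flow into the student; the teacher's logits are treated as fixed targets.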