Distillation
Classic reference
Distilling the Knowledge in a Neural Network - Geoffrey Hinton, Oriol Vinyals, and Jeff Dean (2015)
Distillation in Computer Vision
Knowledge Distillation - Knowledge distillation is a model-compression procedure in which a small (student) model is trained to match a large pre-trained (teacher) model. Knowledge is transferred from teacher to student by minimizing a loss function aimed at matching the teacher's softened logits as well as the ground-truth labels.
Distilling Vision Transformers - A distillation technique that is specific to transformer-based vision models.
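The distillation loss described above can be sketched in a few lines. This is a minimal NumPy illustration (not from any of the linked sources): the soft term is the cross-entropy between the teacher's and student's temperature-softened distributions, scaled by T² as suggested in Hinton et al. (2015), and the hard term is the usual cross-entropy with the ground-truth labels. The names `alpha` and `T` and the weighting scheme are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T produces a softer distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) term and a hard (label) term.

    Soft term: cross-entropy between the teacher's and student's
    softened distributions, scaled by T**2 so its gradient magnitude
    stays comparable across temperatures (Hinton et al., 2015).
    Hard term: standard cross-entropy with the ground-truth labels.
    `alpha` (an illustrative hyperparameter) balances the two terms.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    soft = -(p_teacher * np.log(p_student_soft + 1e-12)).sum(axis=-1).mean()

    p_student = softmax(student_logits)  # T=1 for the hard-label term
    hard = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()

    return alpha * (T ** 2) * soft + (1 - alpha) * hard
```

In a real training loop this scalar would be computed with an autodiff framework so gradients flow into the student; the teacher's logits are treated as fixed targets.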