Distillation Losses
Knowledge distillation loss functions
Overview
This module provides loss functions for knowledge distillation. These losses enable training a smaller “student” network to mimic a larger “teacher” network.
Loss Categories:

- Output-based (SoftTarget, Logits, Mutual): compare final predictions
- Feature-based (Attention, FitNet, Similarity, ActivationBoundaries): compare intermediate representations
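In practice, a distillation loss from this module is combined with the ordinary task loss during training. The sketch below illustrates that combination; the `distillation_step` helper and the `alpha` weighting are illustrative assumptions, not this module's API (see KnowledgeDistillationCallback for the supported workflow).

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, labels, distill_loss, alpha=0.5):
    """Hypothetical training step: weighted sum of task and distillation loss."""
    with torch.no_grad():              # the teacher is frozen during distillation
        teacher_logits = teacher(x)
    student_logits = student(x)

    task = F.cross_entropy(student_logits, labels)
    distill = distill_loss(student_logits, teacher_logits)
    return (1 - alpha) * task + alpha * distill
```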
Output-Based Losses
These losses compare the final output predictions between student and teacher networks.
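As a concrete reference point, a soft-target loss in the style of Hinton et al. (2015) typically looks like the sketch below. The class name `SoftTargetSketch` and the default temperature are assumptions and may not match this module's SoftTarget implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class SoftTargetSketch(nn.Module):
    """Illustrative soft-target loss.

    Softens both logit distributions with a temperature T and measures the
    KL divergence between them; the T**2 factor keeps gradient magnitudes
    comparable across temperatures.
    """
    def __init__(self, T=4.0):
        super().__init__()
        self.T = T

    def forward(self, student_logits, teacher_logits):
        p_student = F.log_softmax(student_logits / self.T, dim=1)
        p_teacher = F.softmax(teacher_logits / self.T, dim=1)
        return F.kl_div(p_student, p_teacher, reduction="batchmean") * (self.T ** 2)
```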
Feature-Based Losses
These losses compare intermediate feature representations, enabling the student to learn internal representations similar to the teacher's.
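For example, an attention-transfer loss (Zagoruyko & Komodakis, 2017) reduces each feature map to a spatial attention map before comparing student and teacher. The sketch below is illustrative; the `AttentionTransferSketch` name, normalization, and reduction are assumptions and may differ from this module's Attention loss.

```python
import torch.nn as nn
import torch.nn.functional as F

class AttentionTransferSketch(nn.Module):
    """Illustrative attention-transfer loss.

    Collapses a feature map of shape (N, C, H, W) into a spatial attention
    map by summing squared activations over channels, L2-normalizes it, and
    penalizes the distance between student and teacher maps. Student and
    teacher features must share the same spatial size.
    """
    def _attention_map(self, feat):
        # (N, C, H, W) -> (N, H*W), unit L2 norm per sample
        am = feat.pow(2).sum(dim=1).flatten(start_dim=1)
        return F.normalize(am, dim=1)

    def forward(self, student_feat, teacher_feat):
        diff = self._attention_map(student_feat) - self._attention_map(teacher_feat)
        return diff.pow(2).mean()
```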
See Also
- KnowledgeDistillationCallback - Apply these losses during training
- Distillation Tutorial - Practical examples with different losses
Loss Selection Guide
| Loss | Best For | Complexity |
|---|---|---|
| SoftTarget | General distillation, logit matching | Low |
| Attention | When attention patterns matter | Low |
| FitNet | Intermediate feature matching | Medium |
| PKT | Probability distribution matching | Medium |
| RKD | Relational knowledge transfer | High |
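To give a sense of why relational losses sit at the high end of the complexity column, the sketch below outlines the distance term of RKD (Park et al., 2019). The class name and the omission of RKD's angle term are simplifications, so this module's RKD loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RKDDistanceSketch(nn.Module):
    """Illustrative relational (RKD) distance loss.

    Instead of matching individual outputs, this compares the structure of
    pairwise distances within a batch of student and teacher embeddings.
    """
    @staticmethod
    def _pdist(embeddings):
        # Pairwise Euclidean distances, normalized by their mean.
        d = torch.cdist(embeddings, embeddings, p=2)
        return d / (d[d > 0].mean() + 1e-8)

    def forward(self, student_emb, teacher_emb):
        with torch.no_grad():
            t = self._pdist(teacher_emb)
        s = self._pdist(student_emb)
        return F.smooth_l1_loss(s, t)
```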