Distillation Losses
Overview
This module provides loss functions for knowledge distillation. These losses enable training a smaller “student” network to mimic a larger “teacher” network.
Loss categories:

- Output-based (SoftTarget, Logits, Mutual): compare final predictions
- Feature-based (Attention, FitNet, Similarity, ActivationBoundaries): compare intermediate representations
Output-Based Losses
These losses compare the final output predictions between student and teacher networks.
SoftTarget
```python
def SoftTarget(
    pred: torch.Tensor,          # Student predictions
    teacher_pred: torch.Tensor,  # Teacher predictions
    T: float = 5,                # Temperature for softening
    **kwargs,
) -> torch.Tensor:
```

Knowledge distillation with softened distributions (Hinton et al.).
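The idea can be sketched as follows: both sets of logits are softened with temperature `T` before taking the KL divergence. The helper name `soft_target` and the `batchmean` reduction are illustrative assumptions, not this module's implementation.

```python
import torch
import torch.nn.functional as F

def soft_target(pred, teacher_pred, T=5.0):
    # Soften both distributions with temperature T, then take the KL
    # divergence; the T*T factor keeps gradient magnitudes comparable
    # across temperatures.
    return F.kl_div(
        F.log_softmax(pred / T, dim=1),
        F.softmax(teacher_pred / T, dim=1),
        reduction="batchmean",
    ) * T * T

logits = torch.randn(4, 10)
loss = soft_target(logits, logits)  # identical logits -> loss near 0
```

Higher temperatures expose more of the teacher's "dark knowledge" in the non-target classes, at the cost of a flatter target distribution.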
Logits
```python
def Logits(
    pred: torch.Tensor,          # Student predictions
    teacher_pred: torch.Tensor,  # Teacher predictions
    **kwargs,
) -> torch.Tensor:
```

Direct logit matching between student and teacher.
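Direct logit matching is typically a mean-squared error on the raw (pre-softmax) logits; the sketch below assumes that form, and the lowercase name `logits_loss` is illustrative.

```python
import torch
import torch.nn.functional as F

def logits_loss(pred, teacher_pred):
    # Mean-squared error between raw student and teacher logits.
    return F.mse_loss(pred, teacher_pred)

loss = logits_loss(torch.zeros(2, 5), torch.ones(2, 5))  # MSE of 0s vs 1s -> 1.0
```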
Mutual
```python
def Mutual(
    pred: torch.Tensor,          # Student predictions
    teacher_pred: torch.Tensor,  # Teacher predictions
    **kwargs,
) -> torch.Tensor:
```

KL divergence between student and teacher.
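A sketch of a KL-divergence loss on the un-softened softmax outputs, as used in mutual (two-way) distillation. The direction of the divergence and the `batchmean` reduction are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def mutual(pred, teacher_pred):
    # KL divergence between the peer networks' output distributions
    # (no temperature scaling, unlike SoftTarget).
    return F.kl_div(
        F.log_softmax(pred, dim=1),
        F.softmax(teacher_pred, dim=1),
        reduction="batchmean",
    )

p = torch.randn(3, 8)
loss = mutual(p, p)  # identical predictions -> loss near 0
```

In mutual learning both networks train simultaneously, each adding this term against the other's current predictions.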
Feature-Based Losses
These losses compare intermediate feature representations, enabling the student to learn internal representations similar to the teacher.
Attention
```python
def Attention(
    fm_s: dict[str, torch.Tensor],  # Student feature maps {name: tensor}
    fm_t: dict[str, torch.Tensor],  # Teacher feature maps {name: tensor}
    p: int = 2,                     # Power for attention computation
    **kwargs,
) -> torch.Tensor:
```

Attention transfer loss (Zagoruyko & Komodakis).
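Attention transfer collapses each feature map over its channel dimension into a spatial attention map, then matches normalized maps between student and teacher. A minimal sketch, assuming L2-normalized maps averaged over the named layers (the exact normalization and reduction are assumptions):

```python
import torch
import torch.nn.functional as F

def attention_map(fm, p=2):
    # Spatial attention map: mean over channels of |activations|^p,
    # flattened and L2-normalized per sample.
    return F.normalize(fm.abs().pow(p).mean(dim=1).flatten(1), dim=1)

def attention_loss(fm_s, fm_t, p=2):
    # Average squared distance between student and teacher attention maps.
    return sum(
        (attention_map(fm_s[k], p) - attention_map(fm_t[k], p)).pow(2).mean()
        for k in fm_s
    ) / len(fm_s)

fm = {"layer1": torch.randn(2, 16, 8, 8)}
loss = attention_loss(fm, fm)  # identical maps -> loss of 0
```

Because only spatial maps are compared, student and teacher may have different channel counts as long as spatial sizes match.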
ActivationBoundaries
```python
def ActivationBoundaries(
    fm_s: dict[str, torch.Tensor],  # Student feature maps
    fm_t: dict[str, torch.Tensor],  # Teacher feature maps
    m: float = 2,                   # Boundary margin
    **kwargs,
) -> torch.Tensor:
```

Boundary-based knowledge distillation (Heo et al.).
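A sketch of a margin-based boundary loss in the spirit of Heo et al.: the student is pushed at least margin `m` to the same side of the activation boundary (zero) as the teacher. It assumes the dicts hold pre-activation values; the exact hinge form used by this module is an assumption.

```python
import torch

def ab_loss(fm_s, fm_t, m=2.0):
    # Penalize student pre-activations whose sign disagrees with the
    # teacher's, with a squared hinge of margin m.
    losses = []
    for k in fm_s:
        s, t = fm_s[k], fm_t[k]
        # Teacher inactive but student not safely below -m: push it down.
        neg = (s + m).pow(2) * ((s > -m) & (t <= 0)).float()
        # Teacher active but student not safely above +m: push it up.
        pos = (s - m).pow(2) * ((s <= m) & (t > 0)).float()
        losses.append((neg + pos).mean())
    return sum(losses) / len(losses)

fm = {"block1": torch.full((2, 4, 3, 3), 5.0)}
loss = ab_loss(fm, fm)  # both safely on the same side of the boundary -> 0
```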
FitNet
```python
def FitNet(
    fm_s: dict[str, torch.Tensor],  # Student feature maps
    fm_t: dict[str, torch.Tensor],  # Teacher feature maps
    **kwargs,
) -> torch.Tensor:
```

FitNets: direct feature map matching (Romero et al.).
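In its simplest form this is a mean-squared error between corresponding feature maps. The sketch below assumes the student's maps already match the teacher's shapes; the original FitNets paper inserts a learned regressor layer when they do not.

```python
import torch
import torch.nn.functional as F

def fitnet_loss(fm_s, fm_t):
    # Direct MSE between corresponding feature maps, averaged over layers.
    # Assumes matching shapes (no regressor, unlike the full FitNets recipe).
    return sum(F.mse_loss(fm_s[k], fm_t[k]) for k in fm_s) / len(fm_s)

fm_s = {"mid": torch.zeros(2, 8, 4, 4)}
fm_t = {"mid": torch.ones(2, 8, 4, 4)}
loss = fitnet_loss(fm_s, fm_t)  # all-zeros vs all-ones -> MSE of 1.0
```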
Similarity
```python
def Similarity(
    fm_s: dict[str, torch.Tensor],  # Student feature maps
    fm_t: dict[str, torch.Tensor],  # Teacher feature maps
    pred: torch.Tensor,             # Student predictions (unused, for API consistency)
    p: int = 2,                     # Normalization power
    **kwargs,
) -> torch.Tensor:
```

Similarity-preserving knowledge distillation (Tung & Mori).
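Rather than matching features directly, this loss matches the pairwise similarity structure of the batch: a Gram matrix `G = f @ f.T` over flattened features, row-normalized, compared between student and teacher. A minimal sketch under those assumptions (the unused `pred` argument from the signature is omitted here):

```python
import torch
import torch.nn.functional as F

def similarity_loss(fm_s, fm_t, p=2):
    # Compare batch-wise similarity (Gram) matrices built from flattened
    # feature maps, row-normalized with power p.
    losses = []
    for k in fm_s:
        fs, ft = fm_s[k].flatten(1), fm_t[k].flatten(1)
        gs = F.normalize(fs @ fs.t(), p=p, dim=1)
        gt = F.normalize(ft @ ft.t(), p=p, dim=1)
        losses.append((gs - gt).pow(2).mean())
    return sum(losses) / len(losses)

fm = {"layer3": torch.randn(4, 8, 2, 2)}
loss = similarity_loss(fm, fm)  # identical feature maps -> loss of 0
```

Since only the batch-by-batch Gram matrices are compared, student and teacher feature dimensions need not match.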
See Also
- KnowledgeDistillationCallback - Apply these losses during training
- Distillation Tutorial - Practical examples with different losses
Loss Selection Guide
| Loss | Best For | Complexity |
|---|---|---|
| SoftTarget | General distillation, logit matching | Low |
| Logits | Direct logit regression | Low |
| Mutual | Mutual (two-way) distillation | Low |
| Attention | When attention patterns matter | Low |
| FitNet | Intermediate feature matching | Medium |
| Similarity | Preserving pairwise sample similarities | Medium |
| ActivationBoundaries | Transferring activation boundaries | Medium |
| PKT | Probability distribution matching | Medium |
| RKD | Relational knowledge transfer | High |