Distillation Losses

Knowledge distillation loss functions

Overview

This module provides loss functions for knowledge distillation. These losses enable training a smaller “student” network to mimic a larger “teacher” network.

Loss Categories:

- Output-based (compare final predictions): SoftTarget, Logits, Mutual
- Feature-based (compare intermediate representations): Attention, FitNet, Similarity, ActivationBoundaries
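All of these losses plug into an otherwise standard training loop: the teacher is frozen and only supplies targets, while gradients flow into the student. The sketch below is a minimal illustration of that pattern, assuming generic teacher, student, kd_loss, and hard_loss objects; the names are placeholders, not this module's actual API.

```python
import torch


def train_step(student, teacher, kd_loss, hard_loss, optimizer,
               images, labels, alpha=0.9):
    """One distillation step: weighted sum of a KD loss and a hard-label loss."""
    teacher.eval()
    with torch.no_grad():                 # the teacher only provides targets
        teacher_logits = teacher(images)

    student_logits = student(images)
    loss = alpha * kd_loss(student_logits, teacher_logits) \
        + (1.0 - alpha) * hard_loss(student_logits, labels)

    optimizer.zero_grad()
    loss.backward()                       # gradients update the student only
    optimizer.step()
    return loss.item()
```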

Output-Based Losses

These losses compare the final output predictions of the student and teacher networks.
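As a concrete illustration, a soft-target loss of the kind SoftTarget implements can be written as a temperature-scaled KL divergence between student and teacher logits. The class below is a generic sketch of that formulation; the module's actual SoftTarget signature and defaults may differ.

```python
import torch.nn as nn
import torch.nn.functional as F


class SoftTargetLoss(nn.Module):
    """KL divergence between temperature-softened student and teacher logits."""

    def __init__(self, T=4.0):
        super().__init__()
        self.T = T  # temperature: higher values produce softer distributions

    def forward(self, student_logits, teacher_logits):
        log_p_student = F.log_softmax(student_logits / self.T, dim=1)
        p_teacher = F.softmax(teacher_logits / self.T, dim=1)
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * (self.T ** 2)
```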


Feature-Based Losses

These losses compare intermediate feature representations, encouraging the student to develop internal representations similar to the teacher's.
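For example, an attention-transfer loss of the kind the Attention loss implements compares normalized spatial attention maps derived from student and teacher feature maps. The sketch below uses one common formulation (squared activations averaged over channels, then L2-normalized); the module's actual implementation may differ in details such as the power used or how mismatched spatial sizes are handled.

```python
import torch.nn as nn
import torch.nn.functional as F


class AttentionTransferLoss(nn.Module):
    """Match normalized spatial attention maps of student and teacher feature maps."""

    @staticmethod
    def _attention_map(feature_map):
        # Collapse channels into a spatial attention map, then L2-normalize it.
        am = feature_map.pow(2).mean(dim=1).flatten(1)   # (N, H*W)
        return F.normalize(am, dim=1)

    def forward(self, student_fm, teacher_fm):
        # Feature maps are assumed to share spatial size; interpolate first if not.
        diff = self._attention_map(student_fm) - self._attention_map(teacher_fm)
        return diff.pow(2).mean()
```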


Loss Selection Guide

Loss          Best For                                Complexity
------------  --------------------------------------  ----------
SoftTarget    General distillation, logit matching    Low
Attention     When attention patterns matter          Low
FitNet        Intermediate feature matching           Medium
PKT           Probability distribution matching       Medium
RKD           Relational knowledge transfer           High