Quantize Callback

Quantize your network during training

Overview

The QuantizeCallback enables Quantization-Aware Training (QAT) within the fastai training loop. QAT simulates quantization effects during training, allowing the model to adapt its weights for better accuracy after quantization.
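
To make the mechanism concrete, here is a minimal sketch of the quantize-dequantize round trip that fake quantization performs during QAT. It is illustrative only; the function name fake_quantize and the scale/zero_point arguments are ours, not part of fasterai's API:

import torch

def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    # Quantize: round to the nearest integer level and clamp to the int8 range
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    # Dequantize so downstream layers see the quantization noise in float
    x_dq = (q - zero_point) * scale
    # Straight-through estimator: the forward pass uses x_dq, the backward
    # pass treats the round trip as identity so gradients keep flowing
    return x + (x_dq - x).detach()

x = torch.randn(4, requires_grad=True)
y = fake_quantize(x, scale=0.05)
y.sum().backward()  # gradients are all ones thanks to the STE

Training against this noisy forward pass is what lets the weights adapt to the precision loss before the real conversion happens.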

Why use QAT over post-training quantization?

  • Higher accuracy on the quantized model
  • The model learns to be robust to quantization noise
  • Especially beneficial for models sensitive to precision loss

Trade-offs:

  • Requires retraining, not just calibration
  • Training is slower due to simulated quantization
  • Worthwhile only when you can afford the additional training time

Parameters:

  • quantizer: Optional custom Quantizer instance for advanced configuration
  • backend: Target backend ('x86' or 'qnnpack'); used only if quantizer is not provided
  • use_per_tensor: Force per-tensor quantization to avoid conversion issues
  • verbose: Enable detailed output during QAT
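
If you need the non-default options, a minimal construction sketch follows, assuming the constructor accepts the parameters above as keyword arguments (we leave quantizer at its default, since configuring a custom Quantizer is library-specific):

from fasterai.quantize.quantize_callback import QuantizeCallback

# Assumed keyword arguments, taken from the parameter list above:
# 'qnnpack' targets ARM/mobile CPUs, and per-tensor quantization is
# forced to sidestep backend conversion issues
cb = QuantizeCallback(backend='qnnpack', use_per_tensor=True, verbose=False)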

Usage Example

from fasterai.quantize.quantize_callback import QuantizeCallback

# Basic QAT with default settings
cb = QuantizeCallback(backend='x86', verbose=True)

# Train with QAT (learn is an existing fastai Learner)
learn.fit(5, cbs=[cb])

# After training, the quantized model is available as:
quantized_model = learn.quantized_model

QAT Workflow

  1. before_fit: Model is prepared for QAT (fake quantization nodes inserted)
  2. Training: Model trains with simulated quantization effects
  3. after_fit: Model is converted to fully quantized form

After conversion, the final learn.model is the quantized model (also exposed as learn.quantized_model), ready for CPU inference.
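
The backend names suggest the callback builds on PyTorch's eager-mode quantization. Under that assumption, the three steps correspond roughly to the raw torch.ao.quantization sequence below; this is a sketch of the equivalent PyTorch workflow, not fasterai's actual internals:

import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qat_qconfig, prepare_qat, convert)

# Toy model; QuantStub/DeQuantStub mark where tensors enter and leave int8
model = nn.Sequential(QuantStub(), nn.Linear(16, 16), nn.ReLU(),
                      nn.Linear(16, 1), DeQuantStub())

# before_fit: attach a QAT qconfig and insert fake quantization nodes
model.qconfig = get_default_qat_qconfig('x86')
prepare_qat(model.train(), inplace=True)

# Training: batches flow through the prepared model; observers record
# value ranges while fake quantization injects rounding noise
model(torch.randn(8, 16))

# after_fit: replace fake quant with real int8 ops for CPU inference
model.eval()
quantized = convert(model)
print(quantized(torch.randn(2, 16)))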


See Also