Quantize Callback
Quantize your network during training
Overview
The `QuantizeCallback` enables Quantization-Aware Training (QAT) within the fastai training loop. QAT simulates quantization effects during training, allowing the model to adapt its weights for better accuracy after quantization.
Why use QAT over post-training quantization?

- Higher accuracy on the quantized model
- The model learns to be robust to quantization noise
- Especially beneficial for models sensitive to precision loss
Trade-offs:

- Requires retraining (not just calibration)
- Training is slower due to simulated quantization
- Only suitable when you can afford the additional training time
Parameters:
- `quantizer`: Optional custom `Quantizer` instance for advanced configuration
- `backend`: Target backend (`'x86'`, `'qnnpack'`); only used if `quantizer` is not provided
- `use_per_tensor`: Force per-tensor quantization to avoid conversion issues
- `verbose`: Enable detailed output during QAT
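For instance, a configuration targeting ARM CPUs that sidesteps per-channel conversion issues might look like the sketch below (the parameter values are illustrative; only the parameters documented above are used):

```python
from fasterai.quantize.quantize_callback import QuantizeCallback

# ARM-friendly backend with per-tensor quantization forced,
# which can help when per-channel weights cause conversion errors
cb = QuantizeCallback(backend='qnnpack', use_per_tensor=True, verbose=True)
```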
Usage Example
```python
from fasterai.quantize.quantize_callback import QuantizeCallback

# Basic QAT with default settings
cb = QuantizeCallback(backend='x86', verbose=True)

# Train with QAT
learn.fit(5, cbs=[cb])

# After training, the quantized model is available at:
quantized_model = learn.quantized_model
```

QAT Workflow
- `before_fit`: Model is prepared for QAT (fake quantization nodes are inserted)
- Training: Model trains with simulated quantization effects
- `after_fit`: Model is converted to its fully quantized form
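Conceptually, these three phases mirror PyTorch's eager-mode QAT API. Here is a minimal sketch of the same flow in plain `torch.ao.quantization` — an illustration, not fasterai's internal code; the `TinyNet` model is made up, and the `'x86'` qconfig assumes a recent PyTorch (older versions use `'fbgemm'`):

```python
import torch
import torch.nn as nn
from torch.ao import quantization as tq

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # fp32 -> int8 at the input boundary
        self.fc1, self.relu, self.fc2 = nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)
        self.dequant = tq.DeQuantStub()  # int8 -> fp32 at the output

    def forward(self, x):
        return self.dequant(self.fc2(self.relu(self.fc1(self.quant(x)))))

model = TinyNet().train()

# "before_fit": attach a QAT qconfig and insert fake-quantization observers
model.qconfig = tq.get_default_qat_qconfig('x86')
tq.prepare_qat(model, inplace=True)

# "Training": forward/backward passes now see simulated quantization noise
loss = model(torch.randn(8, 16)).sum()
loss.backward()

# "after_fit": swap the fake-quant modules for true int8 kernels
model.eval()
quantized = tq.convert(model)
print(quantized(torch.randn(2, 16)))
```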
The final `learn.model` is the quantized model, ready for CPU inference.
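Putting it all together with a fastai `Learner` — a sketch in which the MNIST_SAMPLE dataset and `resnet18` are placeholder choices, and which assumes the converted model accepts regular fp32 tensors on CPU, as the workflow above implies:

```python
from fastai.vision.all import *
from fasterai.quantize.quantize_callback import QuantizeCallback

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, metrics=accuracy)

# Train with fake quantization in the loop; conversion happens in after_fit
learn.fit(5, cbs=[QuantizeCallback(backend='x86', verbose=True)])

# learn.model is now the int8 model; run a batch on CPU
x, _ = dls.one_batch()
preds = learn.model(x.cpu())
```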
See Also
- Quantizer - Core quantization class with backend/method options
- ONNX Exporter - Export quantized models for deployment
- PyTorch Quantization Docs - Official PyTorch guide