# Quantizer

## Overview
The Quantizer class provides model quantization capabilities to reduce model size and improve inference speed. Quantization converts floating-point weights and activations to lower-precision integers (typically int8).
Supported Backends:

- 'x86': Optimized for Intel CPUs (default)
- 'qnnpack': Optimized for ARM CPUs (mobile devices)
- 'fbgemm': Facebook's quantization backend

Quantization Methods:

- 'static': Post-training quantization with calibration data - best accuracy, requires representative data
- 'dynamic': Runtime quantization without calibration - easier to use, slightly lower accuracy
- 'qat': Quantization-aware training - highest accuracy, requires retraining
Note: PyTorch quantization produces CPU-only models. The quantized model will always run on the CPU, regardless of the device the original model lived on.
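To verify the size reduction for yourself, a minimal sketch (assuming `model` is your original network and `quantized_model` is the output of `Quantizer.quantize`) serializes each state dict and compares the files:

```python
import os
import tempfile

import torch

def model_size_mb(m):
    # Serialize the weights to a temporary file and measure it
    with tempfile.NamedTemporaryFile() as f:
        torch.save(m.state_dict(), f)
        f.flush()
        return os.path.getsize(f.name) / 1e6

print(f"float32 model: {model_size_mb(model):.1f} MB")
print(f"int8 model:    {model_size_mb(quantized_model):.1f} MB")  # typically ~4x smaller
```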
## Choosing the Right Method
| Method | Accuracy | Setup Effort | When to Use |
|---|---|---|---|
| Static | High | Medium (needs calibration data) | Production with representative dataset available |
| Dynamic | Medium | Low (no calibration) | Quick experiments, NLP models with variable input |
| QAT | Highest | High (requires retraining) | Maximum accuracy critical, have training resources |
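QAT is the only method in the table without a usage example later on this page. The sketch below shows the eager-mode PyTorch QAT flow that a 'qat' run builds on; it uses plain `torch.ao.quantization` rather than the Quantizer API (assuming a recent PyTorch with the 'x86' backend), and `TinyNet` plus the training loop are illustrative placeholders:

```python
import torch
import torch.nn as nn
from torch.ao import quantization

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = quantization.QuantStub()      # float -> int8 boundary
        self.fc = nn.Linear(16, 4)
        self.relu = nn.ReLU()
        self.dequant = quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.relu(self.fc(self.quant(x))))

model = TinyNet().train()                           # QAT prepares in train mode
model.qconfig = quantization.get_default_qat_qconfig('x86')
prepared = quantization.prepare_qat(model)          # inserts fake-quant modules

# Fine-tune with fake quantization in the graph (dummy data shown)
opt = torch.optim.SGD(prepared.parameters(), lr=1e-3)
for _ in range(10):
    x, y = torch.randn(8, 16), torch.randn(8, 4)
    loss = nn.functional.mse_loss(prepared(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

prepared.eval()
quantized = quantization.convert(prepared)          # real int8 weights from here
```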
## Backend Selection Guide
| Backend | Target Hardware | Best For |
|---|---|---|
| 'x86' | Intel/AMD CPUs | Desktop/server deployment |
| 'qnnpack' | ARM CPUs | Mobile (iOS/Android), Raspberry Pi |
| 'fbgemm' | Intel CPUs | Server-side with batch inference |
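The backend string must correspond to a quantized engine compiled into your PyTorch build. The Quantizer presumably configures this for you, but you can inspect and set it directly with stock PyTorch:

```python
import torch

# Engines available in this build, e.g. ['none', 'onednn', 'x86', 'fbgemm', 'qnnpack']
print(torch.backends.quantized.supported_engines)

# The engine active at inference time must match the backend used for quantization
torch.backends.quantized.engine = 'qnnpack'
```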
Parameters:
- backend: Target hardware backend ('x86', 'qnnpack', 'fbgemm')
- method: Quantization approach ('static', 'dynamic', 'qat')
- qconfig_mapping: Optional custom quantization configuration
- custom_configs: Dict of module-specific configurations
- use_per_tensor: Force per-tensor quantization (may help with conversion issues)
- verbose: Enable detailed output during quantization
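As one illustration of qconfig_mapping, the sketch below starts from PyTorch's backend defaults and exempts a single submodule from quantization. Passing the mapping through to Quantizer this way is an assumption about its signature, and the module name 'head' is hypothetical:

```python
from torch.ao.quantization import get_default_qconfig_mapping
from fasterai.quantize.quantizer import Quantizer

# Backend defaults, then skip quantization for a submodule named 'head' (hypothetical name)
qmap = get_default_qconfig_mapping('x86')
qmap = qmap.set_module_name('head', None)

quantizer = Quantizer(backend='x86', method='static', qconfig_mapping=qmap)
```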
## Usage Examples

### Static Quantization (Recommended for best accuracy)
```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for static quantization
quantizer = Quantizer(
    backend='x86',
    method='static',
    verbose=True
)

# Quantize with calibration data
quantized_model = quantizer.quantize(
    model,
    calibration_dl=dls.valid,
    max_calibration_samples=100
)
```
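Because the result is CPU-only (see the note above), a quick sanity check after static quantization can save debugging later. This sketch reuses `dls` and `quantized_model` from the example above; dropping the fastai tensor subclass is a precaution, since some quantized kernels expect plain tensors:

```python
import torch

xb, yb = dls.valid.one_batch()
xb = xb.cpu().as_subclass(torch.Tensor)  # plain CPU tensor for the quantized model

with torch.inference_mode():
    preds = quantized_model(xb)
print(preds.shape)
```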
### Dynamic Quantization (No calibration needed)

```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for dynamic quantization
quantizer = Quantizer(
    backend='x86',
    method='dynamic'
)

# Quantize - no dataloader needed
quantized_model = quantizer.quantize(model)
```
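For reference, dynamic quantization in stock PyTorch is a one-liner, and is presumably what a 'dynamic' run wraps: weights are stored as int8 while activations are quantized on the fly, with Linear/LSTM-style layers as the usual targets:

```python
import torch
from torch.ao.quantization import quantize_dynamic

# Quantize only Linear and LSTM layers; everything else stays float
dq_model = quantize_dynamic(model, {torch.nn.Linear, torch.nn.LSTM}, dtype=torch.qint8)
```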
### Mobile Deployment (ARM devices)

```python
from fasterai.quantize.quantizer import Quantizer

# Use qnnpack backend for mobile
quantizer = Quantizer(
    backend='qnnpack',
    method='static'
)

quantized_model = quantizer.quantize(model, calibration_dl=dls.valid)
```
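To check the speed side of the trade-off, a minimal CPU latency comparison; the 224x224 input shape and the `model`/`quantized_model` names are placeholders carried over from the examples above:

```python
import time

import torch

x = torch.randn(1, 3, 224, 224)  # adjust to your model's input shape

def avg_latency_ms(m, iters=50):
    # Average forward-pass time over `iters` runs, in milliseconds
    m.eval()
    with torch.inference_mode():
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
    return (time.perf_counter() - start) / iters * 1e3

print(f"fp32: {avg_latency_ms(model):.1f} ms")
print(f"int8: {avg_latency_ms(quantized_model):.1f} ms")
```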
## See Also
- QuantizeCallback - Apply quantization during fastai training
- PyTorch Quantization Documentation - Official PyTorch quantization guide
- ONNX Exporter - Export models for cross-platform deployment