# Quantizer

> Quantize your network

## Overview
The `Quantizer` class provides model quantization capabilities to reduce model size and improve inference speed. It supports two backends: the legacy `torch.ao.quantization` (FX graph mode) and the modern `torchao` library.
## Backend Selection Guide
| Backend | Bit Widths | Target Layers | Best For |
|---|---|---|---|
| `'x86'` | INT8 | Conv2d + Linear | CNN deployment on Intel/AMD CPUs |
| `'qnnpack'` | INT8 | Conv2d + Linear | Mobile (ARM) deployment |
| `'torchao'` | INT8 | Linear (primary) | Transformers, MLPs, modern models |
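The legacy backends map onto PyTorch's quantized engines, which vary by build and platform. To confirm what your local PyTorch installation supports before choosing, you can query it directly (plain PyTorch, independent of fasterai):

```python
import torch

# Engines compiled into this PyTorch build,
# e.g. ['none', 'onednn', 'x86', 'fbgemm']
print(torch.backends.quantized.supported_engines)
```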
## Method Selection Guide

**Legacy backends (`x86`, `qnnpack`, `fbgemm`):**
| Method | Needs Calibration | When to Use |
|---|---|---|
| `'static'` | Yes | Best accuracy for CNNs |
| `'dynamic'` | No | Quick experiments, RNNs |
| `'qat'` | Training | Maximum accuracy critical |
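Each row above corresponds to a value of the `method` argument at construction time. A minimal illustration using the `Quantizer` signature documented below:

```python
from fasterai.quantize.quantizer import Quantizer

static_q  = Quantizer(backend='x86', method='static')   # calibrate, then convert
dynamic_q = Quantizer(backend='x86', method='dynamic')  # no calibration pass
qat_q     = Quantizer(backend='x86', method='qat')      # quantization-aware training
```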
**torchao backend:**
| Method | Needs Calibration | When to Use |
|---|---|---|
| `'int8_weight_only'` | No | General purpose, good default |
| `'int8_dynamic'` | No | Activation + weight quantization |
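Because torchao weight-only quantization reads no activations, a single call with no dataloader suffices. A minimal sketch based on the `quantize` signature documented below (assumes `model` is your trained `nn.Module`):

```python
from fasterai.quantize.quantizer import Quantizer

# Weight-only INT8 via torchao: no calibration dataloader required
quantizer = Quantizer(backend='torchao', method='int8_weight_only')
quantized_model = quantizer.quantize(model)
```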
## Quantizer
```python
def Quantizer(
    backend:str='x86', # Target backend: 'x86', 'qnnpack', 'fbgemm', or 'torchao'
    method:str='static', # Method: 'static', 'dynamic', 'qat', 'int8_weight_only', 'int4_weight_only', 'int8_dynamic'
    qconfig_mapping:dict | None=None, # Optional custom quantization config (legacy backends only)
    custom_configs:dict | None=None, # Custom module-specific configurations
    use_per_tensor:bool=False, # Force per-tensor quantization (legacy backends only)
    verbose:bool=False, # Enable verbose output
):
```
Initialize a quantizer with specified backend and options.
**Parameters:**

- `backend`: Target hardware backend (`'x86'`, `'qnnpack'`, `'fbgemm'`, or `'torchao'`)
- `method`: Quantization approach (`'static'`, `'dynamic'`, `'qat'` for legacy backends; `'int8_weight_only'`, `'int4_weight_only'`, `'int8_dynamic'` for torchao)
- `qconfig_mapping`: Optional custom quantization configuration (legacy backends only; see the sketch after this list)
- `custom_configs`: Dict of module-specific configurations
- `use_per_tensor`: Force per-tensor quantization (may help with conversion issues)
- `verbose`: Enable detailed output during quantization
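For legacy backends you can supply your own `qconfig_mapping`. A sketch starting from PyTorch's stock mapping helper (assumes the Quantizer forwards the mapping to FX graph-mode preparation):

```python
from torch.ao.quantization import get_default_qconfig_mapping
from fasterai.quantize.quantizer import Quantizer

# Start from PyTorch's default x86 mapping, then hand it to the Quantizer
qconfig_mapping = get_default_qconfig_mapping('x86')
quantizer = Quantizer(
    backend='x86',
    method='static',
    qconfig_mapping=qconfig_mapping,
    verbose=True,
)
```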
### Quantizer.quantize
```python
def quantize(
    model:Module, # Model to quantize
    calibration_dl:Any=None, # Dataloader for calibration (not needed for torchao weight-only)
    max_calibration_samples:int=100, # Maximum number of samples to use for calibration
    device:str | torch.device='cpu', # Device to use for calibration
)->Module:
```
Quantize a model using the specified backend and method.
## Usage Examples

### Static Quantization (Recommended for best accuracy)
```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for static quantization
quantizer = Quantizer(
    backend='x86',
    method='static',
    verbose=True
)

# Quantize with calibration data
quantized_model = quantizer.quantize(
    model,
    calibration_dl=dls.valid,
    max_calibration_samples=100
)
```
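To verify the size reduction, serialize both models and compare byte counts. A quick check using plain PyTorch (assumes `model` and `quantized_model` from the example above):

```python
import io
import torch

def model_size_mb(m):
    "Serialized size of a model's state dict in megabytes."
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"original:  {model_size_mb(model):.1f} MB")
print(f"quantized: {model_size_mb(quantized_model):.1f} MB")
```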
### Dynamic Quantization (No calibration needed)

```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for dynamic quantization
quantizer = Quantizer(
    backend='x86',
    method='dynamic'
)

# Quantize - no dataloader needed
quantized_model = quantizer.quantize(model)
```

### Mobile Deployment (ARM devices)
```python
from fasterai.quantize.quantizer import Quantizer

# Use qnnpack backend for mobile
quantizer = Quantizer(
    backend='qnnpack',
    method='static'
)

quantized_model = quantizer.quantize(model, calibration_dl=dls.valid)
```
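For actual on-device use, the quantized model is typically scripted and passed through PyTorch's mobile optimizer. A sketch using standard PyTorch utilities (not part of fasterai; if scripting fails for your model, `torch.jit.trace` with example inputs is the usual fallback):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Script the quantized model and apply mobile-specific optimizations
scripted = torch.jit.script(quantized_model)
optimized = optimize_for_mobile(scripted)

# Save in the format expected by the PyTorch Lite interpreter on device
optimized._save_for_lite_interpreter("model.ptl")
```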
## See Also
- QuantizeCallback - Apply quantization during fastai training
- PyTorch Quantization Documentation - Official PyTorch quantization guide
- ONNX Exporter - Export models for cross-platform deployment