Quantizer

Quantize your network

Overview

The Quantizer class provides model quantization to reduce model size and speed up inference. It supports two backend families: the legacy torch.ao.quantization stack (FX graph mode), which powers the 'x86', 'qnnpack', and 'fbgemm' backends, and the modern torchao library.

Backend Selection Guide

| Backend     | Bit Widths  | Target Layers     | Best For                          |
|-------------|-------------|-------------------|-----------------------------------|
| `'x86'`     | INT8        | Conv2d + Linear   | CNN deployment on Intel/AMD CPUs  |
| `'qnnpack'` | INT8        | Conv2d + Linear   | Mobile (ARM) deployment           |
| `'torchao'` | INT8 / INT4 | Linear (primary)  | Transformers, MLPs, modern models |
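
The backend is just a constructor argument. A minimal sketch (the `method` values follow the signature below; pick the row that matches your deployment target):

```python
from fasterai.quantize.quantizer import Quantizer

# Match the backend to the deployment target (see table above)
server_quantizer = Quantizer(backend='x86')       # Intel/AMD CPUs
mobile_quantizer = Quantizer(backend='qnnpack')   # ARM devices
modern_quantizer = Quantizer(backend='torchao', method='int8_weight_only')  # Transformers/MLPs
```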

Method Selection Guide

Legacy backends (x86, qnnpack, fbgemm):

| Method      | Needs Calibration | When to Use                 |
|-------------|-------------------|-----------------------------|
| `'static'`  | Yes               | Best accuracy for CNNs      |
| `'dynamic'` | No                | Quick experiments, RNNs     |
| `'qat'`     | Training          | Maximum accuracy critical   |

torchao backend:

| Method               | Needs Calibration | When to Use                                      |
|----------------------|-------------------|--------------------------------------------------|
| `'int8_weight_only'` | No                | General purpose, good default                    |
| `'int8_dynamic'`     | No                | Activation + weight quantization                 |
| `'int4_weight_only'` | No                | Maximum compression (larger accuracy trade-off)  |
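
A minimal sketch of the torchao path; the small `nn.Sequential` here is a stand-in for your trained model, and weight-only quantization needs no calibration data:

```python
import torch.nn as nn
from fasterai.quantize.quantizer import Quantizer

# Stand-in for a trained Linear-heavy model
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Weight-only quantization: no calibration dataloader required
quantizer = Quantizer(backend='torchao', method='int8_weight_only')
quantized_model = quantizer.quantize(model)
```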

source

Quantizer


```python
def Quantizer(
    backend:str='x86', # Target backend: 'x86', 'qnnpack', 'fbgemm', or 'torchao'
    method:str='static', # Method: 'static', 'dynamic', 'qat', 'int8_weight_only', 'int4_weight_only', 'int8_dynamic'
    qconfig_mapping:dict | None=None, # Optional custom quantization config (legacy backends only)
    custom_configs:dict | None=None, # Custom module-specific configurations
    use_per_tensor:bool=False, # Force per-tensor quantization (legacy backends only)
    verbose:bool=False, # Enable verbose output
):
```

Initialize a quantizer with specified backend and options.

Parameters:

  • backend: Target hardware backend ('x86', 'qnnpack', 'fbgemm', or 'torchao')
  • method: Quantization approach ('static', 'dynamic', or 'qat' for legacy backends; 'int8_weight_only', 'int4_weight_only', or 'int8_dynamic' for torchao)
  • qconfig_mapping: Optional custom quantization configuration
  • custom_configs: Dict of module-specific configurations
  • use_per_tensor: Force per-tensor quantization (may help with conversion issues)
  • verbose: Enable detailed output during quantization
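
A sketch combining these options, e.g. when per-channel quantization causes conversion issues on a legacy backend:

```python
from fasterai.quantize.quantizer import Quantizer

# Fall back to per-tensor quantization and print progress details
quantizer = Quantizer(
    backend='x86',
    method='static',
    use_per_tensor=True,  # may resolve conversion issues on some layers
    verbose=True,
)
```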


source

Quantizer.quantize


```python
def quantize(
    model:Module, # Model to quantize
    calibration_dl:Any=None, # Dataloader for calibration (not needed for torchao weight-only)
    max_calibration_samples:int=100, # Maximum number of samples to use for calibration
    device:str | torch.device='cpu', # Device to use for calibration
)->Module:
```

Quantize a model using the specified backend and method.
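
A sketch of a calibrated run, assuming a trained `model` and a fastai `dls` object as in the examples below; the sampling budget and device are controlled here:

```python
from fasterai.quantize.quantizer import Quantizer

quantizer = Quantizer(backend='x86', method='static')

# Static quantization observes activation ranges on calibration batches;
# at most `max_calibration_samples` samples are consumed (default 100)
quantized_model = quantizer.quantize(
    model,
    calibration_dl=dls.valid,
    max_calibration_samples=100,
    device='cpu',
)
```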


Usage Examples

Dynamic Quantization (No calibration needed)

```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for dynamic quantization
quantizer = Quantizer(
    backend='x86',
    method='dynamic'
)

# Quantize - no dataloader needed
quantized_model = quantizer.quantize(model)
```

Mobile Deployment (ARM devices)

```python
from fasterai.quantize.quantizer import Quantizer

# Use qnnpack backend for mobile
quantizer = Quantizer(
    backend='qnnpack',
    method='static'
)

quantized_model = quantizer.quantize(model, calibration_dl=dls.valid)
```


See Also