Quantizer

Quantize your network

Overview

The Quantizer class provides model quantization capabilities to reduce model size and improve inference speed. Quantization converts floating-point weights and activations to lower precision integers (typically int8).
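
To make the float-to-int8 mapping concrete, here is a standalone PyTorch sketch (illustrative only, not the Quantizer's internal code) that quantizes a toy weight tensor with `torch.quantize_per_tensor`, using a simple max-abs heuristic for the scale:

```python
import torch

# Toy float32 tensor standing in for a layer's weights
w = torch.randn(4, 4)

# Affine mapping: q = round(w / scale) + zero_point, stored as int8
scale = w.abs().max().item() / 127  # max-abs heuristic, chosen for illustration
q = torch.quantize_per_tensor(w, scale=scale, zero_point=0, dtype=torch.qint8)

print(q.int_repr())    # the underlying int8 values
print(q.dequantize())  # approximate float reconstruction
```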

Supported Backends:

- 'x86': Optimized for Intel CPUs (default)
- 'qnnpack': Optimized for ARM CPUs (mobile devices)
- 'fbgemm': Facebook's quantization backend
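
Under the hood, PyTorch routes quantized kernels through a global engine setting, which the `backend` argument presumably maps onto; you can inspect and set it directly (available names depend on your PyTorch build):

```python
import torch

# Engine used by quantized kernels, e.g. 'x86' on recent desktop builds
print(torch.backends.quantized.engine)
torch.backends.quantized.engine = 'x86'  # raises if this build lacks the engine
```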

Quantization Methods:

- 'static': Post-training quantization with calibration data - best accuracy, requires representative data
- 'dynamic': Runtime quantization without calibration - easier to use, slightly lower accuracy
- 'qat': Quantization-aware training - highest accuracy, requires retraining

Note: PyTorch quantization produces CPU-only models. The quantized model will always run on CPU regardless of original device.
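
A minimal, self-contained illustration of this constraint, using plain `torch.ao.quantization.quantize_dynamic` on a single layer rather than the Quantizer class:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Dynamically quantize one Linear layer; the result is CPU-only
layer = quantize_dynamic(nn.Linear(8, 4), {nn.Linear}, dtype=torch.qint8)

x = torch.randn(2, 8)     # keep inputs on CPU as well
with torch.no_grad():
    out = layer(x.cpu())  # moving the layer or inputs to CUDA would fail
print(out.shape)          # torch.Size([2, 4])
```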

Choosing the Right Method

| Method  | Accuracy | Setup Effort                    | When to Use                                        |
|---------|----------|---------------------------------|----------------------------------------------------|
| Static  | High     | Medium (needs calibration data) | Production with representative dataset available   |
| Dynamic | Medium   | Low (no calibration)            | Quick experiments, NLP models with variable input  |
| QAT     | Highest  | High (requires retraining)      | Maximum accuracy critical, have training resources |

Backend Selection Guide

| Backend    | Target Hardware | Best For                            |
|------------|-----------------|-------------------------------------|
| 'x86'      | Intel/AMD CPUs  | Desktop/server deployment           |
| 'qnnpack'  | ARM CPUs        | Mobile (iOS/Android), Raspberry Pi  |
| 'fbgemm'   | Intel CPUs      | Server-side with batch inference    |
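
Backends must be compiled into your PyTorch build, so a defensive pattern is to check availability before constructing the quantizer (a sketch, falling back to 'x86'):

```python
import torch

# Prefer qnnpack on ARM builds that support it, otherwise fall back to x86
engines = torch.backends.quantized.supported_engines
backend = 'qnnpack' if 'qnnpack' in engines else 'x86'
```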

Parameters:

- backend: Target hardware backend ('x86', 'qnnpack', 'fbgemm')
- method: Quantization approach ('static', 'dynamic', 'qat')
- qconfig_mapping: Optional custom quantization configuration
- custom_configs: Dict of module-specific configurations
- use_per_tensor: Force per-tensor quantization (may help with conversion issues)
- verbose: Enable detailed output during quantization
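
Putting these together, a hypothetical configuration for static quantization on Intel CPUs might look like the following (argument names are taken from the list above; the combination shown is an assumption, not a prescribed recipe):

```python
from fasterai.quantize.quantizer import Quantizer

# Static quantization on Intel CPUs; force per-tensor observers to work
# around conversion issues and print progress while quantizing
quantizer = Quantizer(
    backend='x86',
    method='static',
    use_per_tensor=True,
    verbose=True,
)
```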


Usage Examples

Dynamic Quantization (No calibration needed)

```python
from fasterai.quantize.quantizer import Quantizer

# Create quantizer for dynamic quantization
quantizer = Quantizer(
    backend='x86',
    method='dynamic'
)

# Quantize - no dataloader needed
quantized_model = quantizer.quantize(model)
```
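
Continuing from the snippet above, one rough way to verify the size reduction is to serialize both models and compare on-disk sizes (file size, not exact runtime memory):

```python
import os
import torch

# Save both models and compare the serialized sizes
torch.save(model.state_dict(), 'fp32.pth')
torch.save(quantized_model.state_dict(), 'int8.pth')
print(f"fp32: {os.path.getsize('fp32.pth') / 1e6:.1f} MB")
print(f"int8: {os.path.getsize('int8.pth') / 1e6:.1f} MB")
```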

Mobile Deployment (ARM devices)

```python
from fasterai.quantize.quantizer import Quantizer

# Use qnnpack backend for mobile
quantizer = Quantizer(
    backend='qnnpack',
    method='static'
)

# Static quantization calibrates on representative data
quantized_model = quantizer.quantize(model, calibration_dl=dls.valid)
```
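
Quantization-Aware Training (background sketch)

The exact fasterai workflow for method='qat' is not shown here; as background, a minimal eager-mode sketch of what QAT involves in plain PyTorch (assuming a recent version where the 'x86' backend exists) looks like this:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert
)

# Toy model wrapped in quant/dequant stubs to mark the int8 region
model = nn.Sequential(QuantStub(), nn.Linear(8, 4), nn.ReLU(), DeQuantStub())
model.train()
model.qconfig = get_default_qat_qconfig('x86')
prepare_qat(model, inplace=True)  # inserts fake-quant observers

# Fine-tune with fake quantization active (dummy data for illustration)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):
    loss = model(torch.randn(16, 8)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

model.eval()
quantized = convert(model)  # swap fake-quant modules for real int8 kernels
```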


See Also