ONNX Exporter

Export PyTorch models to ONNX format with optional INT8 quantization

ONNX Export

Export PyTorch models to ONNX format for deployment. Supports:

- Basic ONNX export with graph optimization
- Dynamic INT8 quantization (no calibration needed)
- Static INT8 quantization (with calibration data)
- Output verification against the original model

Export Function

Inference Wrapper

Verification

Usage Examples

from fasterai.export.all import export_onnx, ONNXModel, verify_onnx

# Basic export
path = export_onnx(model, sample, "model.onnx")

# With quantization
path = export_onnx(model, sample, "model.onnx", quantize=True)

# Inference
onnx_model = ONNXModel("model.onnx")
output = onnx_model(input_tensor)

# Verify
assert verify_onnx(model, "model.onnx", sample)
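
Static quantization is the one mode not shown above because it needs calibration data to estimate activation ranges. A minimal sketch, assuming `calibration_data` accepts any iterable of representative input batches (the hypothetical `calib_batches` list stands in for a real DataLoader):

import torch

# Static INT8 quantization: pass representative inputs for calibration
calib_batches = [torch.randn(8, 3, 224, 224) for _ in range(10)]  # hypothetical calibration set
path = export_onnx(
    model, sample, "model_static.onnx",
    quantize=True,
    quantize_mode="static",
    calibration_data=calib_batches,
)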

source

export_onnx


def export_onnx(
    model:nn.Module, # PyTorch model to export
    sample:torch.Tensor, # Example input for tracing (with batch dim)
    output_path:str | Path, # Output .onnx file path
    opset_version:int=18, # ONNX opset version
    quantize:bool=False, # Apply INT8 quantization after export
    quantize_mode:str='dynamic', # "dynamic" (no calibration) or "static"
    calibration_data:Iterable | None=None, # DataLoader for static quantization
    optimize:bool=True, # Run ONNX graph optimizer
    dynamic_batch:bool=True, # Allow variable batch size at runtime
    input_names:list[str] | None=None, # Names for input tensors
    output_names:list[str] | None=None, # Names for output tensors
)->Path:

Export a PyTorch model to ONNX format with optional quantization
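
The remaining arguments control how the graph is written. A hedged sketch using only the documented parameters, assuming `dynamic_batch=True` exposes a variable batch dimension under the given input/output names:

path = export_onnx(
    model,
    sample,                      # example input with batch dim, e.g. torch.randn(1, 3, 224, 224)
    "model.onnx",
    opset_version=18,            # ONNX opset to target
    optimize=True,               # run the ONNX graph optimizer after export
    dynamic_batch=True,          # keep the batch size variable at runtime
    input_names=["image"],
    output_names=["logits"],
)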


source

ONNXModel


def ONNXModel(
    path:str | Path, device:str='cpu'
):

Wrapper for ONNX Runtime inference with PyTorch-like interface
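
A short inference sketch, assuming the wrapper is called like a PyTorch module as in the usage example above; the `device` argument is shown with its documented default:

import torch

onnx_model = ONNXModel("model.onnx", device="cpu")
batch = torch.randn(4, 3, 224, 224)   # any batch size works if exported with dynamic_batch=True
preds = onnx_model(batch)             # forward pass through ONNX Runtime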


source

verify_onnx


def verify_onnx(
    model:nn.Module, # Original PyTorch model
    onnx_path:str | Path, # Path to exported ONNX model
    sample:torch.Tensor, # Test input tensor
    rtol:float=0.001, # Relative tolerance
    atol:float=1e-05, # Absolute tolerance
)->bool:

Verify ONNX model outputs match PyTorch model within tolerance
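
Quantized exports generally will not match the FP32 model within the default tolerances, so relax `rtol`/`atol` when verifying them. A sketch, assuming `model.onnx` and the hypothetical `model_int8.onnx` were exported from the same model and sample as above:

# FP32 export: the default tolerances are usually fine
assert verify_onnx(model, "model.onnx", sample)

# INT8 export: expect larger numerical drift, so loosen the tolerances
assert verify_onnx(model, "model_int8.onnx", sample, rtol=0.1, atol=0.05)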