Further optimize for CPU inference

Overview

The accelerate_model_for_cpu function applies optimizations to prepare a PyTorch model for efficient CPU inference. It combines several techniques (sketched below):

  1. Channels-last memory format: Optimizes memory layout for CNN operations on CPU
  2. TorchScript compilation: JIT compiles the model for faster execution
  3. Mobile optimization: Applies optimize_for_mobile for operator fusion and other optimizations

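A rough sketch of how these three steps could compose is shown here. This is illustrative only: the helper name and exact ordering are assumptions, not fasterai's actual implementation.

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

def accelerate_model_for_cpu_sketch(model, example_input):
    # 1. Switch model and input to channels-last memory layout (helps CNN ops on CPU)
    model = model.eval().to(memory_format=torch.channels_last)
    example_input = example_input.to(memory_format=torch.channels_last)
    # 2. JIT-compile the model by tracing it with the example input
    traced = torch.jit.trace(model, example_input)
    # 3. Apply mobile-style optimizations (operator fusion, conv/bn folding, ...)
    return optimize_for_mobile(traced)
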
When to use:

  • Deploying models on CPU-only servers
  • Edge deployment without a GPU
  • After quantization, for maximum CPU performance

Parameters:

  • model: The PyTorch model to optimize
  • example_input: A sample input tensor (used for tracing)

Returns: An optimized TorchScript model


Usage Example

from fasterai.misc.cpu_optimizer import accelerate_model_for_cpu
import torch
import torchvision

# Any trained nn.Module works here; a torchvision ResNet-18 is used as a stand-in
model = torchvision.models.resnet18(weights=None).eval()

# Create an example input matching your model's expected shape
example_input = torch.randn(1, 3, 224, 224)

# Optimize the model for CPU inference
optimized_model = accelerate_model_for_cpu(model, example_input)

# Use the optimized model
input_tensor = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = optimized_model(input_tensor)
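
To check the speedup on your hardware, the original and optimized models can be timed side by side. A minimal sketch, reusing the objects from the example above (the warm-up and iteration counts are arbitrary):

import time

def bench(m, x, iters=100):
    # Warm up, then measure average forward-pass latency in seconds
    with torch.no_grad():
        for _ in range(10):
            m(x)
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
    return (time.perf_counter() - start) / iters

print(f"original:  {bench(model, example_input) * 1e3:.2f} ms")
print(f"optimized: {bench(optimized_model, example_input) * 1e3:.2f} ms")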

Note: The returned model is a TorchScript model. Some dynamic Python features may not be supported.
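
Because the returned object is a TorchScript module, it can be serialized with torch.jit.save and reloaded later (for example, in a deployment container) without the original Python class definition. The filename below is arbitrary:

# Persist the optimized model and reload it where it will be served
torch.jit.save(optimized_model, "model_cpu_optimized.pt")
restored = torch.jit.load("model_cpu_optimized.pt")

with torch.no_grad():
    output = restored(example_input)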