Benchmark Tutorial

How to use fasterbench

Overview

fasterbench is a comprehensive benchmarking toolkit for PyTorch models. It measures five critical dimensions of model performance:

Metric    What It Measures              Why It Matters
Size      Disk size, parameter count    Deployment storage, download time
Speed     Latency, throughput           User experience, serving costs
Compute   MACs (operations)             Hardware requirements, energy use
Memory    Peak/average memory           GPU memory limits, batch sizes
Energy    Power, CO₂ emissions          Operating costs, sustainability

Key Features

  • Typed results - BenchmarkResult with IDE autocomplete
  • Backward compatible - Dict-like access for existing code
  • Selective metrics - Benchmark only what you need
  • Multi-device - CPU, CUDA, and multi-GPU support
  • Export formats - DataFrame, JSON, summary reports

1. Basic Benchmarking

The benchmark() function is the main entry point:

import torch
from torchvision.models import resnet18, mobilenet_v3_large, efficientnet_b0
from fasterbench import benchmark, BenchmarkResult

# Create sample input
dummy = torch.randn(1, 3, 224, 224)

# Benchmark a model (using fast metrics for demo)
result = benchmark(resnet18(), dummy, metrics=["size", "speed", "compute"])
result
═══ Size ════════════════════════════════════
  Disk:   44.67 MiB
  Params: 11.69M
═══ Speed ═══════════════════════════════════
  cpu: 33.59 ms  │  29.8 inf/s  │  p99: 61.68 ms
  cuda: 0.64 ms  │  1562.1 inf/s  │  p99: 0.66 ms
═══ Compute ═════════════════════════════════
  MACs: 1824.0 M
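
MACs are multiply-accumulate operations. A common rule of thumb (not specific to fasterbench) is that one MAC counts as two floating-point operations, which lets you convert the figure above to FLOPs:

```python
# Rule-of-thumb conversion: 1 MAC ≈ 2 FLOPs.
# Using the 1824.0 M MACs reported for ResNet-18 above.
macs_m = 1824.0
flops_g = macs_m * 2 / 1000
print(f"~{flops_g:.2f} GFLOPs per forward pass")  # ~3.65 GFLOPs
```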

2. Typed Access

BenchmarkResult provides typed access to all metrics with IDE autocomplete:

# Size metrics
print(f"Disk size: {result.size.size_mib:.2f} MiB")
print(f"Parameters: {result.size.num_params:,}")

# Speed metrics (keyed by device)
print(f"\nCPU latency: {result.speed['cpu'].mean_ms:.2f} ms")
print(f"CPU throughput: {result.speed['cpu'].throughput_s:.1f} inf/s")
print(f"CPU p99 latency: {result.speed['cpu'].p99_ms:.2f} ms")

# Compute metrics
print(f"\nMACs: {result.compute.macs_m:.1f} M")
print(f"MACs available: {result.compute.macs_available}")
Disk size: 44.67 MiB
Parameters: 11,689,512

CPU latency: 33.59 ms
CPU throughput: 29.8 inf/s
CPU p99 latency: 61.68 ms

MACs: 1824.0 M
MACs available: True
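
The throughput figures are consistent with the mean latencies: throughput in inf/s is simply 1000 divided by the mean latency in milliseconds. A quick sanity check with the numbers above:

```python
# Sanity check: throughput (inf/s) = 1000 / mean latency (ms),
# using the mean_ms values reported above.
cpu_mean_ms = 33.58930969238281
cuda_mean_ms = 0.6401805281639099

cpu_tput = 1000 / cpu_mean_ms
cuda_tput = 1000 / cuda_mean_ms

print(f"CPU:  {cpu_tput:.1f} inf/s")   # 29.8, matching the output above
print(f"CUDA: {cuda_tput:.1f} inf/s")  # 1562.1, matching the output above
```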

3. Backward-Compatible Dict Access

For compatibility with existing code, BenchmarkResult also supports dict-like access:

# Dict-style access
print(result["size_size_mib"])
print(result["speed_cpu_mean_ms"])

# Iteration
print("\nAll metrics:")
for key, value in result.items():
    print(f"  {key}: {value}")
44.66535472869873
33.58930969238281

All metrics:
  size_disk_bytes: 46835019
  size_size_mib: 44.66535472869873
  size_num_params: 11689512
  speed_cpu_p50_ms: 29.3856258392334
  speed_cpu_p90_ms: 37.92195129394532
  speed_cpu_p99_ms: 61.68484344482422
  speed_cpu_mean_ms: 33.58930969238281
  speed_cpu_std_ms: 11.55258846282959
  speed_cpu_throughput_s: 29.77137693981172
  speed_cuda_p50_ms: 0.6376160085201263
  speed_cuda_p90_ms: 0.6466431796550751
  speed_cuda_p99_ms: 0.6631737768650058
  speed_cuda_mean_ms: 0.6401805281639099
  speed_cuda_std_ms: 0.009303152561187744
  speed_cuda_throughput_s: 1562.0593816998492
  compute_macs_m: 1824.034
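
The flat keys follow a consistent naming scheme: section names and nested fields (including the per-device speed entries) are joined with underscores, which is how size.size_mib becomes "size_size_mib" and speed['cpu'].mean_ms becomes "speed_cpu_mean_ms". A minimal sketch of that kind of flattening, using plain nested dicts (the actual internals of BenchmarkResult may differ):

```python
# Illustrative sketch of underscore-joined key flattening; not the
# library's actual implementation.
def flatten(d, prefix=""):
    out = {}
    for k, v in d.items():
        key = f"{prefix}_{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))  # recurse into nested sections
        else:
            out[key] = v
    return out

nested = {
    "size": {"disk_bytes": 46835019, "size_mib": 46835019 / 2**20},
    "speed": {"cpu": {"mean_ms": 33.589}},
}
flat = flatten(nested)
print(sorted(flat))  # ['size_disk_bytes', 'size_size_mib', 'speed_cpu_mean_ms']
```

Note that size_mib is just disk_bytes divided by 2**20, as the values above confirm.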

4. Comparing Multiple Models

Export results to a pandas DataFrame for easy comparison:

import pandas as pd

# Benchmark multiple models
models = {
    "ResNet-18": resnet18(),
    "MobileNet-V3": mobilenet_v3_large(),
    "EfficientNet-B0": efficientnet_b0(),
}

results = {}
for name, model in models.items():
    results[name] = benchmark(model, dummy, metrics=["size", "speed", "compute"])

# Combine into DataFrame
rows = []
for name, r in results.items():
    row = {"model": name, **r.as_dict()}
    rows.append(row)

df = pd.DataFrame(rows)
df[["model", "size_size_mib", "size_num_params", "speed_cpu_mean_ms", "compute_macs_m"]]
             model  size_size_mib  size_num_params  speed_cpu_mean_ms  compute_macs_m
0        ResNet-18      44.665355         11689512          36.195747        1824.034
1     MobileNet-V3      21.107375          5483032           9.795851         234.838
2  EfficientNet-B0      20.453475          5288548          15.414814         415.145
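
With the results in one table, derived comparisons are a one-liner. For example, relative CPU speedups over ResNet-18 (shown here with the literal latencies from the table so the snippet runs standalone):

```python
# Relative CPU speedup vs. ResNet-18, computed from the
# speed_cpu_mean_ms column of the comparison table above.
latencies_ms = {
    "ResNet-18": 36.195747,
    "MobileNet-V3": 9.795851,
    "EfficientNet-B0": 15.414814,
}
base = latencies_ms["ResNet-18"]
speedups = {name: base / ms for name, ms in latencies_ms.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x vs ResNet-18 on CPU")
```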

5. JSON Export

Serialize results to JSON for logging or storage:

# Export to JSON
json_str = result.to_json()
print(json_str)
{
  "size_disk_bytes": 46835019,
  "size_size_mib": 44.66535472869873,
  "size_num_params": 11689512,
  "speed_cpu_p50_ms": 29.3856258392334,
  "speed_cpu_p90_ms": 37.92195129394532,
  "speed_cpu_p99_ms": 61.68484344482422,
  "speed_cpu_mean_ms": 33.58930969238281,
  "speed_cpu_std_ms": 11.55258846282959,
  "speed_cpu_throughput_s": 29.77137693981172,
  "speed_cuda_p50_ms": 0.6376160085201263,
  "speed_cuda_p90_ms": 0.6466431796550751,
  "speed_cuda_p99_ms": 0.6631737768650058,
  "speed_cuda_mean_ms": 0.6401805281639099,
  "speed_cuda_std_ms": 0.009303152561187744,
  "speed_cuda_throughput_s": 1562.0593816998492,
  "compute_macs_m": 1824.034
}
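
Because the output is a flat JSON object, it round-trips cleanly through the standard library. A standalone check using an excerpt of the JSON above:

```python
import json

# Parse the to_json() output back into a dict (excerpt of the
# serialized result shown above, inlined so this runs standalone).
json_str = (
    '{"size_size_mib": 44.66535472869873, '
    '"speed_cpu_mean_ms": 33.58930969238281, '
    '"compute_macs_m": 1824.034}'
)
metrics = json.loads(json_str)
print(f"{metrics['size_size_mib']:.2f} MiB")  # 44.67 MiB
```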

6. Human-Readable Summary

Get a quick overview of all metrics with summary():

result.summary()
═══ Size ════════════════════════════════════
  Disk:   44.67 MiB
  Params: 11.69M
═══ Speed ═══════════════════════════════════
  cpu: 33.59 ms  │  29.8 inf/s  │  p99: 61.68 ms
  cuda: 0.64 ms  │  1562.1 inf/s  │  p99: 0.66 ms
═══ Compute ═════════════════════════════════
  MACs: 1824.0 M

7. Selective Metrics

Only compute what you need for faster benchmarking:

quick_result = benchmark(resnet18(), dummy, metrics=["size", "compute"])
print(f"Size: {quick_result.size.size_mib:.2f} MiB")
print(f"MACs: {quick_result.compute.macs_m:.1f} M")
print(f"Speed measured: {bool(quick_result.speed)}")  # False - not requested
Size: 44.67 MiB
MACs: 1824.0 M
Speed measured: False

8. Visualizing with Radar Plots

Compare models visually using radar plots:

from fasterbench.plot import create_radar_plot

# Full benchmark for radar plot (includes energy)
dummy_batch = torch.randn(8, 3, 224, 224)

resnet_full = benchmark(resnet18(), dummy_batch,
                        metrics=["size", "speed", "compute", "energy"])
mobilenet_full = benchmark(mobilenet_v3_large(), dummy_batch,
                           metrics=["size", "speed", "compute", "energy"])
# Create radar plot comparing models
fig = create_radar_plot(
    [resnet_full, mobilenet_full],
    model_names=["ResNet-18", "MobileNet-V3"]
)
fig.show()

Summary

Feature              Description
benchmark()          Main entry point for comprehensive benchmarking
BenchmarkResult      Typed container with IDE autocomplete
Dict access          Backward-compatible result["key"] access
summary()            Human-readable formatted output
to_dataframe()       Export to pandas DataFrame
to_json()            Serialize to JSON string
create_radar_plot()  Visual model comparison

See Also