Speed

Latency and throughput measurement for PyTorch models


sweep_latency


def sweep_latency(
    model:nn.Module, # model to benchmark
    shapes:Sequence[Sequence[int]], # input shapes to test, e.g. [(1,3,224,224), (1,3,384,384)]
    device:str | torch.device='cuda', # device to run on
    warmup:int=20, # warmup iterations per shape
    steps:int=100, # measurement iterations per shape
)->list[dict]:

Sweep input shapes to analyze how latency scales with input resolution.
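A resolution sweep amounts to one single-shape benchmark per entry in `shapes`, collected into one row per shape. The sketch below is framework-free so it runs without a GPU. `measure` is a hypothetical stand-in for a single-shape benchmark (for example, timing the model on a tensor of that shape) that returns a median latency in milliseconds. The row keys `shape`, `numel`, `p50_ms` and `rel_latency` are illustrative and are not necessarily the keys `sweep_latency` returns.

```python
import math
from typing import Callable, Sequence


def sweep_shapes_sketch(
    measure: Callable[[tuple], float],  # hypothetical: shape -> median latency (ms)
    shapes: Sequence[Sequence[int]],    # e.g. [(1, 3, 224, 224), (1, 3, 448, 448)]
) -> list[dict]:
    """Benchmark each shape and report latency next to input size (illustrative keys)."""
    rows = []
    for shape in shapes:
        shape = tuple(shape)
        rows.append({
            "shape": shape,
            "numel": math.prod(shape),  # total number of input elements
            "p50_ms": measure(shape),
        })
    base = rows[0]["p50_ms"] if rows else None
    for row in rows:
        row["rel_latency"] = row["p50_ms"] / base  # latency relative to the first shape
    return rows


# Fake latency proportional to H*W, so doubling both sides should quadruple it.
rows = sweep_shapes_sketch(lambda s: s[-2] * s[-1] / 10_000, [(1, 3, 224, 224), (1, 3, 448, 448)])
print(rows[1]["rel_latency"])  # about 4.0
```

Comparing `rel_latency` against the ratio of `numel` values shows whether latency grows linearly with pixel count or faster.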



sweep_batch_sizes


def sweep_batch_sizes(
    model:nn.Module, # model to benchmark
    input_shape:Sequence[int], # input shape WITHOUT batch dim, e.g. (3, 224, 224)
    batch_sizes:Sequence[int]=(1, 2, 4, 8, 16, 32), # batch sizes to test
    device:str | torch.device='cuda', # device to run on
    warmup:int=20, # warmup iterations per batch size
    steps:int=100, # measurement iterations per batch size
)->list[dict]:

Sweep batch sizes to find the one with the highest throughput.
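Throughput at batch size b is b samples divided by the time for one forward pass, so the best batch size is the one that maximizes b / latency. Because latency typically grows with batch size, throughput gains can level off or reverse. A minimal sketch of that selection, assuming a hypothetical `measure(batch_size)` that returns the mean latency per forward pass in milliseconds, and assuming throughput is counted in samples per second:

```python
from typing import Callable, Sequence


def best_batch_size_sketch(
    measure: Callable[[int], float],  # hypothetical: batch size -> mean latency per forward pass (ms)
    batch_sizes: Sequence[int] = (1, 2, 4, 8, 16, 32),
) -> tuple[int, list[dict]]:
    """Return the batch size with the highest throughput plus one row per batch size."""
    rows = []
    for bs in batch_sizes:
        mean_ms = measure(bs)
        rows.append({
            "batch_size": bs,
            "mean_ms": mean_ms,
            "throughput_s": bs * 1000.0 / mean_ms,  # samples per second (assumed definition)
        })
    best = max(rows, key=lambda r: r["throughput_s"])
    return best["batch_size"], rows


# Fake cost model: 5 ms fixed overhead + 1 ms/sample, rising to 2 ms/sample past batch 8.
fake = lambda bs: 5.0 + bs * (1.0 if bs <= 8 else 2.0)
best, rows = best_batch_size_sketch(fake)
print(best)  # 8
```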



sweep_threads


def sweep_threads(
    model:nn.Module, # model to benchmark
    sample:torch.Tensor, # input tensor (with batch dimension)
    thread_counts:Sequence[int]=(1, 2, 4, 8), # thread counts to test
    warmup:int=20, # warmup iterations per thread count
    steps:int=100, # measurement iterations per thread count
)->list[dict]:

Sweep CPU thread counts to find optimal parallelism.
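In PyTorch, the number of threads used for intra-op CPU parallelism is read with `torch.get_num_threads()` and set with `torch.set_num_threads()`. The setting is process-global, so a sweep should restore the caller's value when it finishes. The sketch below takes the getter and setter as parameters so it runs without PyTorch installed. `measure` is a hypothetical callable that runs the benchmark and returns mean latency in milliseconds, and the row keys are illustrative.

```python
from typing import Callable, Sequence


def sweep_threads_sketch(
    measure: Callable[[], float],        # hypothetical: runs the benchmark, returns mean latency (ms)
    get_threads: Callable[[], int],      # e.g. torch.get_num_threads
    set_threads: Callable[[int], None],  # e.g. torch.set_num_threads
    thread_counts: Sequence[int] = (1, 2, 4, 8),
) -> list[dict]:
    """Benchmark once per thread count, then restore the original setting."""
    original = get_threads()
    rows = []
    try:
        for n in thread_counts:
            set_threads(n)
            rows.append({"threads": n, "mean_ms": measure()})
    finally:
        set_threads(original)  # restore the caller's setting even if a run fails
    return rows


# Fake global thread setting standing in for PyTorch's.
state = {"threads": 6}
rows = sweep_threads_sketch(
    measure=lambda: 100.0 / state["threads"],  # pretend perfect scaling
    get_threads=lambda: state["threads"],
    set_threads=lambda n: state.update(threads=n),
)
print([r["mean_ms"] for r in rows], state["threads"])  # [100.0, 50.0, 25.0, 12.5] 6
```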



compute_speed_multi


def compute_speed_multi(
    model:nn.Module, # model to benchmark
    sample:torch.Tensor, # input tensor (with batch dimension)
    devices:Sequence[str | torch.device] | None=None, # devices to benchmark (default: cpu + cuda if available)
    **kwargs
)->dict[str, SpeedMetrics]:

Measure latency/throughput on multiple devices.
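The documented default for `devices` is CPU plus CUDA when it is available; in PyTorch that check is `torch.cuda.is_available()`. A sketch of that dispatch follows. It assumes (this is not confirmed by the signature above) that the extra keyword arguments are forwarded to a single-device benchmark. `measure` is hypothetical and returns one latency figure per device, and `cuda_available` is passed in so the sketch runs without PyTorch.

```python
from typing import Callable, Optional, Sequence


def speed_multi_sketch(
    measure: Callable[..., float],            # hypothetical: (device, **kwargs) -> latency (ms)
    devices: Optional[Sequence[str]] = None,  # None -> documented default
    cuda_available: bool = False,             # with PyTorch: torch.cuda.is_available()
    **kwargs,                                 # forwarded unchanged, e.g. warmup=..., steps=...
) -> dict[str, float]:
    if devices is None:
        devices = ["cpu", "cuda"] if cuda_available else ["cpu"]
    # One benchmark per device, run sequentially, keyed by the device name.
    return {str(d): measure(d, **kwargs) for d in devices}


fake = lambda device, steps=100: {"cpu": 12.0, "cuda": 1.5}[device] * steps / 100
print(speed_multi_sketch(fake, cuda_available=True, steps=50))  # {'cpu': 6.0, 'cuda': 0.75}
```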



compute_speed


def compute_speed(
    model:nn.Module, # model to benchmark
    sample:torch.Tensor, # input tensor (with batch dimension)
    device:str | torch.device='cpu', # device to run on
    warmup:int=20, # warmup iterations
    steps:int=100, # measurement iterations
)->SpeedMetrics:

Measure latency and throughput on a single device.
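The `warmup` and `steps` parameters suggest the usual benchmarking pattern: run the model a few times untimed, then time each of `steps` calls individually so that percentiles can be computed. Below is a minimal stdlib sketch under that assumption. `fn` stands in for one forward pass; for a PyTorch model that would be something like `lambda: model(sample)` under `torch.inference_mode()`. On CUDA devices `sync` should be `torch.cuda.synchronize`, because kernel launches are asynchronous and the timer would otherwise stop before the GPU finishes.

```python
import time
from typing import Callable, Optional


def time_calls_sketch(
    fn: Callable[[], object],                   # one forward pass
    warmup: int = 20,
    steps: int = 100,
    sync: Optional[Callable[[], None]] = None,  # e.g. torch.cuda.synchronize on CUDA
) -> list[float]:
    """Return per-call latencies in milliseconds, excluding warmup calls."""
    for _ in range(warmup):  # untimed runs so one-time setup costs are excluded
        fn()
    if sync is not None:
        sync()
    latencies_ms = []
    for _ in range(steps):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


calls = []
lat = time_calls_sketch(lambda: calls.append(None), warmup=3, steps=5)
print(len(calls), len(lat))  # 8 5
```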



SpeedMetrics


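def SpeedMetrics(
    p50_ms:float, # median (50th-percentile) latency in ms
    p90_ms:float, # 90th-percentile latency in ms
    p99_ms:float, # 99th-percentile latency in ms
    mean_ms:float, # mean latency in ms
    std_ms:float, # standard deviation of latency in ms
    throughput_s:float, # throughput per second
)->None: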

Latency and throughput metrics for a single device.
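The field names indicate how the metrics relate to the per-call latencies: percentiles, mean and standard deviation in milliseconds, plus a throughput per second. The sketch below assumes linear-interpolation percentiles, the sample standard deviation, and throughput counted in samples per second (batch size divided by mean latency). `SpeedMetricsSketch` and `summarize_sketch` are hypothetical names, and the library's exact definitions may differ.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SpeedMetricsSketch:
    p50_ms: float
    p90_ms: float
    p99_ms: float
    mean_ms: float
    std_ms: float
    throughput_s: float


def summarize_sketch(latencies_ms: Sequence[float], batch_size: int = 1) -> SpeedMetricsSketch:
    """Summarize per-call latencies (needs at least two samples)."""
    # 99 cut points; index k-1 is the k-th percentile, linearly interpolated.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(latencies_ms)
    return SpeedMetricsSketch(
        p50_ms=cuts[49],
        p90_ms=cuts[89],
        p99_ms=cuts[98],
        mean_ms=mean_ms,
        std_ms=statistics.stdev(latencies_ms),       # sample standard deviation
        throughput_s=batch_size * 1000.0 / mean_ms,  # assumed: samples per second
    )


m = summarize_sketch([float(ms) for ms in range(1, 101)])  # latencies of 1..100 ms
print(m.p50_ms, m.p90_ms, m.mean_ms)  # 50.5 90.1 50.5
```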


See Also