speed
Speed modules for benchmarking
sweep_latency
sweep_latency (model:torch.nn.modules.module.Module, shapes:Sequence[Sequence[int]], device:str|torch.device='cuda', warmup:int=20, steps:int=100)
Sweep different input shapes on the same device. Useful for CNNs/ViTs where latency scales with resolution.
sweep_threads
sweep_threads (model:torch.nn.modules.module.Module, sample:torch.Tensor, thread_counts:Sequence[int]=(1, 2, 4, 8), warmup:int=20, steps:int=100)
Return a pandas-compatible list-of-dicts; each row contains latency stats for a different torch.set_num_threads(n).
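The "one row per setting" shape described above can be sketched as a small loop. This is only an illustration of the pattern, not the library's code: `sweep_setting`, the `"threads"` key, and the two stand-in callables are hypothetical names, and the stand-ins do not measure anything.

```python
def sweep_setting(values, configure, measure, key="threads"):
    """Apply each value, measure once, and collect one dict per value."""
    rows = []
    for value in values:
        configure(value)       # with PyTorch this could be torch.set_num_threads
        row = {key: value}     # record which setting the row belongs to
        row.update(measure())  # merge the measured stats into the same row
        rows.append(row)
    return rows


# Stand-ins so the sketch runs without PyTorch; they only echo the setting.
state = {"threads": None}


def fake_configure(n):
    state["threads"] = n


def fake_measure():
    return {"configured": state["threads"]}


rows = sweep_setting((1, 2, 4, 8), fake_configure, fake_measure)
```

A list of flat dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`, which is presumably what "pandas-compatible" refers to.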
compute_speed_multi
compute_speed_multi (model:torch.nn.modules.module.Module, sample:torch.Tensor, devices:Optional[Sequence[str|torch.device]]=None, **kwargs)
Convenience wrapper: returns dict[str, SpeedMetrics] keyed by device. Defaults to both CPU and CUDA if a GPU is available.
compute_speed
compute_speed (model:torch.nn.modules.module.Module, sample:torch.Tensor, device:str|torch.device='cpu', warmup:int=20, steps:int=100)
Measure latency/throughput on one device. Returns a SpeedMetrics.
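The `warmup` and `steps` parameters suggest the usual pattern of untimed warmup calls followed by timed calls. A minimal sketch of that pattern follows, assuming a generic zero-argument callable rather than a model; `time_callable` and its `sync` hook are hypothetical and are not part of the documented API.

```python
import time


def time_callable(fn, warmup=20, steps=100, sync=None):
    """Return `steps` per-call latencies in milliseconds.

    `sync` is an optional zero-argument hook run before the clock is read.
    On a GPU it would be a device synchronization call, because CUDA kernels
    launch asynchronously and the host clock alone would under-report.
    """
    for _ in range(warmup):  # untimed runs to settle caches and lazy init
        fn()
    if sync is not None:
        sync()
    latencies_ms = []
    for _ in range(steps):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


calls = []
timings = time_callable(lambda: calls.append(1), warmup=3, steps=5)
```

Here `fn` is called eight times in total (three warmup, five timed), and only the five timed calls contribute to `timings`.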
SpeedMetrics
SpeedMetrics (p50_ms:float, p90_ms:float, p99_ms:float, mean_ms:float, std_ms:float, throughput_s:float)
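To make the fields concrete, the sketch below mirrors the signature as a dataclass and fills it from a list of per-call latencies. The field names come from the signature above; everything else is an assumption, including the `summarize` and `percentile` helpers, the linear-interpolation percentile, the population standard deviation, and throughput defined as samples per second at the mean latency.

```python
from dataclasses import dataclass
import statistics


@dataclass
class SpeedMetrics:
    p50_ms: float
    p90_ms: float
    p99_ms: float
    mean_ms: float
    std_ms: float
    throughput_s: float


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    if len(sorted_vals) == 1:
        return float(sorted_vals[0])
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(latencies_ms, batch_size=1):
    """Assumed aggregation; the library may define these fields differently."""
    vals = sorted(latencies_ms)
    mean_ms = statistics.fmean(vals)
    return SpeedMetrics(
        p50_ms=percentile(vals, 50),
        p90_ms=percentile(vals, 90),
        p99_ms=percentile(vals, 99),
        mean_ms=mean_ms,
        std_ms=statistics.pstdev(vals),
        throughput_s=batch_size * 1000.0 / mean_ms,  # samples per second
    )


metrics = summarize([10.0, 20.0, 30.0, 40.0, 50.0], batch_size=2)
```

Reporting p90 and p99 alongside the mean matters for latency work because a few slow calls can dominate user-visible tail latency while barely moving the average.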