Roofline
Roofline analysis: arithmetic intensity (FLOPs per byte of memory traffic) versus achieved performance
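As background, the roofline model bounds attainable performance by the lower of two ceilings: peak compute, and arithmetic intensity times peak memory bandwidth. The sketch below is a minimal, standalone illustration of that bound. The function names and hardware numbers are hypothetical, and it does not describe how RooflineAnalyzer is implemented.

```python
# Standalone sketch of the roofline bound. All names and hardware figures
# here are illustrative assumptions, not part of the fasterbench API.

def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity (FLOP/byte) where the two ceilings meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Upper bound on FLOP/s for a kernel with the given intensity."""
    return min(peak_flops, intensity * peak_bandwidth)


def classify(intensity, peak_flops, peak_bandwidth):
    """Label a kernel by which ceiling limits it."""
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical device: 10 TFLOP/s peak compute, 500 GB/s memory bandwidth.
PEAK_FLOPS = 10_000_000_000_000
PEAK_BANDWIDTH = 500_000_000_000

low = attainable_flops(4, PEAK_FLOPS, PEAK_BANDWIDTH)    # bandwidth ceiling applies
high = attainable_flops(50, PEAK_FLOPS, PEAK_BANDWIDTH)  # compute ceiling applies
```

A layer whose measured performance sits well below its ceiling at its own intensity has headroom. A layer close to the bandwidth ceiling is unlikely to speed up from fewer FLOPs alone.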
Usage
from fasterbench.roofline import RooflineAnalyzer
ra = RooflineAnalyzer(model, sample)
ra.profile(device="cuda")
ra.summary()
fig = ra.plot()
fig.show()
This is a measurement primitive. Downstream compression workflows (see fasterrecipes) can consume ra.results to make decisions - fasterbench itself never prescribes.
See Also
- Per-layer profiling - Generic per-layer hook infrastructure reused here
- Compute metrics - Model-level FLOPs counting
- Speed metrics - Latency measurement