Schedules

When should I prune my network?


source

Schedule


def Schedule(
    sched_func:Callable, # Function that computes progress at given training percentage
    start_pct:float=0.0, # Percentage of training to start schedule
    end_pct:float=1.0, # Percentage of training to end schedule
    start_val:float=0.0, # Starting value for progress range
    end_val:float=1.0, # Ending value for progress range
):

Base class to create schedules that return progress (0→1)

The Schedule class returns progress values from 0→1 by default, enabling the same schedule to work for sparsification, pruning, regularization, and distillation weight.

Key Method: schedule.progress(pct_train) returns how far along the schedule has progressed.

Parameters:

  • sched_func: the function that determines the progression curve (e.g., linear, cosine, AGP)
  • start_pct: training percentage at which the schedule begins (default: 0.0)
  • end_pct: training percentage at which the schedule completes (default: 1.0)
  • start_val: starting value of the progress range (default: 0.0)
  • end_val: ending value of the progress range (default: 1.0)

Usage:

# Get current value by multiplying target by progress
progress = schedule.progress(pct_train)
current_sparsity = target_sparsity * progress
current_weight = target_weight * progress

The start_val/end_val parameters enable schedule composition: chain multiple schedules where each picks up from where the previous one left off. See the Composing Schedules section below.

One-Shot

The simplest schedule is one-shot pruning, i.e. pruning the network once. This can be done by simply returning the desired sparsity value. The moment at which pruning happens is controlled by the start_epoch argument of the SparsifyCallback.


source

sched_oneshot


def sched_oneshot(
    start:float, # Starting sparsity level
    end:float, # Target sparsity level
    pos:float, # Current position in schedule (0-1)
)->float:

One-shot pruning: jump directly to target sparsity

one_shot.plot(50)
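The behaviour is easy to reproduce in plain Python. A minimal sketch (an illustration, not the fasterai source) of a one-shot step function:

```python
def sched_oneshot(start: float, end: float, pos: float) -> float:
    # Jump straight to the target sparsity: the value is `end` for the
    # whole schedule; the SparsifyCallback decides *when* it applies.
    return end
```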


Iterative

Instead of pruning the network to the desired sparsity in one step, you can do it iteratively. In fasterai, you can change the number of iterations.


source

sched_iterative


def sched_iterative(
    start:float, # Starting sparsity level
    end:float, # Target sparsity level
    pos:float, # Current position in schedule (0-1)
    n_steps:int=3, # Number of pruning steps
)->float:

Perform iterative pruning in discrete steps

iterative.plot(50)
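A minimal sketch of such a stepwise schedule (an illustration, not necessarily fasterai's exact formula):

```python
import math

def sched_iterative(start: float, end: float, pos: float, n_steps: int = 3) -> float:
    # Advance from start to end in n_steps equal, discrete jumps:
    # the value only changes when pos crosses a step boundary.
    step = min(math.floor(pos * n_steps), n_steps)
    return start + (end - start) * step / n_steps
```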

To modify the default n_steps, you can use the partial function.

iterative = Schedule(partial(sched_iterative, n_steps=5), start_pct=0.2)
iterative.plot(50)


Automated Gradual Pruning

Some researchers have come up with more sophisticated schedules, such as Automated Gradual Pruning.


source

sched_agp


def sched_agp(
    start:float, # Starting sparsity level
    end:float, # Target sparsity level
    pos:float, # Current position in schedule (0-1)
)->float:

Automated gradual pruning schedule with cubic decay

agp.plot(50)
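AGP (Zhu & Gupta, 2017) follows a cubic curve: pruning is aggressive early, then slows as the target is approached. A sketch of the standard formula:

```python
def sched_agp(start: float, end: float, pos: float) -> float:
    # Cubic decay: s(t) = s_f + (s_i - s_f) * (1 - t)^3
    # Moves quickly away from `start` early on, then flattens near `end`.
    return end + (start - end) * (1 - pos) ** 3
```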

One-Cycle Pruning


source

sched_onecycle


def sched_onecycle(
    start:float, # Starting sparsity level
    end:float, # Target sparsity level
    pos:float, # Current position in schedule (0-1)
    α:float=14, # Steepness parameter
    β:float=6, # Offset parameter
)->float:

One-cycle schedule based on logistic function

one_cycle.plot(50)
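A logistic curve gives the S-shape. One plausible sketch, reading α as steepness and β as offset (an illustration, not necessarily fasterai's exact expression):

```python
import math

def sched_onecycle(start: float, end: float, pos: float,
                   alpha: float = 14, beta: float = 6) -> float:
    # Sigmoid ramp: near `start` at pos=0, steep in the middle,
    # saturating near `end` as pos -> 1.
    return start + (end - start) / (1 + math.exp(-(alpha * pos - beta)))
```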

On top of that, all of the schedules available in fastai are also available by default:

  • sched_cos
  • sched_linear

cos.plot(50)

lin.plot(50)


Dense-Sparse-Dense

You can also create even more interesting behaviours, such as the DSD method, where you first prune the model, then re-grow it to its initial number of parameters.


source

sched_dsd


def sched_dsd(
    start:float, # Starting sparsity level
    end:float, # Target sparsity level
    pos:float, # Current position in schedule (0-1)
)->float:

Dense-Sparse-Dense schedule: increase then decrease sparsity

dsd.plot(50)
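Any function that rises and then falls back gives this behaviour. A minimal sketch using a half-sine (an illustration, not necessarily fasterai's exact formula):

```python
import math

def sched_dsd(start: float, end: float, pos: float) -> float:
    # Rise from `start` to peak sparsity `end` at mid-training,
    # then decay back to `start` (dense -> sparse -> dense).
    return start + (end - start) * math.sin(math.pi * pos)
```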


Composing Schedules

By default, progress() returns values in [0, 1]. But with start_val and end_val, you can control the output range of each schedule, making it easy to chain them together for multi-phase training.

For example, say you want to:

  1. Phase 1 (0%–40% of training): ramp sparsity from 0% to 30% using AGP
  2. Phase 2 (40%–70% of training): ramp sparsity from 30% to 50% using cosine
  3. Phase 3 (70%–100% of training): hold at 50%

Each schedule maps its [start_val, end_val] to a portion of the overall progress. The callback still just computes target * progress — the composition is entirely in the schedule definitions.

composed = [
    Schedule(sched_agp, start_pct=0.0, end_pct=0.4, start_val=0.0, end_val=0.6),  # 0→60% of target
    Schedule(sched_cos, start_pct=0.4, end_pct=0.7, start_val=0.6, end_val=1.0),  # 60→100% of target
    # Phase 3: no schedule needed — last schedule holds at end_val after end_pct
]

The first schedule (AGP) ramps progress from 0.0 to 0.6, so target * progress goes from 0% to 30%. The second schedule (cosine) picks up at 0.6 and continues to 1.0, taking sparsity from 30% to 50%. After the last schedule’s end_pct, the progress holds at end_val — giving us the hold phase for free.
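To make the mechanics concrete, here is a hypothetical stand-alone helper (composed_progress is not part of the fasterai API) that evaluates such a chain of phases the way the text describes:

```python
import math

def composed_progress(phases, pct):
    # phases: ordered list of (start_pct, end_pct, start_val, end_val, fn),
    # where fn(start_val, end_val, local_pos) maps local progress to a value.
    # Before a phase starts we hold its start_val; after the last phase
    # ends we hold its end_val.
    for start_pct, end_pct, start_val, end_val, fn in phases:
        if pct < start_pct:
            return start_val
        if pct <= end_pct:
            local = (pct - start_pct) / (end_pct - start_pct)
            return fn(start_val, end_val, local)
    return phases[-1][3]  # hold at the last end_val

# Simple curves standing in for sched_agp and sched_cos
agp = lambda s, e, p: e + (s - e) * (1 - p) ** 3
cos_ = lambda s, e, p: s + (e - s) * (1 - math.cos(math.pi * p)) / 2

phases = [
    (0.0, 0.4, 0.0, 0.6, agp),   # Phase 1: progress 0.0 -> 0.6
    (0.4, 0.7, 0.6, 1.0, cos_),  # Phase 2: progress 0.6 -> 1.0
]
```

With a target sparsity of 50%, progress reaches 0.6 at 40% of training (30% sparsity), 1.0 at 70% (50% sparsity), and the hold keeps it there for the rest of training.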


See Also

  • SparsifyCallback - Apply sparsification during training using these schedules
  • PruneCallback - Apply structured pruning during training
  • Criteria - Different importance measures for selecting what to prune