Sparsifier

Make your neural network sparse with fasterai

A sparse vector, as opposed to a dense one, is a vector that contains many zeros. When we speak about making a neural network sparse, we thus mean that the network’s weights are mostly zeros.
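To make the definition concrete, here is a minimal sketch (plain PyTorch, not part of fasterai) that measures the sparsity of a tensor as its fraction of zero entries:

```python
import torch

# Sparsity is the fraction of zero entries in a tensor, here as a percentage
def sparsity(t: torch.Tensor) -> float:
    return 100.0 * (t == 0).float().mean().item()

dense = torch.tensor([0.3, -1.2, 0.7, 2.1])   # no zeros: dense
sparse = torch.tensor([0.0, -1.2, 0.0, 0.0])  # mostly zeros: sparse

print(sparsity(dense))   # 0.0
print(sparsity(sparse))  # 75.0
```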

With fasterai, you can do that thanks to the Sparsifier class.

Let’s start by creating a model

model = resnet18()

As you probably know, weights in a convolutional neural network have 4 dimensions (\(c_{out} \times c_{in} \times k_h \times k_w\))

model.conv1.weight.ndim
4

In the case of ResNet18, the weights of the first layer have shape \(64 \times 3 \times 7 \times 7\). We can thus plot each of the \(64\) filters as a \(7 \times 7\) color image (because they contain \(3\) channels).

plot_kernels(model.conv1)

The Sparsifier class allows us to remove some filters (or parts of them) that are considered less useful than others. This can be done by first creating an instance of the class, specifying the granularity, the context, and the criteria of the pruning (each is detailed in its own section below).

Users can prune a single layer by using the Sparsifier.sparsify_layer method.

source

Sparsifier.sparsify_layer


def sparsify_layer(
    m:nn.Module, # The layer to sparsify
    sparsity:float, # Target sparsity level (percentage)
    round_to:Optional[int]=None, # Round to a multiple of this value
)->None:

Apply sparsification to a single layer

model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'local', large_final)
sparsifier.sparsify_layer(model.conv1, 70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,615         70.31%
layer1.0.conv1                 Conv2d          36,864     0              0.00%
layer1.0.conv2                 Conv2d          36,864     0              0.00%
layer1.1.conv1                 Conv2d          36,864     0              0.00%
layer1.1.conv2                 Conv2d          36,864     0              0.00%
layer2.0.conv1                 Conv2d          73,728     0              0.00%
layer2.0.conv2                 Conv2d          147,456    0              0.00%
layer2.0.downsample.0          Conv2d          8,192      0              0.00%
layer2.1.conv1                 Conv2d          147,456    0              0.00%
layer2.1.conv2                 Conv2d          147,456    0              0.00%
layer3.0.conv1                 Conv2d          294,912    0              0.00%
layer3.0.conv2                 Conv2d          589,824    0              0.00%
layer3.0.downsample.0          Conv2d          32,768     0              0.00%
layer3.1.conv1                 Conv2d          589,824    0              0.00%
layer3.1.conv2                 Conv2d          589,824    0              0.00%
layer4.0.conv1                 Conv2d          1,179,648  0              0.00%
layer4.0.conv2                 Conv2d          2,359,296  0              0.00%
layer4.0.downsample.0          Conv2d          131,072    0              0.00%
layer4.1.conv1                 Conv2d          2,359,296  1              0.00%
layer4.1.conv2                 Conv2d          2,359,296  0              0.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 6,616          0.06%

Most of the time, we want to prune the whole model at once, using the Sparsifier.sparsify_model method, indicating the percentage of sparsity you want to apply.


source

Sparsifier.sparsify_model


def sparsify_model(
    sparsity:Union[float, dict], # Target sparsity level or per-layer dict
    round_to:Optional[int]=None, # Round to a multiple of this value
)->None:

Apply sparsification to all matching layers in the model

There are several ways in which we can make a model sparse. You will find the most important ones below:

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'local', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,585         69.99%
layer1.0.conv1                 Conv2d          36,864     25,805        70.00%
layer1.0.conv2                 Conv2d          36,864     25,805        70.00%
layer1.1.conv1                 Conv2d          36,864     25,805        70.00%
layer1.1.conv2                 Conv2d          36,864     25,805        70.00%
layer2.0.conv1                 Conv2d          73,728     51,609        70.00%
layer2.0.conv2                 Conv2d          147,456    103,219       70.00%
layer2.0.downsample.0          Conv2d          8,192      5,734         70.00%
layer2.1.conv1                 Conv2d          147,456    103,219       70.00%
layer2.1.conv2                 Conv2d          147,456    103,219       70.00%
layer3.0.conv1                 Conv2d          294,912    206,438       70.00%
layer3.0.conv2                 Conv2d          589,824    412,877       70.00%
layer3.0.downsample.0          Conv2d          32,768     22,937        70.00%
layer3.1.conv1                 Conv2d          589,824    412,877       70.00%
layer3.1.conv2                 Conv2d          589,824    412,877       70.00%
layer4.0.conv1                 Conv2d          1,179,648  825,753       70.00%
layer4.0.conv2                 Conv2d          2,359,296  1,651,506     70.00%
layer4.0.downsample.0          Conv2d          131,072    91,750        70.00%
layer4.1.conv1                 Conv2d          2,359,296  1,651,507     70.00%
layer4.1.conv2                 Conv2d          2,359,296  1,651,507     70.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,834     70.00%

You now have a model that is \(70\%\) sparse!
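You can verify such a sparsity report by hand. This is a minimal sketch (plain PyTorch, not the fasterai API) that counts zeroed entries across a model's Conv2d weights, applied to a toy model whose first conv we sparsify by magnitude:

```python
import torch
import torch.nn as nn

# Count zeroed entries across all Conv2d weights, as a percentage
def conv_sparsity(model: nn.Module) -> float:
    zeros = total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += (m.weight == 0).sum().item()
            total += m.weight.numel()
    return 100.0 * zeros / total

# Toy model: zero out ~70% of the first conv's weights by magnitude
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
with torch.no_grad():
    w = model[0].weight
    thresh = w.abs().flatten().kthvalue(int(0.7 * w.numel())).values
    w.mul_((w.abs() > thresh).float())

print(f"{conv_sparsity(model):.2f}% of conv weights are zero")
```

Only the first conv was pruned here, so the overall number is lower than 70%; fasterai's print_sparsity gives you this breakdown per layer.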

Granularity

As we said earlier, the granularity defines the structure of the parameters that you remove.

In the example above, we removed individual weights from each convolutional filter, meaning that we now have sparse filters, as can be seen in the image below:

plot_kernels(model.conv1)

Another granularity is, for example, removing column vectors from the filters. To do so, change the granularity parameter accordingly.

model = resnet18()
sparsifier = Sparsifier(model, 'column', 'local', large_final)
sparsifier.sparsify_layer(model.conv1, 70)
plot_kernels(model.conv1)
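One way to picture the granularities is by the shape of the score tensor they produce: coarser granularities score (and zero) whole structures at once. The sketch below uses assumed semantics, not fasterai internals, and takes a "column" to be the \(k_h\) dimension of each kernel:

```python
import torch

# A conv weight has shape (c_out, c_in, k_h, k_w); the granularity decides
# at which structural level importance scores are computed.
w = torch.randn(64, 3, 7, 7)

weight_scores = w.abs()                     # 'weight': one score per scalar
filter_scores = w.abs().sum(dim=(1, 2, 3))  # 'filter': one score per output filter
column_scores = w.abs().sum(dim=2)          # 'column': one score per kernel column

print(weight_scores.shape)  # torch.Size([64, 3, 7, 7])
print(filter_scores.shape)  # torch.Size([64])
print(column_scores.shape)  # torch.Size([64, 3, 7])
```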

For more information and examples about the pruning granularities, I suggest you take a look at the corresponding section.

Context

The context defines where to look in the model, i.e. where the weights are compared. The two basic contexts are:

- local: weights are compared within each layer individually. This leads to layers with similar levels of sparsity.
- global: weights are compared across the whole model. This leads to layers with different levels of sparsity.
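The difference boils down to where the pruning threshold is computed. This is a sketch of that idea with assumed semantics (plain PyTorch, toy 1-D "layers"): a local context uses one threshold per layer, a global context uses a single threshold over all weights pooled together:

```python
import torch

torch.manual_seed(0)

# Two toy "layers" with different weight scales
layers = [torch.randn(1000) * 0.5, torch.randn(1000) * 2.0]
target = 0.7  # 70% sparsity

def sparsity(mask: torch.Tensor) -> float:
    return 100.0 * (~mask).float().mean().item()

# local: one threshold per layer -> each layer lands near the target
local = [w.abs() > w.abs().quantile(target) for w in layers]

# global: one shared threshold -> per-layer sparsity varies
thresh = torch.cat([w.abs() for w in layers]).quantile(target)
glob = [w.abs() > thresh for w in layers]

print([f"{sparsity(m):.0f}%" for m in local])  # both near 70%
print([f"{sparsity(m):.0f}%" for m in glob])   # small-scale layer pruned much more
```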

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'local', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,585         69.99%
layer1.0.conv1                 Conv2d          36,864     25,805        70.00%
layer1.0.conv2                 Conv2d          36,864     25,805        70.00%
layer1.1.conv1                 Conv2d          36,864     25,805        70.00%
layer1.1.conv2                 Conv2d          36,864     25,805        70.00%
layer2.0.conv1                 Conv2d          73,728     51,609        70.00%
layer2.0.conv2                 Conv2d          147,456    103,219       70.00%
layer2.0.downsample.0          Conv2d          8,192      5,734         70.00%
layer2.1.conv1                 Conv2d          147,456    103,219       70.00%
layer2.1.conv2                 Conv2d          147,456    103,219       70.00%
layer3.0.conv1                 Conv2d          294,912    206,438       70.00%
layer3.0.conv2                 Conv2d          589,824    412,877       70.00%
layer3.0.downsample.0          Conv2d          32,768     22,937        70.00%
layer3.1.conv1                 Conv2d          589,824    412,877       70.00%
layer3.1.conv2                 Conv2d          589,824    412,877       70.00%
layer4.0.conv1                 Conv2d          1,179,648  825,753       70.00%
layer4.0.conv2                 Conv2d          2,359,296  1,651,506     70.00%
layer4.0.downsample.0          Conv2d          131,072    91,750        70.00%
layer4.1.conv1                 Conv2d          2,359,296  1,651,507     70.00%
layer4.1.conv2                 Conv2d          2,359,296  1,651,507     70.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,834     70.00%
model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,214         66.05%
layer1.0.conv1                 Conv2d          36,864     11,786        31.97%
layer1.0.conv2                 Conv2d          36,864     11,864        32.18%
layer1.1.conv1                 Conv2d          36,864     11,806        32.03%
layer1.1.conv2                 Conv2d          36,864     11,831        32.09%
layer2.0.conv1                 Conv2d          73,728     32,757        44.43%
layer2.0.conv2                 Conv2d          147,456    64,894        44.01%
layer2.0.downsample.0          Conv2d          8,192      1,234         15.06%
layer2.1.conv1                 Conv2d          147,456    64,982        44.07%
layer2.1.conv2                 Conv2d          147,456    65,301        44.29%
layer3.0.conv1                 Conv2d          294,912    174,570       59.19%
layer3.0.conv2                 Conv2d          589,824    349,497       59.25%
layer3.0.downsample.0          Conv2d          32,768     7,208         22.00%
layer3.1.conv1                 Conv2d          589,824    349,981       59.34%
layer3.1.conv2                 Conv2d          589,824    349,240       59.21%
layer4.0.conv1                 Conv2d          1,179,648  894,898       75.86%
layer4.0.conv2                 Conv2d          2,359,296  1,788,755     75.82%
layer4.0.downsample.0          Conv2d          131,072    39,958        30.49%
layer4.1.conv1                 Conv2d          2,359,296  1,790,109     75.87%
layer4.1.conv2                 Conv2d          2,359,296  1,789,953     75.87%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,838     70.00%

Criteria

The criteria defines how we select the parameters to remove. It is usually given by a scoring method. The most common one is large_final, i.e. selecting the parameters with the highest absolute value, as they are assumed to contribute the most to the final results of the model.

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,325         67.23%
layer1.0.conv1                 Conv2d          36,864     11,915        32.32%
layer1.0.conv2                 Conv2d          36,864     11,815        32.05%
layer1.1.conv1                 Conv2d          36,864     11,965        32.46%
layer1.1.conv2                 Conv2d          36,864     11,990        32.52%
layer2.0.conv1                 Conv2d          73,728     32,395        43.94%
layer2.0.conv2                 Conv2d          147,456    65,275        44.27%
layer2.0.downsample.0          Conv2d          8,192      1,279         15.61%
layer2.1.conv1                 Conv2d          147,456    64,888        44.00%
layer2.1.conv2                 Conv2d          147,456    65,148        44.18%
layer3.0.conv1                 Conv2d          294,912    174,785       59.27%
layer3.0.conv2                 Conv2d          589,824    349,838       59.31%
layer3.0.downsample.0          Conv2d          32,768     7,069         21.57%
layer3.1.conv1                 Conv2d          589,824    350,378       59.40%
layer3.1.conv2                 Conv2d          589,824    349,638       59.28%
layer4.0.conv1                 Conv2d          1,179,648  894,232       75.80%
layer4.0.conv2                 Conv2d          2,359,296  1,789,714     75.86%
layer4.0.downsample.0          Conv2d          131,072    39,670        30.27%
layer4.1.conv1                 Conv2d          2,359,296  1,789,491     75.85%
layer4.1.conv2                 Conv2d          2,359,296  1,789,027     75.83%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,837     70.00%
model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', small_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      9,407         99.99%
layer1.0.conv1                 Conv2d          36,864     456            1.24%
layer1.0.conv2                 Conv2d          36,864     327            0.89%
layer1.1.conv1                 Conv2d          36,864     435            1.18%
layer1.1.conv2                 Conv2d          36,864     905            2.45%
layer2.0.conv1                 Conv2d          73,728     4,653          6.31%
layer2.0.conv2                 Conv2d          147,456    6,854          4.65%
layer2.0.downsample.0          Conv2d          8,192      8              0.10%
layer2.1.conv1                 Conv2d          147,456    6,538          4.43%
layer2.1.conv2                 Conv2d          147,456    9,241          6.27%
layer3.0.conv1                 Conv2d          294,912    83,006        28.15%
layer3.0.conv2                 Conv2d          589,824    22,507         3.82%
layer3.0.downsample.0          Conv2d          32,768     11             0.03%
layer3.1.conv1                 Conv2d          589,824    47,880         8.12%
layer3.1.conv2                 Conv2d          589,824    105,624       17.91%
layer4.0.conv1                 Conv2d          1,179,648  1,094,504     92.78%
layer4.0.conv2                 Conv2d          2,359,296  2,143,119     90.84%
layer4.0.downsample.0          Conv2d          131,072    378            0.29%
layer4.1.conv1                 Conv2d          2,359,296  1,921,688     81.45%
layer4.1.conv2                 Conv2d          2,359,296  2,359,296    100.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,837     70.00%

For more information and examples about the pruning criteria, I suggest you take a look at the corresponding section.

Remark

In some cases, you may want the number of remaining parameters to be a multiple of 8, e.g. for hardware efficiency; this can be done by passing the round_to parameter.
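The arithmetic behind this can be sketched as follows. Assuming the kept count is rounded up to a multiple of round_to (an assumption, though it is consistent with the reports below), a layer's effective sparsity ends up slightly below the requested target:

```python
import math

# Assumed behaviour: the number of kept filters is rounded UP to a
# multiple of round_to, so effective sparsity <= requested sparsity.
def kept_filters(n_filters: int, sparsity: float, round_to: int) -> int:
    raw_kept = n_filters * (100 - sparsity) / 100
    return math.ceil(raw_kept / round_to) * round_to

for n in (64, 128):
    kept = kept_filters(n, 70, 8)
    print(f"{n} filters -> keep {kept}, effective sparsity {100 * (n - kept) / n:.2f}%")
```

For 64 filters at 70% this keeps 24 filters (62.50% sparsity), and for 128 filters it keeps 40 (68.75%), matching the per-layer figures reported below.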

model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'local', large_final)
sparsifier.sparsify_model(70, round_to=8)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      5,880         62.50%
layer1.0.conv1                 Conv2d          36,864     23,040        62.50%
layer1.0.conv2                 Conv2d          36,864     23,040        62.50%
layer1.1.conv1                 Conv2d          36,864     23,040        62.50%
layer1.1.conv2                 Conv2d          36,864     23,040        62.50%
layer2.0.conv1                 Conv2d          73,728     50,688        68.75%
layer2.0.conv2                 Conv2d          147,456    101,376       68.75%
layer2.0.downsample.0          Conv2d          8,192      5,632         68.75%
layer2.1.conv1                 Conv2d          147,456    101,376       68.75%
layer2.1.conv2                 Conv2d          147,456    101,376       68.75%
layer3.0.conv1                 Conv2d          294,912    202,754       68.75%
layer3.0.conv2                 Conv2d          589,824    405,504       68.75%
layer3.0.downsample.0          Conv2d          32,768     22,528        68.75%
layer3.1.conv1                 Conv2d          589,824    405,504       68.75%
layer3.1.conv2                 Conv2d          589,824    405,504       68.75%
layer4.0.conv1                 Conv2d          1,179,648  811,008       68.75%
layer4.0.conv2                 Conv2d          2,359,296  1,622,016     68.75%
layer4.0.downsample.0          Conv2d          131,072    90,112        68.75%
layer4.1.conv1                 Conv2d          2,359,296  1,622,016     68.75%
layer4.1.conv2                 Conv2d          2,359,296  1,622,016     68.75%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,667,450     68.66%
model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'global', large_final)
sparsifier.sparsify_model(70, round_to=8)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      8,232         87.50%
layer1.0.conv1                 Conv2d          36,864     0              0.00%
layer1.0.conv2                 Conv2d          36,864     0              0.00%
layer1.1.conv1                 Conv2d          36,864     0              0.00%
layer1.1.conv2                 Conv2d          36,864     0              0.00%
layer2.0.conv1                 Conv2d          73,728     69,120        93.75%
layer2.0.conv2                 Conv2d          147,456    138,240       93.75%
layer2.0.downsample.0          Conv2d          8,192      0              0.00%
layer2.1.conv1                 Conv2d          147,456    138,240       93.75%
layer2.1.conv2                 Conv2d          147,456    138,240       93.75%
layer3.0.conv1                 Conv2d          294,912    285,696       96.88%
layer3.0.conv2                 Conv2d          589,824    571,392       96.88%
layer3.0.downsample.0          Conv2d          32,768     0              0.00%
layer3.1.conv1                 Conv2d          589,824    571,392       96.88%
layer3.1.conv2                 Conv2d          589,824    571,392       96.88%
layer4.0.conv1                 Conv2d          1,179,648  1,161,216     98.44%
layer4.0.conv2                 Conv2d          2,359,296  2,322,432     98.44%
layer4.0.downsample.0          Conv2d          131,072    0              0.00%
layer4.1.conv1                 Conv2d          2,359,296  2,322,432     98.44%
layer4.1.conv2                 Conv2d          2,359,296  2,285,568     96.88%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 10,583,592    94.78%

For more information about the granularities at which you can operate, please check the related page.


Summary

Tool                   Purpose
--------------------------------------------------------------------------------
Sparsifier             Core class for zeroing out weights
sparsify_model()       Apply sparsification to all matching layers
sparsify_layer()       Apply sparsification to a single layer
print_sparsity()       Report per-layer sparsity statistics
large_final            Criteria: keep weights with largest magnitude
Granularity options    weight, vector, kernel, filter
Context options        local (per-layer) vs global (network-wide)

See Also