Sparsifier

Make your neural network sparse with fasterai

A sparse vector, as opposed to a dense one, is a vector that contains many zeros. When we speak about making a neural network sparse, we thus mean that the network’s weights are mostly zeros.
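To make the definition concrete, here is a minimal sketch (plain PyTorch, not part of fasterai) that measures the sparsity of a tensor as its fraction of zero entries:

```python
import torch

# Sparsity is the fraction of zero entries in a tensor, here as a percentage
def sparsity(t: torch.Tensor) -> float:
    return 100.0 * (t == 0).float().mean().item()

dense = torch.tensor([0.3, -1.2, 0.7, 2.1])   # no zeros: dense
sparse = torch.tensor([0.0, -1.2, 0.0, 0.0])  # mostly zeros: sparse

print(sparsity(dense))   # 0.0
print(sparsity(sparse))  # 75.0
```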

With fasterai, you can do that thanks to the Sparsifier class.

Let’s start by creating a model

model = resnet18()

As you probably know, weights in a convolutional neural network have 4 dimensions (\(c_{out} \times c_{in} \times k_h \times k_w\))

model.conv1.weight.ndim
4

In the case of ResNet18, the weights of the first layer have shape \(64 \times 3 \times 7 \times 7\). We can thus plot each of the \(64\) filters as a \(7 \times 7\) color image (because they contain \(3\) channels).

plot_kernels(model.conv1)

The Sparsifier class allows us to remove some filters (or parts of them) that are considered less useful than others. This can be done by first creating an instance of the class, specifying the granularity, the context, and the criteria of the pruning (each is detailed in its own section below).

Users can prune a single layer by using the Sparsifier.sparsify_layer method.

source

Sparsifier.sparsify_layer


def sparsify_layer(
    m:nn.Module, # The layer to sparsify
    sparsity:float, # Target sparsity level (percentage)
    round_to:Optional[int]=None, # Round to a multiple of this value
)->None:

Apply sparsification to a single layer

model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'local', large_final)
sparsifier.sparsify_layer(model.conv1, 70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,615         70.31%
layer1.0.conv1                 Conv2d          36,864     0              0.00%
layer1.0.conv2                 Conv2d          36,864     0              0.00%
layer1.1.conv1                 Conv2d          36,864     0              0.00%
layer1.1.conv2                 Conv2d          36,864     0              0.00%
layer2.0.conv1                 Conv2d          73,728     0              0.00%
layer2.0.conv2                 Conv2d          147,456    0              0.00%
layer2.0.downsample.0          Conv2d          8,192      0              0.00%
layer2.1.conv1                 Conv2d          147,456    0              0.00%
layer2.1.conv2                 Conv2d          147,456    0              0.00%
layer3.0.conv1                 Conv2d          294,912    0              0.00%
layer3.0.conv2                 Conv2d          589,824    0              0.00%
layer3.0.downsample.0          Conv2d          32,768     0              0.00%
layer3.1.conv1                 Conv2d          589,824    0              0.00%
layer3.1.conv2                 Conv2d          589,824    0              0.00%
layer4.0.conv1                 Conv2d          1,179,648  0              0.00%
layer4.0.conv2                 Conv2d          2,359,296  0              0.00%
layer4.0.downsample.0          Conv2d          131,072    0              0.00%
layer4.1.conv1                 Conv2d          2,359,296  1              0.00%
layer4.1.conv2                 Conv2d          2,359,296  0              0.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 6,616          0.06%

Most of the time, we want to prune the whole model at once, using the Sparsifier.sparsify_model method, indicating the percentage of sparsity you want to apply.


source

Sparsifier.sparsify_model


def sparsify_model(
    sparsity:Union[float, dict], # Target sparsity level or per-layer dict
    round_to:Optional[int]=None, # Round to a multiple of this value
)->None:

Apply sparsification to all matching layers in the model

There are several ways in which we can make a model sparse. You will find the most important ones below:

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'local', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,585         69.99%
layer1.0.conv1                 Conv2d          36,864     25,805        70.00%
layer1.0.conv2                 Conv2d          36,864     25,805        70.00%
layer1.1.conv1                 Conv2d          36,864     25,805        70.00%
layer1.1.conv2                 Conv2d          36,864     25,805        70.00%
layer2.0.conv1                 Conv2d          73,728     51,609        70.00%
layer2.0.conv2                 Conv2d          147,456    103,219       70.00%
layer2.0.downsample.0          Conv2d          8,192      5,734         70.00%
layer2.1.conv1                 Conv2d          147,456    103,219       70.00%
layer2.1.conv2                 Conv2d          147,456    103,219       70.00%
layer3.0.conv1                 Conv2d          294,912    206,438       70.00%
layer3.0.conv2                 Conv2d          589,824    412,877       70.00%
layer3.0.downsample.0          Conv2d          32,768     22,937        70.00%
layer3.1.conv1                 Conv2d          589,824    412,877       70.00%
layer3.1.conv2                 Conv2d          589,824    412,877       70.00%
layer4.0.conv1                 Conv2d          1,179,648  825,753       70.00%
layer4.0.conv2                 Conv2d          2,359,296  1,651,506     70.00%
layer4.0.downsample.0          Conv2d          131,072    91,750        70.00%
layer4.1.conv1                 Conv2d          2,359,296  1,651,507     70.00%
layer4.1.conv2                 Conv2d          2,359,296  1,651,507     70.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,834     70.00%

You now have a model that is \(70\%\) sparse!
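You can verify such a sparsity report by hand. This is a minimal sketch (plain PyTorch, not the fasterai API) that counts zeroed entries across a model's Conv2d weights, applied to a toy model whose first conv we sparsify by magnitude:

```python
import torch
import torch.nn as nn

# Count zeroed entries across all Conv2d weights, as a percentage
def conv_sparsity(model: nn.Module) -> float:
    zeros = total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            zeros += (m.weight == 0).sum().item()
            total += m.weight.numel()
    return 100.0 * zeros / total

# Toy model: zero out ~70% of the first conv's weights by magnitude
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
with torch.no_grad():
    w = model[0].weight
    thresh = w.abs().flatten().kthvalue(int(0.7 * w.numel())).values
    w.mul_((w.abs() > thresh).float())

print(f"{conv_sparsity(model):.2f}% of conv weights are zero")
```

Only the first conv was pruned here, so the overall number is lower than 70%; fasterai's print_sparsity gives you this breakdown per layer.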

Granularity

As we said earlier, the granularity defines the structure of the parameters that you remove.

In the example above, we removed individual weights from each convolutional filter, meaning that we now have sparse filters, as can be seen in the image below:

plot_kernels(model.conv1)

Another granularity is, for example, removing column vectors from the filters. To do so, change the granularity parameter accordingly.

model = resnet18()
sparsifier = Sparsifier(model, 'column', 'local', large_final)
sparsifier.sparsify_layer(model.conv1, 70)
plot_kernels(model.conv1)
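One way to picture the granularities is by the shape of the score tensor they produce: coarser granularities score (and zero) whole structures at once. The sketch below uses assumed semantics, not fasterai internals, and takes a "column" to be the \(k_h\) dimension of each kernel:

```python
import torch

# A conv weight has shape (c_out, c_in, k_h, k_w); the granularity decides
# at which structural level importance scores are computed.
w = torch.randn(64, 3, 7, 7)

weight_scores = w.abs()                     # 'weight': one score per scalar
filter_scores = w.abs().sum(dim=(1, 2, 3))  # 'filter': one score per output filter
column_scores = w.abs().sum(dim=2)          # 'column': one score per kernel column

print(weight_scores.shape)  # torch.Size([64, 3, 7, 7])
print(filter_scores.shape)  # torch.Size([64])
print(column_scores.shape)  # torch.Size([64, 3, 7])
```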

For more information and examples about the pruning granularities, I suggest you take a look at the corresponding section.

Context

The context defines where to look in the model, i.e. where the weights are compared. The two basic contexts are:

- local: weights are compared within each layer individually. This leads to layers with similar levels of sparsity.
- global: weights are compared across the whole model. This leads to layers with different levels of sparsity.
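The difference boils down to where the pruning threshold is computed. This is a sketch of that idea with assumed semantics (plain PyTorch, toy 1-D "layers"): a local context uses one threshold per layer, a global context uses a single threshold over all weights pooled together:

```python
import torch

torch.manual_seed(0)

# Two toy "layers" with different weight scales
layers = [torch.randn(1000) * 0.5, torch.randn(1000) * 2.0]
target = 0.7  # 70% sparsity

def sparsity(mask: torch.Tensor) -> float:
    return 100.0 * (~mask).float().mean().item()

# local: one threshold per layer -> each layer lands near the target
local = [w.abs() > w.abs().quantile(target) for w in layers]

# global: one shared threshold -> per-layer sparsity varies
thresh = torch.cat([w.abs() for w in layers]).quantile(target)
glob = [w.abs() > thresh for w in layers]

print([f"{sparsity(m):.0f}%" for m in local])  # both near 70%
print([f"{sparsity(m):.0f}%" for m in glob])   # small-scale layer pruned much more
```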

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'local', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,585         69.99%
layer1.0.conv1                 Conv2d          36,864     25,805        70.00%
layer1.0.conv2                 Conv2d          36,864     25,805        70.00%
layer1.1.conv1                 Conv2d          36,864     25,805        70.00%
layer1.1.conv2                 Conv2d          36,864     25,805        70.00%
layer2.0.conv1                 Conv2d          73,728     51,609        70.00%
layer2.0.conv2                 Conv2d          147,456    103,219       70.00%
layer2.0.downsample.0          Conv2d          8,192      5,734         70.00%
layer2.1.conv1                 Conv2d          147,456    103,219       70.00%
layer2.1.conv2                 Conv2d          147,456    103,219       70.00%
layer3.0.conv1                 Conv2d          294,912    206,438       70.00%
layer3.0.conv2                 Conv2d          589,824    412,877       70.00%
layer3.0.downsample.0          Conv2d          32,768     22,937        70.00%
layer3.1.conv1                 Conv2d          589,824    412,877       70.00%
layer3.1.conv2                 Conv2d          589,824    412,877       70.00%
layer4.0.conv1                 Conv2d          1,179,648  825,753       70.00%
layer4.0.conv2                 Conv2d          2,359,296  1,651,506     70.00%
layer4.0.downsample.0          Conv2d          131,072    91,750        70.00%
layer4.1.conv1                 Conv2d          2,359,296  1,651,507     70.00%
layer4.1.conv2                 Conv2d          2,359,296  1,651,507     70.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,834     70.00%
model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,214         66.05%
layer1.0.conv1                 Conv2d          36,864     11,786        31.97%
layer1.0.conv2                 Conv2d          36,864     11,864        32.18%
layer1.1.conv1                 Conv2d          36,864     11,806        32.03%
layer1.1.conv2                 Conv2d          36,864     11,831        32.09%
layer2.0.conv1                 Conv2d          73,728     32,757        44.43%
layer2.0.conv2                 Conv2d          147,456    64,894        44.01%
layer2.0.downsample.0          Conv2d          8,192      1,234         15.06%
layer2.1.conv1                 Conv2d          147,456    64,982        44.07%
layer2.1.conv2                 Conv2d          147,456    65,301        44.29%
layer3.0.conv1                 Conv2d          294,912    174,570       59.19%
layer3.0.conv2                 Conv2d          589,824    349,497       59.25%
layer3.0.downsample.0          Conv2d          32,768     7,208         22.00%
layer3.1.conv1                 Conv2d          589,824    349,981       59.34%
layer3.1.conv2                 Conv2d          589,824    349,240       59.21%
layer4.0.conv1                 Conv2d          1,179,648  894,898       75.86%
layer4.0.conv2                 Conv2d          2,359,296  1,788,755     75.82%
layer4.0.downsample.0          Conv2d          131,072    39,958        30.49%
layer4.1.conv1                 Conv2d          2,359,296  1,790,109     75.87%
layer4.1.conv2                 Conv2d          2,359,296  1,789,953     75.87%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,838     70.00%

Criteria

The criteria defines how we select the parameters to remove. It is usually given by a scoring method. The most common one is large_final, i.e. selecting the parameters with the highest absolute value, as they are assumed to contribute the most to the final results of the model.

model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', large_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      6,325         67.23%
layer1.0.conv1                 Conv2d          36,864     11,915        32.32%
layer1.0.conv2                 Conv2d          36,864     11,815        32.05%
layer1.1.conv1                 Conv2d          36,864     11,965        32.46%
layer1.1.conv2                 Conv2d          36,864     11,990        32.52%
layer2.0.conv1                 Conv2d          73,728     32,395        43.94%
layer2.0.conv2                 Conv2d          147,456    65,275        44.27%
layer2.0.downsample.0          Conv2d          8,192      1,279         15.61%
layer2.1.conv1                 Conv2d          147,456    64,888        44.00%
layer2.1.conv2                 Conv2d          147,456    65,148        44.18%
layer3.0.conv1                 Conv2d          294,912    174,785       59.27%
layer3.0.conv2                 Conv2d          589,824    349,838       59.31%
layer3.0.downsample.0          Conv2d          32,768     7,069         21.57%
layer3.1.conv1                 Conv2d          589,824    350,378       59.40%
layer3.1.conv2                 Conv2d          589,824    349,638       59.28%
layer4.0.conv1                 Conv2d          1,179,648  894,232       75.80%
layer4.0.conv2                 Conv2d          2,359,296  1,789,714     75.86%
layer4.0.downsample.0          Conv2d          131,072    39,670        30.27%
layer4.1.conv1                 Conv2d          2,359,296  1,789,491     75.85%
layer4.1.conv2                 Conv2d          2,359,296  1,789,027     75.83%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,837     70.00%
model = resnet18()
sparsifier = Sparsifier(model, 'weight', 'global', small_final)
sparsifier.sparsify_model(70)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      9,407         99.99%
layer1.0.conv1                 Conv2d          36,864     456            1.24%
layer1.0.conv2                 Conv2d          36,864     327            0.89%
layer1.1.conv1                 Conv2d          36,864     435            1.18%
layer1.1.conv2                 Conv2d          36,864     905            2.45%
layer2.0.conv1                 Conv2d          73,728     4,653          6.31%
layer2.0.conv2                 Conv2d          147,456    6,854          4.65%
layer2.0.downsample.0          Conv2d          8,192      8              0.10%
layer2.1.conv1                 Conv2d          147,456    6,538          4.43%
layer2.1.conv2                 Conv2d          147,456    9,241          6.27%
layer3.0.conv1                 Conv2d          294,912    83,006        28.15%
layer3.0.conv2                 Conv2d          589,824    22,507         3.82%
layer3.0.downsample.0          Conv2d          32,768     11             0.03%
layer3.1.conv1                 Conv2d          589,824    47,880         8.12%
layer3.1.conv2                 Conv2d          589,824    105,624       17.91%
layer4.0.conv1                 Conv2d          1,179,648  1,094,504     92.78%
layer4.0.conv2                 Conv2d          2,359,296  2,143,119     90.84%
layer4.0.downsample.0          Conv2d          131,072    378            0.29%
layer4.1.conv1                 Conv2d          2,359,296  1,921,688     81.45%
layer4.1.conv2                 Conv2d          2,359,296  2,359,296    100.00%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,816,837     70.00%

For more information and examples about the pruning criteria, I suggest you take a look at the corresponding section.

Remark

In some cases, you may want the number of remaining parameters to be a multiple of 8, e.g. for hardware efficiency; this can be done by passing the round_to parameter.
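The arithmetic behind this can be sketched as follows. Assuming the kept count is rounded up to a multiple of round_to (an assumption, though it is consistent with the reports below), a layer's effective sparsity ends up slightly below the requested target:

```python
import math

# Assumed behaviour: the number of kept filters is rounded UP to a
# multiple of round_to, so effective sparsity <= requested sparsity.
def kept_filters(n_filters: int, sparsity: float, round_to: int) -> int:
    raw_kept = n_filters * (100 - sparsity) / 100
    return math.ceil(raw_kept / round_to) * round_to

for n in (64, 128):
    kept = kept_filters(n, 70, 8)
    print(f"{n} filters -> keep {kept}, effective sparsity {100 * (n - kept) / n:.2f}%")
```

For 64 filters at 70% this keeps 24 filters (62.50% sparsity), and for 128 filters it keeps 40 (68.75%), matching the per-layer figures reported below.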

model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'local', large_final)
sparsifier.sparsify_model(70, round_to=8)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      5,880         62.50%
layer1.0.conv1                 Conv2d          36,864     23,040        62.50%
layer1.0.conv2                 Conv2d          36,864     23,040        62.50%
layer1.1.conv1                 Conv2d          36,864     23,040        62.50%
layer1.1.conv2                 Conv2d          36,864     23,040        62.50%
layer2.0.conv1                 Conv2d          73,728     50,688        68.75%
layer2.0.conv2                 Conv2d          147,456    101,376       68.75%
layer2.0.downsample.0          Conv2d          8,192      5,632         68.75%
layer2.1.conv1                 Conv2d          147,456    101,376       68.75%
layer2.1.conv2                 Conv2d          147,456    101,376       68.75%
layer3.0.conv1                 Conv2d          294,912    202,754       68.75%
layer3.0.conv2                 Conv2d          589,824    405,504       68.75%
layer3.0.downsample.0          Conv2d          32,768     22,528        68.75%
layer3.1.conv1                 Conv2d          589,824    405,504       68.75%
layer3.1.conv2                 Conv2d          589,824    405,504       68.75%
layer4.0.conv1                 Conv2d          1,179,648  811,008       68.75%
layer4.0.conv2                 Conv2d          2,359,296  1,622,016     68.75%
layer4.0.downsample.0          Conv2d          131,072    90,112        68.75%
layer4.1.conv1                 Conv2d          2,359,296  1,622,016     68.75%
layer4.1.conv2                 Conv2d          2,359,296  1,622,016     68.75%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 7,667,450     68.66%
model = resnet18()
sparsifier = Sparsifier(model, 'filter', 'global', large_final)
sparsifier.sparsify_model(70, round_to=8)
sparsifier.print_sparsity()

Sparsity Report:
--------------------------------------------------------------------------------
Layer                          Type            Params     Zeros      Sparsity  
--------------------------------------------------------------------------------
conv1                          Conv2d          9,408      8,232         87.50%
layer1.0.conv1                 Conv2d          36,864     0              0.00%
layer1.0.conv2                 Conv2d          36,864     0              0.00%
layer1.1.conv1                 Conv2d          36,864     0              0.00%
layer1.1.conv2                 Conv2d          36,864     0              0.00%
layer2.0.conv1                 Conv2d          73,728     69,120        93.75%
layer2.0.conv2                 Conv2d          147,456    138,240       93.75%
layer2.0.downsample.0          Conv2d          8,192      0              0.00%
layer2.1.conv1                 Conv2d          147,456    138,240       93.75%
layer2.1.conv2                 Conv2d          147,456    138,240       93.75%
layer3.0.conv1                 Conv2d          294,912    285,696       96.88%
layer3.0.conv2                 Conv2d          589,824    571,392       96.88%
layer3.0.downsample.0          Conv2d          32,768     0              0.00%
layer3.1.conv1                 Conv2d          589,824    571,392       96.88%
layer3.1.conv2                 Conv2d          589,824    571,392       96.88%
layer4.0.conv1                 Conv2d          1,179,648  1,161,216     98.44%
layer4.0.conv2                 Conv2d          2,359,296  2,322,432     98.44%
layer4.0.downsample.0          Conv2d          131,072    0              0.00%
layer4.1.conv1                 Conv2d          2,359,296  2,322,432     98.44%
layer4.1.conv2                 Conv2d          2,359,296  2,285,568     96.88%
--------------------------------------------------------------------------------
Overall                        all             11,166,912 10,583,592    94.78%

For more information about the granularities at which you can operate, please check the related page.


Summary

Tool                   Purpose
--------------------------------------------------------------------------------
Sparsifier             Core class for zeroing out weights
sparsify_model()       Apply sparsification to all matching layers
sparsify_layer()       Apply sparsification to a single layer
print_sparsity()       Report per-layer sparsity statistics
large_final            Criteria: keep weights with largest magnitude
Granularity options    weight, vector, kernel, filter
Context options        local (per-layer) vs global (network-wide)

See Also