Much of the success of deep learning stems from the development of ever-larger neural networks. Larger models tend to perform better, but they are also more expensive to use: they take up more storage space, take longer to train, and often require more expensive hardware.
This poses a challenge for many organizations as soon as they want to take an application into production: to keep costs manageable, they need to shrink their models with a model compression method. However, compression often comes at the cost of performance, so there is a real risk that the compressed model will be less accurate.
The algorithm, called FlipOut, combines two compression methods: neural network pruning and quantization. Pruning removes redundant weights from a trained model, while quantization reduces the number of bits used to store each remaining weight, resulting in fewer calculations and a smaller model. With pruning alone, the algorithm can remove 90% of the connections in a network without sacrificing accuracy or performance. With quantization, it can reduce the number of bits per connection from 32 to 8.
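To make the two methods concrete, here is a minimal sketch of magnitude-based pruning and uniform 8-bit quantization on a plain list of weights. This is a generic illustration of the two techniques, not the FlipOut algorithm itself; the function names and the choice of magnitude-based pruning are assumptions for the example.

```python
def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize(weights, bits=8):
    """Uniform symmetric quantization to signed integers with `bits` bits."""
    # Map the largest magnitude to the largest representable integer.
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1) or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floating-point weights from the integers."""
    return [v * scale for v in q]

# Example: keep only the 2 largest-magnitude weights out of 8 (75% sparsity),
# and store values as 8-bit integers instead of 32-bit floats.
w = [0.02, -1.5, 0.3, 0.001, 0.8, -0.04, 2.1, -0.6]
pruned = prune(w, sparsity=0.75)
q, s = quantize(w, bits=8)
```

After quantization, each weight is stored as an integer in [-127, 127] plus one shared scale factor; dequantizing introduces at most half a quantization step of error per weight.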
During their research, the researchers also discovered that the two methods are complementary and work extremely well together. When combined, the algorithm can remove 75% of the connections while storing the remaining values with four times fewer bits, with minimal degradation in accuracy.
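A back-of-the-envelope calculation shows why the combination pays off. Storing only the surviving connections requires keeping an index alongside each value; the 16-bit index size and the 10-million-parameter model below are assumptions for illustration, not figures from the research.

```python
def compressed_bytes(n_weights, sparsity=0.75, bits=8, index_bits=16):
    """Bytes needed for the surviving weights plus one index per weight.

    Assumes a simple sparse format: each kept weight stores its value
    (`bits` bits) and its position (`index_bits` bits).
    """
    kept = int(n_weights * (1 - sparsity))
    return kept * (bits + index_bits) / 8

n = 10_000_000                 # hypothetical 10M-parameter model
dense = n * 32 / 8             # 40 MB as dense 32-bit floats
sparse = compressed_bytes(n)   # 2.5M weights at 24 bits each = 7.5 MB
```

Even with index overhead, the combined scheme is over five times smaller than the dense 32-bit model; a more compact sparse encoding would push the ratio closer to the ideal 16x (4x from pruning, 4x from quantization).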
According to the researchers, the FlipOut algorithm is an important step toward reducing costs and saving energy. Thanks to the algorithm, organizations will soon need less storage space, allowing them to save significantly on energy and costs without sacrificing performance or accuracy.