Binarized Neural Networks

  • Neural Networks with binarized weights and activations at runtime.

Group Members

  • Arvind Sridhar: asridh13
  • Nicholas Masi: nmasi

Introduction

  • Original Paper

  • This paper from 2016 introduces the original idea behind BNNs (“neural networks with binary weights and activations at run-time”) to decrease the space usage and runtime of models while also increasing power efficiency. We are re-implementing this paper and applying BNNs to new datasets, inspired by the assignments from this class. Our project will tackle several kinds of problems following what we’ve covered in class, with the main goal of improving the runtime and memory usage of models in both training and testing while maintaining relatively high accuracy. We will attempt to use the TensorFlow Profiler to track the resource consumption of our models, and use the test() functions we’ve already written to measure their accuracy.

Related Work

  • This link is a blog post about the paper that helped us conceptualize BNNs and how they could be applicable to the course assignments that we’ve implemented thus far.

Data

  • Data will be an image-based dataset that we have yet to choose, similar to (yet distinct from) one of the datasets that we have explored in class thus far.

Methodology

  • Training proceeds as it has in our assignments, except that before each forward pass we constrain our weights to the binary values +1 or −1.
  • The hardest part of implementing the binarized model will likely be changing the feed-forward implementation to account for binarized weight values, as opposed to the full-precision values that we have used so far in our class assignments during feed-forward.
  • We are using the TensorFlow library to implement our BNN, whereas the original paper uses Torch7 and Theano.
  • Additionally, we will be applying our BNN implementation to a new dataset, while other implementations have only been used on MNIST and CIFAR-10 (of the datasets that are applicable to the assignments of this class).
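The binarization step above can be sketched in plain NumPy: a deterministic sign quantizer for the forward pass, plus the saturated straight-through estimator used to pass gradients through it on the backward pass. Function names here are our own, for illustration only:

```python
import numpy as np

def binarize(w):
    # Deterministic quantizer: sign(w), mapping 0 to +1 so outputs are exactly +/-1.
    return np.where(w >= 0, 1.0, -1.0)

def ste_backward(w, upstream_grad):
    # Saturated straight-through estimator: treat the quantizer as the identity
    # on the backward pass, but zero the gradient wherever |w| > 1.
    return upstream_grad * (np.abs(w) <= 1.0)

w = np.array([-1.7, -0.3, 0.0, 0.8, 2.5])
print(binarize(w))                       # [-1. -1.  1.  1.  1.]
print(ste_backward(w, np.ones_like(w)))  # [0. 1. 1. 1. 0.]
```

Without the estimator, the gradient of sign(w) is zero almost everywhere, so the full-precision weights underneath would never update.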

Metrics

  • We plan to track the accuracy, runtime, and memory usage of our three models as we created them for our class assignments. We will then do the same for the binarized versions of our models and compare them.
  • Accuracy still applies, as our new BNNs should remain at least nearly as accurate on the problem as the original models. The other important metrics are runtime and memory usage (in both training and testing), which should decrease drastically in our BNNs relative to our original models.
  • The authors of the paper were hoping to demonstrate that BNNs drastically reduce memory consumption and substantially improve power efficiency during both training and testing. These goals were quantified by comparing the classification test error rates of binarized vs. non-binarized models, comparing the energy consumption of typical network calculations vs. the bitwise calculations of BNNs, and comparing the GPU execution times of both models.
  • We’re not doing something new or non-existent in the DL literature.
  • Base goals: creating BNNs that are functional and able to perform the task they’re designed for without significant drop in accuracy.
  • Target goals: creating BNNs that perform with only minor drops in accuracy and have demonstrable improvements in runtime and memory usage.
  • Stretch goals: creating BNNs that perform with only minor drops in accuracy and have significant improvements in runtime and memory usage.

Ethics

  • Deep learning models take a lot of resources (time, memory, power) to run. The time and memory requirements mean DL can often only take place on very advanced hardware (and usually cannot take place on limited hardware like microcontrollers), and the heavy power requirements mean the energy usage of DL is often detrimental to the environment.
  • The major stakeholders are researchers and users of DL. The success of BNNs (marked by their ability to achieve high accuracy while drastically improving resource efficiency) affects their ability to run DL models faster, on more limited hardware, and in a way that is less harmful to the environment.

Breakdown of Labor

  • Nick: writing the binary activation function; working on the Introduction and Results sections of the poster; working on the Challenges and Reflection sections of the final writeup
  • Arvind: adding the binarization of the weights during training; detailing the experimental methodology and broader extrapolations on the poster; preparing and delivering the oral presentation


Final Writeup

Project Name: Binarized Neural Networks

Team: Binary Bros

  • Arvind Sridhar (asridh13)
  • Nicholas Masi (nmasi)

Introduction

  • Original Paper

  • This paper from 2016 introduces the original idea behind BNNs (“neural networks with binary weights and activations at run-time”) to decrease the space usage and runtime of models while also increasing power efficiency. Our group reimplemented this paper, applying BNNs to Fashion-MNIST, a dataset composed of tens of thousands of clothing images that fall into 10 categories. Our project tackled several kinds of problems following what we’ve covered in class, with the main goal of improving the runtime and memory usage of models in both training and testing while maintaining relatively high accuracy. Additionally, we used Larq, a collection of open-source Python packages for building, training, and deploying binarized neural networks, and TensorFlow features such as TFLite files to gain actionable metrics for quantifying our results.

Methodology

  • We replicated the model architecture from Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1, the aforementioned original paper on BNNs, in building our models for classification. Our BNN is a recreation of their binary model using Larq, and the vanilla model has near-identical architecture aside from full-precision (float32) activations and weights, making it a control.
  • The models use the Adam optimizer with an exponentially decaying learning rate, and weights are initialized with Glorot Uniform. Batches of 100 are used, and Batch Normalization is performed after each layer; Batch Normalization is considered an essential property of BNNs, as described by Simons and Lee (2019).
  • Since the derivative of the sign function, which serves as the deterministic binary quantizer in our BNN, is zero almost everywhere (which would break backpropagation), we used the Saturated Straight-Through Estimator (Saturated STE) to backpropagate through the BNN; these are implemented together as the “ste_sign” quantizer in the Larq library.
  • The authors use L2-hinge loss for the BNN and claim it outperforms Softmax at certain classification tasks, following the work of Tang (2013). In our experimentation, however, our Larq-implemented BNN was unable to learn when using it, so we used Softmax activation on the last layers of both models with Sparse Categorical Cross-entropy as the loss function.
  • The paper’s MLP architecture consists of 3 hidden layers of size 4,096 trained for 1,000 epochs (we trained for 10 for the sake of time). Larq has not yet optimized binary dense layers, so we implement the dense layers in our BNN as 2D convolution layers with 1x1 kernels, strides of 1, and no padding. This yields the same architecture while using an optimized binary operation.
  • We follow all of these architectural cues in our BNN and mirror them in our vanilla model as appropriate (e.g., no quantization or STE in the vanilla model). This ensures that variations in accuracy and runtime/memory between the vanilla and binarized models are as attributable to binarization as possible, and recreates conditions as close to the original paper as the resources available to us allowed.
  • We train and test our models on the fashion_mnist dataset, a harder version of the MNIST benchmark. All our code runs in a Colab notebook, including data preprocessing, model building, training, testing, and analysis. We use the TFLite Analyzer to see the size of the models when serialized as TFLite files meant for small hardware, and we ran the models on the dataset’s test data after training for 10 epochs to assess accuracy.
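The 1x1-convolution substitution above works because a 1x1 convolution applied over a 1x1 spatial map computes exactly the same thing as a dense layer. A quick NumPy check (shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, in_ch, out_ch = 4, 16, 8

x = rng.standard_normal((batch, in_ch))        # flattened dense-layer input
kernel = rng.standard_normal((in_ch, out_ch))  # dense weight matrix

dense_out = x @ kernel                         # ordinary dense layer (no bias)

# Same computation phrased as a 1x1 conv over a 1x1 feature map:
# input (batch, 1, 1, in_ch), kernel (1, 1, in_ch, out_ch), stride 1, no padding.
x_conv = x.reshape(batch, 1, 1, in_ch)
k_conv = kernel.reshape(1, 1, in_ch, out_ch)
conv_out = np.einsum("bhwc,ijco->bhwo", x_conv, k_conv).reshape(batch, out_ch)

print(np.allclose(dense_out, conv_out))  # True
```

This is why swapping Larq's optimized binary convolution in for the unoptimized binary dense layer leaves the architecture unchanged.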

Results

  • Our results were mostly successful in replicating those of the original paper. We conducted multiple rounds of training and testing to judge the accuracy and found that our accuracy difference between the BNN and vanilla model was negligible. This was both surprising and impressive, as typically one would expect that BNN accuracy would be slightly lower than vanilla accuracy due to the loss of technical nuance in binarizing weights at run time, which did not end up being the case for our BNN. Though we were unable to track power efficiency due to hardware limitations (detailed in the Challenges section), we were able to record an 88% model file size decrease from the vanilla model to the BNN. This metric of efficiency, especially when combined with negligible accuracy differences, goes to show the potential of BNNs in optimized deep learning practice. Finally, we were not able to accurately gauge runtime, once again due to hardware limitations. Our BNN took longer to train on Colab than the vanilla model, but that is because the Larq library is not at all optimized for the Colab interface, and if we were to run the models on 64-bit ARM architecture as they are designed for, we would expect a significant decrease in runtime for the Binarized model as compared to the vanilla model.
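The size reduction above is consistent with back-of-the-envelope arithmetic: a binarized weight needs 1 bit instead of 32, so a packed binary weight matrix is about 32x smaller than its float32 counterpart. A rough NumPy illustration (this is not the TFLite serialization itself, which carries extra metadata):

```python
import numpy as np

# One hidden layer of the paper's MLP: 4096 x 4096 float32 weights.
w = np.random.randn(4096, 4096).astype(np.float32)
float_bytes = w.nbytes                   # 4096 * 4096 * 4 bytes = 64 MiB

# Pack the sign bits: +1 -> 1, -1 -> 0, eight weights per byte.
bits = (w >= 0).astype(np.uint8)
packed_bytes = np.packbits(bits).nbytes  # 4096 * 4096 / 8 bytes = 2 MiB

print(float_bytes // packed_bytes)       # 32
```

The measured 88% decrease is smaller than the ideal 32x because the serialized files also contain non-weight content (graph structure, Batch Normalization parameters, and other full-precision tensors).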

Challenges

One of the initial problems we ran into was converting previous class assignments into code optimized for binarization in the form we wanted it to take. For example, we initially took the MNIST assignment (for which we later changed the dataset) and retrofitted our implementation to use Keras. In terms of additional difficulties, our project used Google Colab, since the TensorFlow Profiler we originally planned to use to collect statistics required machine specifications that were unavailable to the members of the team. However, Larq is only optimized for 64-bit ARM architecture and Android devices, so it was not able to show runtime/memory improvements on Colab, which uses GPUs. We eventually were able to use the TFLite Analyzer to evaluate memory usage, but couldn't analyze optimized runtime (where we expect the BNN would show significant improvements over the vanilla model) because we didn't have any 64-bit ARM computers accessible to us.
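The file-size measurement we fell back on can be sketched as follows, using a small stand-in Keras model rather than our actual architectures (the Larq-specific conversion steps are omitted here), assuming TensorFlow is installed:

```python
import tensorflow as tf

# A toy stand-in for our models; the real ones follow the paper's MLP.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Serialize to a TFLite flatbuffer; its length in bytes is the
# "model file size" metric we compared between the two models.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()
print(len(tflite_bytes))
```

Running the same conversion on the vanilla and binarized models and comparing the byte counts gives the size-reduction figure reported in Results.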

Reflection

  • Our project turned out very successful, not just in the results we garnered, but more importantly in the conceptual understanding of binarization and the power it could have as a future industry and research practice for deep learning. At the beginning of our project, we established our stretch goal as creating BNNs that perform with only minor drops in accuracy and significantly better runtime and memory usage. With the exception of runtime, due to hardware limitations, we were excitingly able to meet this goal, with even better accuracy than we expected.
  • Our implementation deviated significantly from how we expected to build it at the beginning of the project. For example, we planned on using the TensorFlow Profiler extensively to document the statistics we wanted to compare between the vanilla model and the BNN. However, we ran into significant hardware difficulties and had to adapt accordingly: we changed the interface on which we built our implementation, and after identifying our inability to track some of the main statistics we hoped to see on that platform, we came up with the idea of using TFLite files to record additional statistics for our models.
  • If we could do our project over again, and more generally for future projects, we would absolutely want to make sure that our plans for implementation took hardware into account, and more specifically hardware limitations. If we were to have more time to build on what we’ve done so far, we would implement binarization on other types of DL models that have been explored in the class, such as RNNs and transformers. Additionally, there is a lot of work currently being done in the field regarding the importance of weight randomization in binarized models. Future studies could mathematically delve into weight randomization equations that would optimize the statistics explored in this project.
  • Our biggest takeaways from the project were mainly optimism for the future of binarization in deep learning practice, and the benefits it could have up to and including environmental accountability. We are very glad to have gotten practice and education in understanding how this novel concept works, how to implement it, and seeing firsthand how it stacks up against the conventional models that we have been using thus far in our experience with deep learning.
