I was inspired by Louis Guthmann at EthCC, who said that even the biggest zero-knowledge companies do not use GPUs to accelerate their proofs. I thought we could easily change that by executing the slowest (and highly parallelizable) part of proof generation on a graphics card.

What it does

GPU-Snark executes the Fast Fourier Transform over finite fields for SNARKs. It is several times faster than the existing CPU implementation in libsnark (used by Zcash). It is fully templated, so it can work with different data types. The interface matches that of libsnark and Bellman, which makes it easy to use GPU-Snark in place of libfft for the Fourier transform. It can also be used for DIZK. The implementation is in CUDA, so it only runs on NVIDIA graphics cards.
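For readers unfamiliar with the underlying transform: an FFT over a finite field (a number-theoretic transform) replaces the complex roots of unity with roots of unity in the field. The serial C++ sketch below is illustrative only, not GPU-Snark's actual code, and uses a toy prime P = 257 with primitive root 3 instead of the large SNARK curve fields. The key point is the butterfly structure: the inner butterflies within each pass are independent of each other, which is why the transform maps so well to GPU threads.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy modulus for the sketch: a prime with a large power of two dividing
// P - 1, so roots of unity of power-of-two order exist in the field.
static const uint64_t P = 257;

// Modular exponentiation by squaring.
uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

// In-place iterative radix-2 Cooley-Tukey NTT. `root` must be a primitive
// root modulo P. Each pass over `len` performs n/2 independent butterflies;
// on a GPU, each butterfly can be handed to its own thread.
void ntt(std::vector<uint64_t>& a, uint64_t root) {
    const size_t n = a.size();  // must be a power of two
    // Bit-reversal permutation.
    for (size_t i = 1, j = 0; i < n; ++i) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j |= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    // Butterfly passes.
    for (size_t len = 2; len <= n; len <<= 1) {
        const uint64_t w_len = pow_mod(root, (P - 1) / len, P);
        for (size_t i = 0; i < n; i += len) {
            uint64_t w = 1;
            for (size_t j = 0; j < len / 2; ++j) {
                const uint64_t u = a[i + j];
                const uint64_t v = a[i + j + len / 2] * w % P;
                a[i + j] = (u + v) % P;
                a[i + j + len / 2] = (u - v + P) % P;
                w = w * w_len % P;
            }
        }
    }
}
```

As a quick sanity check, transforming the delta vector (1, 0, …, 0) yields the all-ones vector, since every evaluation of the constant-term-only polynomial equals 1.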

How I built it & Challenges I ran into

I started Friday evening and built the first version without compiling it, since I do not have an NVIDIA GPU in my laptop. I tried to set up an AWS student account to test it in the cloud, but Amazon was a bit slow. In the end I got access to a university server to test and benchmark GPU-Snark.

What's next for GPU_Snarks

Next steps for GPU_Snarks are: cleaning up the code (with more time I can probably make it even more efficient), proper integration into libsnark and Bellman, porting the code to OpenCL, proper open-source licensing, and looking for other parts of SNARKs that can be executed in a highly parallel fashion.


My benchmarks were run on the following hardware:

CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz, using 8 threads

GPU: GTX 1080

| Constraints (power of 2) | Constraints | GPU      | CPU     |
| ------------------------ | ----------- | -------- | ------- |
| 2^16                     | 65536       | 0.32 s   | 0.21 s  |
| 2^20                     | 1048576     | 0.33 s   | 4.40 s  |
| 2^22                     | 4194304     | 0.43 s   | 18.03 s |
| 2^25                     | 33554432    | 10.57 s  | tbd*    |
| 2^28                     | 268435456   | 227.25 s | tbd*    |

*At 2^28, the CPU implementation had not finished after over 4 hours on 8 cores, which suggests a speedup of over **60x** for the GPU implementation.
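The "over 60x" figure follows directly from the aborted CPU run; a quick sketch of the arithmetic (all numbers taken from the table above):

```cpp
// The 2^28 CPU run was aborted after 4 hours, so 14400 s is only a lower
// bound on its runtime, while the GPU finished the same transform in
// 227.25 s. The ratio is therefore a lower bound on the true speedup.
double min_speedup() {
    const double cpu_lower_bound_s = 4.0 * 60.0 * 60.0; // aborted: >= 14400 s
    const double gpu_s = 227.25;
    return cpu_lower_bound_s / gpu_s; // roughly 63, hence "over 60x"
}
```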

The 2^28 transform took around 1.8 GB of memory on the GPU.
