Inspiration

I knew I wanted to do a hardware project, and I'm interested in ASICs and heterogeneous computing. I wanted to show how effective and energy-efficient application-specific hardware can be. This design omits the general-purpose control logic that a CPU needs, which should make it significantly more energy efficient for this one task.

System overview

This system performs a Laplacian convolution on an input image to detect edges. The image is sent from the host computer to the FPGA, which performs the convolution and sends the result back to the host. Convolutions are the key operation in CNNs, a widely used family of models for image recognition, so with some more work this system could serve as an AI accelerator. AI consumes a significant amount of energy, and with the surging demand for AI products it is important to think about how we can make AI sustainable.
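As a software reference for what the convolution unit is meant to compute, here is a minimal Python sketch. It assumes the common 4-neighbour Laplacian kernel and "valid" borders (no padding), so an 8x8 input gives a 6x6 output. The kernel coefficients, the border handling, and the names `LAPLACIAN` and `convolve3x3` are illustrative assumptions, not taken from the FPGA design.

```python
# Assumed 4-neighbour Laplacian kernel; the hardware may use a different variant.
LAPLACIAN = [
    [0,  1, 0],
    [1, -4, 1],
    [0,  1, 0],
]


def convolve3x3(image, kernel=LAPLACIAN):
    """Apply a 3x3 kernel to a grayscale image given as a list of rows.

    Borders are skipped ("valid" mode), so the output is 2 rows and
    2 columns smaller than the input. The kernel is applied without
    flipping, which gives the same result as a true convolution for a
    symmetric kernel such as the Laplacian.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for kr in range(3):
                for kc in range(3):
                    acc += kernel[kr][kc] * image[r + kr - 1][c + kc - 1]
            out_row.append(acc)
        out.append(out_row)
    return out


# An 8x8 image with a vertical edge between columns 3 and 4.
edge_image = [[0] * 4 + [10] * 4 for _ in range(8)]
for row in convolve3x3(edge_image):
    print(row)
```

Flat regions produce zeros, and the response changes sign across the edge, which is what makes the Laplacian useful for edge detection.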

How I built it

The system consists of a host (my laptop) that takes an image, serializes it, and sends it to the FPGA over USB. The FPGA implements a few state machines to organize the data and a convolution unit to process it. The host program is written in Python, and the FPGA design is written in SystemVerilog.
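The serialization step on the host can be sketched roughly as follows. This assumes 8-bit grayscale pixels sent in row-major order with no header or framing. The actual wire format is not described here, so the format and the helper names `serialize_image` and `deserialize_image` are hypothetical.

```python
def serialize_image(image):
    """Flatten a grayscale image (rows of 0-255 ints) into row-major bytes."""
    return bytes(pixel for row in image for pixel in row)


def deserialize_image(payload, width):
    """Rebuild rows of `width` pixels from a row-major byte string."""
    if len(payload) % width != 0:
        raise ValueError("payload length is not a multiple of the row width")
    return [list(payload[i:i + width]) for i in range(0, len(payload), width)]


# Round trip for an 8x8 test image (assumed size; one byte per pixel).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
payload = serialize_image(image)
assert deserialize_image(payload, 8) == image
print(len(payload), "bytes to send")
```

On a real link, the payload could be written to the USB serial port with a library such as pyserial (for example `serial.Serial(port, baudrate).write(payload)`); the port name and baud rate would depend on the board and are not specified here.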

Challenges we ran into

I started the FPGA component in Verilog and quickly realized there is a reason people made new HDLs. SystemVerilog is more powerful and flexible, but my 3D RAM arrays were not supported by synthesis, so the synthesized design was incorrect. I spent a significant portion of my time changing my syntax so the netlist would generate properly. The FPGA I used is a low-end product, so it can only process an 8x8 image. I originally wanted to process a 14x14 image, but even that consumed more block RAM than was available. Luckily I anticipated this and parameterized all of my modules, so it was easy to switch to a smaller design. Unfortunately, I was unable to fix the final synthesis errors in my image-to-UART module, so the system is not functional.

Accomplishments that I'm proud of

I'm proud of the sheer amount of code I churned out. Before this I had barely used behavioral Verilog, and I wrote around a thousand lines of it. I'm also very proud of how modular my code is. All of my modules are parameterized so they can be reused in other projects.

What I learned

Don't use someone's open-source code without testing it!

What's next for the Edge Detection Accelerator

To be an effective computing system, this design needs many changes. Moving to a faster interface such as SPI or PCIe and to a larger FPGA are at the top of the list. With more resources, I would also add a configurable filter so the design can be used for other image processing algorithms and CNNs.
