🚀 Inspiration

As AI and image processing workloads continue to grow, so does their energy consumption. Modern CPUs are flexible but inefficient for highly parallel, repetitive tasks like convolution. We wanted to explore whether hardware acceleration on an FPGA could demonstrate measurable improvements in speed and efficiency — even for a classic algorithm like Sobel edge detection.

⚙️ What it does

Our project implements a fully streaming, pipelined Sobel edge detection accelerator on the DE1-SoC FPGA. -reads a 240×240 grayscale image from on-chip memory -Processes 1 pixel per clock using a 3×3 sliding window -Computes gradient magnitude using a pipelined Sobel architecture -Stores the processed output back into on-chip memory -Compares execution time against a CPU implementation

🛠️ How we built it

Hardware (FPGA Side)

-Implemented in SystemVerilog -Designed a 3×3 window generator using dual line buffers -Built a pipelined Sobel compute unit (Gx, Gy, absolute value, saturation) -Achieved sustained throughput of 1 pixel per clock -Used on-chip BRAM for input ROM and output memory

Software (CPU Side)

-Implemented identical Sobel algorithm in C -Measured execution time over the same 240×240 image -Used cycle counting for fair comparison -System Integration -Embedded image data into FPGA ROM via MIF -Used hardware counters to track FPGA execution cycles -Compared runtime between CPU and FPGA implementations

⚡ Challenges we ran into

-understanding logic circuits -Designing a correct streaming 3×3 window architecture -Aligning valid signals with pipeline latency -Managing boundary conditions (no border pixels) -Understanding how ROM initialization differs between simulation and hardware -Debugging synthesis-specific issues in Quartus (currently working on it)

🏆 Accomplishments that we're proud of

-Built a fully synthesizable hardware pipeline -Achieved 1 pixel per clock throughput -Successfully embedded image memory into FPGA bitstream -Designed a clean hardware/software comparison framework -Demonstrated measurable performance difference -Learned practical FPGA system design beyond simulation

What we learned

-Why parallel hardware execution outperforms sequential CPU execution for convolution tasks -How streaming architectures eliminate unnecessary memory bottlenecks -The importance of memory architecture in acceleration -How hardware/software co-design works in real systems -How CPU and FPGA fundamentally differ in execution philosophy(logic circuits)

What's next for Sobel edge detection

-Measure real power consumption vs CPU -Add dynamic image loading from HPS -Implement Gaussian blur + Canny edge detection -Explore fixed-point optimization -Compare FPGA vs GPU

Long-term vision: Use this as a foundation to explore energy-efficient AI hardware acceleration.

Built With

  • c
  • systemverilog
Share this project:

Updates