Inspiration
As linear algebra veterans, we wanted to understand how matrix multiplication works at the hardware level instead of relying on high-level software operations. Our goal was to design a system built from real digital hardware concepts such as datapaths, control logic, and arithmetic units.
What it does
Our project implements a hardware-based 3×3 matrix multiplier. It serially loads matrix values, performs iterative multiply-accumulate operations, and outputs the resulting matrix.
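For readers who want to see the arithmetic, here is a minimal software sketch of the same computation, the kind of reference model a testbench could compare against. It is not our RTL: the function name `matmul_serial`, the row-major stream order, and the use of Python are assumptions made for illustration.

```python
N = 3  # matrix dimension, from the project description

def matmul_serial(a_stream, b_stream):
    """Multiply two N x N matrices given as flat, row-major value streams.

    Hypothetical reference model: element (row, col) sits at index row * N + col.
    """
    if len(a_stream) != N * N or len(b_stream) != N * N:
        raise ValueError("each stream must hold exactly N * N values")
    result = []
    for i in range(N):
        for j in range(N):
            acc = 0  # accumulator, cleared for each output element
            for k in range(N):
                # one multiply-accumulate step per iteration
                acc += a_stream[i * N + k] * b_stream[k * N + j]
            result.append(acc)
    return result

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(matmul_serial(a, b))  # [30, 24, 18, 84, 69, 54, 138, 114, 90]
```

Each output element takes three multiply-accumulate steps, so the full 3×3 product takes 27.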
How we built it
We designed the system in Verilog and SystemVerilog using a structured datapath approach. A finite state machine (FSM) manages input loading, computation sequencing, and output generation.
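The control flow can be sketched as a cycle-by-cycle software model. This is only an illustration under assumptions: the state names, the single shared counter, one input value per cycle, and one multiply-accumulate per cycle are hypothetical choices and may not match our actual RTL.

```python
from enum import Enum, auto

N = 3  # matrix dimension

class State(Enum):
    LOAD = auto()     # shift in A, then B, one value per cycle
    COMPUTE = auto()  # one multiply-accumulate per cycle
    OUTPUT = auto()   # shift out C, one value per cycle
    DONE = auto()

class MatMulFSM:
    def __init__(self):
        self.state = State.LOAD
        self.a, self.b = [], []
        self.c = [0] * (N * N)
        self.count = 0  # shared counter, reset on every state change
        self.out = []

    def step(self, data_in=None):
        """Advance one clock cycle; data_in is consumed only in LOAD."""
        if self.state is State.LOAD:
            target = self.a if self.count < N * N else self.b
            target.append(data_in)
            self.count += 1
            if self.count == 2 * N * N:
                self.state, self.count = State.COMPUTE, 0
        elif self.state is State.COMPUTE:
            # Decode the counter into the (i, j, k) loop indices.
            i, rem = divmod(self.count, N * N)
            j, k = divmod(rem, N)
            self.c[i * N + j] += self.a[i * N + k] * self.b[k * N + j]
            self.count += 1
            if self.count == N ** 3:
                self.state, self.count = State.OUTPUT, 0
        elif self.state is State.OUTPUT:
            self.out.append(self.c[self.count])
            self.count += 1
            if self.count == N * N:
                self.state = State.DONE

def run(a_flat, b_flat):
    """Drive the model until DONE; return the output stream and cycle count."""
    fsm = MatMulFSM()
    inputs = iter(list(a_flat) + list(b_flat))
    cycles = 0
    while fsm.state is not State.DONE:
        fsm.step(next(inputs, None))
        cycles += 1
    return fsm.out, cycles

out, cycles = run([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1])
print(out)     # [30, 24, 18, 84, 69, 54, 138, 114, 90]
print(cycles)  # 54 = 18 load + 27 compute + 9 output
```

Keeping the counter and state transitions apart from the multiply-accumulate line mirrors the split between control logic and datapath.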
Challenges we ran into
Bit-width management and overflow handling were major challenges, especially as intermediate values grew during accumulation. We also had to flatten the unpacked arrays into several 1D arrays. As beginners in ASIC design, we had to learn the workflow along the way, including writing testbenches and working with SystemVerilog and Cognichip.
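To make the bit-growth problem concrete, here is a small worked calculation. The 8-bit unsigned input width is an assumption chosen for illustration, not a statement about our design, and the helper names `accumulator_bits` and `wrap` are hypothetical.

```python
def accumulator_bits(input_bits, terms):
    """Bits needed for a sum of `terms` products of unsigned `input_bits`-bit values."""
    max_operand = (1 << input_bits) - 1
    max_sum = terms * max_operand * max_operand
    return max_sum.bit_length()

def wrap(value, bits):
    """Model a `bits`-wide register by discarding the upper bits."""
    return value & ((1 << bits) - 1)

worst_case = 3 * 255 * 255       # all operands at the 8-bit maximum
print(accumulator_bits(8, 3))    # 18
print(wrap(worst_case, 16))      # 64003, not 195075: a 16-bit accumulator overflows
print(wrap(worst_case, 18))      # 195075: 18 bits hold the full result
```

Under these assumptions each product needs 16 bits, and summing three of them adds two more, so an accumulator sized only for a single product would silently wrap.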
Accomplishments we’re proud of
We successfully implemented matrix multiplication entirely in hardware. Achieving correct simulation results after debugging and verification was a major milestone.
What we learned
Through this project, we gained a deeper understanding of RTL design, datapath and control separation, bit growth in arithmetic circuits, and how hardware execution differs from software.
What’s next
We plan to scale the design to support larger matrices, improve throughput (potentially using a systolic array architecture similar to those used in accelerators like Google's TPU), and optimize resource utilization.
Built With
- systemverilog
- verilog