Inspiration

As linear algebra veterans, we wanted to understand how matrix multiplication works at the hardware level rather than relying on high-level software operations. Our goal was to design a system that reflects real digital hardware concepts such as datapaths, control logic, and arithmetic units.

What it does

Our project implements a hardware-based 3×3 matrix multiplier. It serially loads matrix values, performs iterative multiply-accumulate operations, and outputs the resulting matrix.
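The load-then-accumulate flow described above can be sketched as a small software model. This is a behavioral illustration in Python, not our RTL: the function name `matmul_mac` and the row-major input order are assumptions made for the example.

```python
def matmul_mac(a_stream, b_stream, n=3):
    """Behavioral model (hypothetical, not the RTL): rebuild two n x n
    matrices from row-major value streams, then compute C = A * B using
    one multiply-accumulate per inner-loop step."""
    # "Serial load": values arrive one at a time in row-major order (assumed).
    a = [a_stream[r * n:(r + 1) * n] for r in range(n)]
    b = [b_stream[r * n:(r + 1) * n] for r in range(n)]

    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0  # accumulator is cleared for each output element
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one MAC operation
            c[i][j] = acc
    return c


identity = [1, 0, 0, 0, 1, 0, 0, 0, 1]
print(matmul_mac(list(range(1, 10)), identity))
```

For a 3×3 product the inner statement runs 27 times, which is the number of multiply-accumulate steps a single-MAC datapath would need.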

How we built it

We designed the system in Verilog and SystemVerilog using a structured datapath approach. A finite state machine (FSM) manages input loading, computation sequencing, and output generation.
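As a rough illustration of how such an FSM sequences the three phases, here is a cycle-counting model in Python. The state names, the one-word-per-cycle load and output rate, and the one-MAC-per-cycle compute rate are all assumptions for the sketch and may not match the actual design.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the real FSM may use different ones.
    LOAD = auto()
    COMPUTE = auto()
    OUTPUT = auto()
    DONE = auto()


def run_fsm(n=3):
    """Return the number of cycles spent in each state for one n x n product."""
    # Assumed cycle budgets: 2*n*n input words, n^3 MACs, n*n output words.
    budget = {State.LOAD: 2 * n * n, State.COMPUTE: n ** 3, State.OUTPUT: n * n}
    order = [State.LOAD, State.COMPUTE, State.OUTPUT, State.DONE]
    cycles = {s: 0 for s in State if s is not State.DONE}

    state, count = State.LOAD, 0
    while state is not State.DONE:
        cycles[state] += 1
        count += 1
        if count == budget[state]:  # phase finished: advance to the next state
            state = order[order.index(state) + 1]
            count = 0
    return cycles


print(run_fsm(3))
```

Under these assumptions a 3×3 multiply spends 18 cycles loading, 27 computing, and 9 writing results, which shows why the compute phase dominates as the matrix size grows.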

Challenges we ran into

Bit-width management and overflow handling were major challenges, especially as intermediate values grew during accumulation. We also had to flatten the unpacked arrays into several 1D arrays. As beginners in ASIC design, we had to learn the workflow as we went: writing testbenches, using SystemVerilog, and working with Cognichip.
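Both issues come down to simple arithmetic, sketched here in Python. The helper names are hypothetical, and the example assumes unsigned operands and row-major flattening, neither of which is necessarily what the design uses.

```python
def accumulator_width(operand_bits, terms):
    """Bits needed to hold a sum of `terms` products of two unsigned
    `operand_bits`-wide values without overflow (unsigned is an assumption)."""
    max_product = (2 ** operand_bits - 1) ** 2
    return (terms * max_product).bit_length()


def flat_index(row, col, n=3):
    """Row-major position of element (row, col) in a flattened n x n array."""
    return row * n + col


def bit_slice(row, col, width, n=3):
    """(msb, lsb) of element (row, col) inside a flat n*n*width-bit vector."""
    lsb = flat_index(row, col, n) * width
    return lsb + width - 1, lsb


print(accumulator_width(8, 3))  # width for three 8-bit x 8-bit products
print(bit_slice(1, 2, 8))       # bit range of element (1, 2) with 8-bit entries
```

With 8-bit unsigned inputs, each product needs 16 bits and summing three of them needs 18, so the accumulator has to be wider than the multiplier output.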

Accomplishments we’re proud of

We successfully implemented matrix multiplication entirely in hardware. Achieving correct simulation results after debugging and verification was a major milestone.

What we learned

Through this project, we gained a deeper understanding of RTL design, datapath and control separation, bit growth in arithmetic circuits, and how hardware execution differs from software.

What’s next

We plan to scale the design to support larger matrices, improve throughput (potentially using a systolic array architecture similar to those used in accelerators such as Google's TPU), and optimize resource utilization.

Built With

  • systemverilog
  • verilog