Inspiration

In ECE342, we studied floating-point representation, normalization, rounding, and how arithmetic is implemented at the hardware level. We understood the theory behind IEEE-style floating-point formats, but we wanted to go deeper and build a working floating-point unit ourselves.

Instead of implementing full IEEE-754 single precision, we designed a simplified 8-bit floating-point format (E3M4) to better understand how exponent alignment, mantissa arithmetic, normalization, and rounding are actually handled in hardware.

What it does

Our design implements a custom 8-bit floating-point unit (FPU) with the following format:

[S][EEE][MMMM]

1 sign bit, 3 exponent bits (bias = 3), 4 mantissa bits
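To make the bit layout concrete, here is a minimal Python sketch that decodes one E3M4 byte into a real value. The function name `decode_e3m4` is hypothetical. Two things are assumptions rather than facts from this write-up: an exponent field of 0 is treated as IEEE-style subnormal (no hidden bit), and an exponent field of 7 is treated as an ordinary value because NaN and infinity are not encoded yet.

```python
BIAS = 3  # exponent bias from the format description

def decode_e3m4(byte):
    """Decode an 8-bit [S][EEE][MMMM] pattern into a Python float."""
    sign = (byte >> 7) & 0x1
    exp = (byte >> 4) & 0x7
    man = byte & 0xF
    if exp == 0:
        # Assumption: exponent field 0 is subnormal, so there is no hidden 1.
        value = (man / 16) * 2.0 ** (1 - BIAS)
    else:
        # Normalized number: hidden leading 1 plus a 4-bit fraction.
        value = (1 + man / 16) * 2.0 ** (exp - BIAS)
    return -value if sign else value

print(decode_e3m4(0b0_011_1000))  # sign 0, exponent 3, mantissa 1000
```

Under these assumptions the pattern `0 011 1000` decodes as 1.5 × 2^0.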

The FPU supports addition, subtraction, multiplication, division, fused multiply-add (FMA), and square root.

All outputs are normalized and rounded back into E3M4 format after computation.

How we built it

The architecture is divided into modular stages:

Unpack stage: Extract the sign, exponent, and mantissa fields, and insert the hidden leading 1 for normalized numbers.
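The unpack step can be modeled in software as a few shifts and masks. This is only a sketch: `unpack_e3m4` is a hypothetical name, and treating an exponent field of 0 as "not normalized" (no hidden bit) is an assumption.

```python
def unpack_e3m4(byte):
    """Split an E3M4 byte into (sign, biased exponent, 5-bit significand)."""
    sign = (byte >> 7) & 0x1
    exp = (byte >> 4) & 0x7
    frac = byte & 0xF
    # Assumption: only a nonzero exponent field marks a normalized number.
    hidden = 1 if exp != 0 else 0
    sig = (hidden << 4) | frac  # 1.MMMM stored as a 5-bit integer
    return sign, exp, sig

print(unpack_e3m4(0b0_011_1000))
```

The significand comes back as an integer with 4 fractional bits, so `0b11000` stands for 1.1000 in binary, or 1.5.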

Arithmetic core:

Add/Sub: exponent alignment, significand addition/subtraction

Mul: exponent addition, significand multiplication

Div: exponent subtraction, iterative mantissa division

FMA: full-precision multiply followed by aligned accumulation

Sqrt: exponent halving and iterative mantissa square root
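As an illustration of two of these datapaths, the sketch below models exponent alignment for a same-sign addition and the core of a multiplication on unpacked operands. The function names, the choice of three extra low-order bits, and folding shifted-out bits into the lowest bit are assumptions made for this example, not a description of the actual RTL.

```python
BIAS = 3   # exponent bias from the format description
EXTRA = 3  # assumed extra low-order bits kept for guard, round, sticky

def align_and_add(exp_a, sig_a, exp_b, sig_b):
    """Add two same-sign magnitudes given as (biased exponent, 5-bit significand).

    Returns (exponent, sum); the sum has 4 + EXTRA fractional bits and may
    still need normalization.
    """
    if exp_b > exp_a:  # make operand a the one with the larger exponent
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    big = sig_a << EXTRA
    small = sig_b << EXTRA
    shift = exp_a - exp_b
    lost = small & ((1 << shift) - 1)              # bits shifted out
    small = (small >> shift) | (1 if lost else 0)  # keep them as a sticky bit
    return exp_a, big + small

def multiply_core(exp_a, sig_a, exp_b, sig_b):
    """Multiply magnitudes: add exponents (removing one bias), multiply significands.

    The product has 8 fractional bits and may still need normalization.
    """
    return exp_a + exp_b - BIAS, sig_a * sig_b

print(align_and_add(3, 0b11000, 2, 0b10000))  # 1.5 + 0.5
print(multiply_core(3, 0b11000, 4, 0b10100))  # 1.5 * 2.5
```

Neither result is in canonical form yet. The addition example produces a carry above the hidden-bit position, which is exactly what the normalization stage has to repair.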

Normalization stage: Results are shifted to restore the canonical form $1.M \times 2^{E}$, and the exponent is adjusted to compensate for each shift.
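A software model of this step, under stated assumptions, might look as follows. The name `normalize` and the `frac_bits` parameter are hypothetical, and exponent overflow or underflow is deliberately not handled because this write-up does not describe that policy.

```python
def normalize(exp, sig, frac_bits):
    """Shift sig until its leading 1 sits at bit position frac_bits.

    sig is an unsigned integer with frac_bits fractional bits. Each right
    shift increments the exponent, each left shift decrements it. Bits
    dropped on a right shift are folded into the lowest bit as a sticky bit.
    """
    if sig == 0:
        return 0, 0
    top = sig.bit_length() - 1  # position of the leading 1
    while top > frac_bits:
        sticky = sig & 1
        sig = (sig >> 1) | sticky
        exp += 1
        top -= 1
    while top < frac_bits:
        sig <<= 1
        exp -= 1
        top += 1
    return exp, sig

print(normalize(3, 256, 7))  # carry-out case: 10.0000000 becomes 1.0000000
print(normalize(5, 24, 7))   # leading zeros case, as after a cancellation
```

The first call shifts right once and raises the exponent; the second shifts left three times and lowers it by three.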

Rounding stage: Guard, round, and sticky bits are used to implement round-to-nearest behavior before truncating back to 4 mantissa bits.
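The guard/round/sticky decision can be sketched as below. The write-up says round-to-nearest but does not state the tie rule, so ties-to-even is an assumption here, as are the function name and the 8-bit input layout (5 significand bits followed by guard, round, sticky).

```python
def round_nearest_even(exp, sig8):
    """Round an 8-bit value laid out as 1.MMMM G R S down to a 5-bit 1.MMMM.

    Returns (exponent, 5-bit significand). If rounding carries out of the
    significand, the result is renormalized and the exponent is incremented.
    """
    kept = sig8 >> 3
    guard = (sig8 >> 2) & 1
    rnd = (sig8 >> 1) & 1
    sticky = sig8 & 1
    # Round up when above the halfway point, or exactly halfway with an odd
    # kept value (assumed ties-to-even).
    round_up = guard and (rnd or sticky or (kept & 1))
    if round_up:
        kept += 1
        if kept == 0b100000:  # 1.1111 rounded up to 10.0000
            kept >>= 1
            exp += 1
    return exp, kept

print(round_nearest_even(3, 0b10001_100))  # exact tie with an odd kept value
print(round_nearest_even(3, 0b11111_110))  # rounding carries into the exponent
```

Note the second case: rounding can itself denormalize the result, so a small post-rounding fix-up is needed before packing.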

Pack stage: Final result is encoded back into 8-bit E3M4 format.
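Packing is the inverse of unpacking: the hidden bit is dropped and the fields are reassembled. In this sketch `pack_e3m4` is a hypothetical name, and out-of-range inputs raise an error instead of guessing at an overflow or underflow policy that the write-up does not specify.

```python
def pack_e3m4(sign, exp, sig):
    """Encode (sign, biased exponent, 5-bit normalized significand) as one byte."""
    if not 1 <= exp <= 7:
        # Overflow/underflow handling is not described, so this sketch refuses.
        raise ValueError("exponent out of range for a normalized E3M4 value")
    if not 0b10000 <= sig <= 0b11111:
        raise ValueError("significand must be normalized (hidden bit set)")
    return ((sign & 1) << 7) | (exp << 4) | (sig & 0xF)  # drop the hidden bit

print(format(pack_e3m4(0, 3, 0b11000), "08b"))
```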

Challenges we ran into

Limited dynamic range (exponent only 3 bits)

Significant rounding error due to only 4 mantissa bits
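To put rough numbers on both limits, the short calculation below derives the normalized range and the worst-case relative rounding error from the field widths. It assumes exponent fields 1 through 7 all encode normalized values (nothing reserved for NaN or infinity) and that rounding is to nearest; the variable names are illustrative.

```python
BIAS = 3      # exponent bias from the format description
MAN_BITS = 4  # stored mantissa bits

# Assumption: exponent fields 1..7 are all ordinary normalized values.
largest_normal = (2 - 2.0 ** -MAN_BITS) * 2.0 ** (7 - BIAS)  # 1.1111b * 2^4
smallest_normal = 2.0 ** (1 - BIAS)                          # 1.0000b * 2^-2

# With round-to-nearest, the relative error of a normalized result is at
# most half a unit in the last place relative to 1.0, i.e. 2^-(MAN_BITS + 1).
max_relative_error = 2.0 ** -(MAN_BITS + 1)

print(largest_normal, smallest_normal, max_relative_error)
```

Under these assumptions the normalized range spans only 0.25 to 31, and a single rounding can be off by up to about 3%.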

Accomplishments that we're proud of

Successfully implemented six floating-point operations in only 8 bits

Demonstrated correct normalized outputs in all supported operations

What we learned

How small bit-width designs magnify architectural decisions

What's next for 8 bit FPU

IEEE-style NaN and infinity handling

Multiple rounding modes
