  1. Inspiration

This project was inspired by a sense of unease about the "black box" of programming. In my daily work, I'm used to calling high-level libraries like TensorFlow or PyTorch. A few lines of code like model.fit() work wonders.

But I realized I was gradually forgetting the underlying mathematical principles. Richard Feynman's line, "What I cannot create, I do not understand," became my biggest motivation. I wanted to rediscover the feel for algorithms by building a Multilayer Perceptron (MLP) capable of recognizing handwritten digits (the MNIST dataset) using only NumPy, without relying on any deep learning framework.

  2. What I Learned

In this process, I gained more than just improved coding skills; I gained a deeper understanding of the theory:

The Importance of Matrix Dimensions: the vast majority of the bugs I hit while turning formula derivations into code stemmed from matrix shape mismatches.

Activation Function Selection: I gained a profound understanding of why ReLU is more popular than Sigmoid in deep networks (it solves the vanishing gradient problem).

Mapping Mathematics to Code: I learned how to transform the abstract chain rule of calculus into efficient, working code.

  3. How I Built It

The core of the project lies in translating mathematical derivations into program logic. I divided it into three main phases:

Phase 1: Forward Propagation

This is the process by which the neural network makes predictions. For each layer, I need to compute a linearly weighted sum and then pass it through an activation function. Let the input to layer l be a[l−1], the weights be W[l], and the bias be b[l]; the linear output Z[l] is then:

Z[l] = W[l] ⋅ a[l−1] + b[l]

Then, apply the activation function g(⋅) (e.g., ReLU or Sigmoid):

a[l] = g(Z[l])
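
A minimal NumPy sketch of one forward step under these formulas; the function names and the column-per-sample shape convention are my own assumptions, not taken from the project's actual code:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(A_prev, W, b, g=relu):
    """One layer of forward propagation: Z[l] = W[l] · A[l-1] + b[l], A[l] = g(Z[l]).

    Assumed shapes: A_prev is (n_prev, m), W is (n_l, n_prev), b is (n_l, 1);
    each column of A_prev is one training sample.
    """
    Z = W @ A_prev + b   # linear step; b broadcasts across the m columns
    A = g(Z)             # non-linear activation
    return Z, A          # Z is cached for backpropagation later
```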

Phase 2: Defining the Loss Function

To measure how far the predictions are from the labels, I used the Cross-Entropy loss function. For a binary classification problem, with true label y and predicted value ŷ, the loss L for a single sample is defined as:

L(y, ŷ) = −( y log(ŷ) + (1 − y) log(1 − ŷ) )

For the entire training set (m samples), the cost function J is the average of the per-sample losses:

J(W, b) = (1/m) ∑ᵢ₌₁ᵐ L(y⁽ⁱ⁾, ŷ⁽ⁱ⁾)
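
As a sketch, this cost is one vectorized line in NumPy; the (1, m) label layout is an assumption carried over from the previous snippet:

```python
import numpy as np

def cross_entropy_cost(Y, Y_hat):
    """J(W, b) = -(1/m) · Σ [ y log(ŷ) + (1 - y) log(1 - ŷ) ].

    Y and Y_hat are assumed to have shape (1, m): one label per column.
    """
    m = Y.shape[1]
    losses = Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)
    return float(-np.sum(losses) / m)
```

(The naive log here blows up when ŷ is 0 or 1; the Challenges section below adds the ε fix.)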

Phase 3: Backpropagation

This is the most challenging part of the project. To update the weights so that the loss decreases, I need to compute the gradient of the cost function with respect to each parameter. Applying the chain rule at the output layer, the gradient of the loss with respect to the linear output is:

dZ[L] = ∂L/∂Z[L] = a[L] − y

With dZ[l], I can calculate the gradients of the weights and biases:

dW[l] = (1/m) ⋅ dZ[l] ⋅ (A[l−1])ᵀ

db[l] = (1/m) ∑ᵢ₌₁ᵐ dZ[l]⁽ⁱ⁾

Finally, the parameters are updated using gradient descent, where α is the learning rate:

W[l] = W[l] − α ⋅ dW[l],   b[l] = b[l] − α ⋅ db[l]
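
Putting the gradient formulas and the update rule together, here is a hedged sketch, again assuming the column-per-sample layout from the earlier snippets rather than the project's real code:

```python
import numpy as np

def backward_output_layer(A_L, Y, A_prev):
    """Gradients at the sigmoid output layer with cross-entropy loss."""
    m = Y.shape[1]
    dZ = A_L - Y                                 # dZ[L] = a[L] - y
    dW = (dZ @ A_prev.T) / m                     # dW[L] = (1/m) dZ[L] (A[L-1])ᵀ
    db = np.sum(dZ, axis=1, keepdims=True) / m   # db[L] = (1/m) Σᵢ dZ[L](i)
    return dW, db

def gradient_descent_step(W, b, dW, db, alpha=0.01):
    """Update rule: W := W - α·dW and b := b - α·db (alpha is the learning rate)."""
    return W - alpha * dW, b - alpha * db
```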

  4. Challenges Encountered

During the construction process, I encountered some tricky problems:

Vanishing Gradient: I initially used the Sigmoid activation function throughout the network. As the number of layers increased, gradients shrank layer by layer, and the parameters of the earlier layers were barely updated.

Solution: Change the activation function of the hidden layer to ReLU (g(z) = max(0, z)), and only retain the Sigmoid function in the output layer.
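
The difference is easy to see in the derivatives. A small illustrative comparison (not from the project's code): the sigmoid derivative never exceeds 0.25, so multiplying it across many layers drives gradients toward zero, while the ReLU derivative is exactly 1 for positive inputs.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)             # peaks at 0.25 (z = 0), tiny for large |z|

def relu_grad(z):
    return (z > 0).astype(float)   # 1 for positive inputs, 0 otherwise

z = np.array([-4.0, 0.0, 4.0])
print(sigmoid_grad(z))  # ≈ [0.0177, 0.25, 0.0177] -- repeated products vanish
print(relu_grad(z))     # [0. 0. 1.] -- gradient passes through unchanged
```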

Numerical Stability: When calculating the logarithm, if the predicted value is very close to 0 or 1, it will lead to NaN (Not a Number) errors.

Solution: Add a small constant ϵ (e.g., 1e−8) inside the logarithm, i.e., compute log(ŷ + ϵ) instead of log(ŷ).
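
A sketch of the stabilized cost; the ε value mirrors the 1e−8 mentioned above, and the function name is my own:

```python
import numpy as np

def stable_cross_entropy(Y, Y_hat, eps=1e-8):
    """Same cost as before, but log(ŷ + ε) and log(1 - ŷ + ε) never see 0."""
    m = Y.shape[1]
    losses = Y * np.log(Y_hat + eps) + (1 - Y) * np.log(1 - Y_hat + eps)
    return float(-np.sum(losses) / m)
```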

Pitfalls of the Matrix Broadcasting Mechanism: NumPy's broadcasting is very powerful, but it is also prone to subtle bugs. For example, adding an array of shape (n,) to an array of shape (n, 1) broadcasts to shape (n, n) rather than producing the elementwise sum you might expect.

Solution: explicitly .reshape() the critical matrices and add assert statements to verify shapes at every step. (A tip from Gemini.)
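
A tiny demonstration of the trap and the fix (illustrative values only):

```python
import numpy as np

a = np.ones(3)          # shape (n,)  -- a 1-D array
b = np.ones((3, 1))     # shape (n, 1) -- a column vector

print((a + b).shape)    # (3, 3): broadcasting built a matrix, not a vector!

a = a.reshape(3, 1)     # make the intent explicit
c = a + b
assert c.shape == (3, 1), f"unexpected shape: {c.shape}"
print(c.shape)          # (3, 1), as intended
```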

Built With

  • python
  • numpy