- Inspiration
This project was inspired by a sense of unease about the "black box" of programming. In my daily work, I'm used to calling high-level libraries like TensorFlow or PyTorch. A few lines of code like model.fit() work wonders.
But I realized I was gradually forgetting the underlying mathematical principles. Richard Feynman's quote, "If you can't create it from scratch, you don't really understand it," became my biggest motivation. I wanted to rediscover the feel for algorithms by building a Multilayer Perceptron (MLP) capable of recognizing handwritten digits (MNIST dataset) using only NumPy, without relying on any deep learning framework.
- What I Learned
In this process, I gained more than just improved coding skills; I gained a deeper understanding of the theory:
The Importance of Matrix Dimensions: roughly 90% of my coding errors when turning the derivations into code came from matrix shape mismatches.
Activation Function Selection: I gained a much deeper understanding of why ReLU is more popular than Sigmoid in deep networks (it mitigates the vanishing gradient problem).
Mapping Mathematics to Code: I learned how to transform the abstract chain rule of calculus into efficient, working code.
- How I Built It
The core of the project lies in translating mathematical derivations into program logic. I divided it into three main phases:
Phase 1: Forward Propagation
This is the process by which the neural network makes predictions. For each layer, I need to compute a linear weighted sum and then pass it through an activation function. Assuming the input to layer l is a[l-1], the weights are W[l], and the bias is b[l], the linear output Z[l] is:
Z[l] = W[l] ⋅ a[l-1] + b[l]
Then, apply the activation function g(⋅) (e.g., ReLU or Sigmoid):
a[l] = g(Z[l])
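Below is a minimal NumPy sketch of one forward-propagation step; the function and variable names (forward_layer, A_prev, g) are illustrative choices rather than the project's actual code.

```python
import numpy as np

def relu(Z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0, Z)

def sigmoid(Z):
    # Sigmoid activation, used in the output layer
    return 1.0 / (1.0 + np.exp(-Z))

def forward_layer(A_prev, W, b, g):
    # Linear step: Z[l] = W[l] . A[l-1] + b[l]
    # Shapes: W is (n_l, n_prev), A_prev is (n_prev, m), b is (n_l, 1)
    Z = W @ A_prev + b
    # Activation step: a[l] = g(Z[l])
    A = g(Z)
    return Z, A
```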
Phase 2: Defining the Loss Function
To keep the presentation simple, I used the cross-entropy loss function. For a binary classification problem, with true label y and predicted value ŷ, the loss L for a single sample is defined as:
L(y, ŷ) = −(y log(ŷ) + (1 − y) log(1 − ŷ))
For the entire training set (m samples), the cost function J is the average of the loss over all samples:
J(W, b) = (1/m) ∑_{i=1}^{m} L(y^(i), ŷ^(i))
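As a sketch of how this cost might look in NumPy (assuming Y and Y_hat are arrays of shape (1, m); the names are illustrative, not the project's actual code):

```python
import numpy as np

def cross_entropy_cost(Y_hat, Y):
    # Y_hat, Y: shape (1, m) for m training samples
    m = Y.shape[1]
    # Per-sample loss L(y, y_hat)
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    # Cost J: the average loss over all m samples
    return float(np.sum(losses) / m)
```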
Phase 3: Backpropagation
This is the most challenging part of the project. To update the weights so that the loss decreases, I need to compute the gradient of the cost function with respect to the parameters. Applying the chain rule at the output layer, the gradient comes out as:
dZ[l] = ∂L/∂Z[l] = a[l] − y
With dZ[l], I can calculate the gradients of the weights and biases:
dW[l] = (1/m) dZ[l] ⋅ (A[l−1])^T
db[l] = (1/m) ∑_{i=1}^{m} dZ[l](i)
Finally, the parameters are updated using gradient descent, where α is the learning rate:
W[l] = W[l] − α ⋅ dW[l]
b[l] = b[l] − α ⋅ db[l]
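Putting the gradients and the update step together, a NumPy sketch of the output layer might look like this (A_L is the output activation and A_prev the previous layer's activation; the names are assumptions, not the project's actual code):

```python
import numpy as np

def output_layer_gradients(A_L, Y, A_prev):
    m = Y.shape[1]
    # dZ[l] = a[l] - y  (sigmoid output combined with cross-entropy loss)
    dZ = A_L - Y
    # dW[l] = (1/m) dZ[l] . (A[l-1])^T
    dW = (dZ @ A_prev.T) / m
    # db[l] = (1/m) * sum over the samples of dZ[l]
    db = np.sum(dZ, axis=1, keepdims=True) / m
    return dW, db

def gradient_descent_step(W, b, dW, db, alpha):
    # W[l] = W[l] - alpha * dW[l];  b[l] = b[l] - alpha * db[l]
    return W - alpha * dW, b - alpha * db
```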
- Challenges Encountered
During the construction process, I encountered some tricky problems:
Vanishing Gradient: At first I used the Sigmoid activation function throughout the network; as the number of layers grows, the gradients shrink layer by layer and the parameters of the earlier layers are barely updated.
Solution: Change the activation function of the hidden layers to ReLU (g(z) = max(0, z)), and keep the Sigmoid function only in the output layer.
Numerical Stability: When calculating the logarithm, if the predicted value is very close to 0 or 1, it will lead to NaN (Not a Number) errors.
Solution: Add a small constant ϵ (e.g., 1e−8) inside the logarithm, i.e., compute log(ŷ + ϵ) instead of log(ŷ).
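For illustration, np.clip achieves the same effect by keeping the predictions strictly inside (0, 1) before taking the log (a minimal, self-contained example, not the project's actual code):

```python
import numpy as np

eps = 1e-8
Y = np.array([[1.0, 0.0]])
Y_hat = np.array([[1.0, 0.0]])   # extreme predictions that would otherwise produce NaN

# Keep predictions strictly inside (0, 1) so np.log never returns -inf
Y_hat_safe = np.clip(Y_hat, eps, 1 - eps)
losses = -(Y * np.log(Y_hat_safe) + (1 - Y) * np.log(1 - Y_hat_safe))
print(losses)   # small finite values instead of NaN
```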
Pitfalls of the Matrix Broadcasting Mechanism: NumPy's broadcasting is very powerful, but it can also produce subtle bugs. For example, adding an array of shape (n,) to an array of shape (n, 1) broadcasts to shape (n, n) instead of giving the element-wise sum you might expect; see the sketch below.
Solution: Explicitly .reshape() the critical matrices to fixed dimensions and add assert statements to check their shapes. (From Gemini)
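A small example of the trap and the fix (illustrative only):

```python
import numpy as np

v = np.ones(4)          # shape (n,)  with n = 4
col = np.ones((4, 1))   # shape (n, 1)
print((v + col).shape)  # (4, 4): broadcasting, not the element-wise sum you might expect

# Fix: reshape the 1-D vector into an explicit column and assert the shapes match
v = v.reshape(-1, 1)    # shape (4, 1)
assert v.shape == col.shape
print((v + col).shape)  # (4, 1)
```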
Built With
- 1. Core model architecture: Gemini 3 Pro, responsible for understanding complex instructions and processing multimodal inputs (text, images, video). Nano Banana: a state-of-the-art model specifically designed for image generation and editing; it supports text-to-image, image editing, and style transfer.
- 2. Deep learning frameworks & languages: JAX / TensorFlow. Google's model training often heavily relies on JAX, a Python library designed for high-performance computation, enabling the compilation of NumPy code to accelerators using XLA (Accelerated Linear Algebra). Python: the primary interface language for artificial intelligence research and development. C++: used for low-level performance optimization, ensuring sufficiently fast inference speeds on the server side.
- 3. Computing infrastructure & cloud: relies on Google's vast cloud computing resources. Data centers: Google's global network of data centers provides the computing power, storage, and network bandwidth required for large-scale training.
- 4. Architecture design: based on the Transformer architecture and its attention mechanism.
- 5. Knowledge graph: uses Google's Knowledge Graph to obtain and verify facts and information.

