My MongoDB RAG Journey

The Idea 💡

Late one night, I found the "All The News 2.0" dataset on Kaggle - 25,000+ articles just waiting to be explored. I thought: "What if I could build something that actually understands these articles, not just searches them?"

The Reality Check 😅

I knew nothing about vector embeddings beyond "they're numbers that represent text somehow." But that's how you learn, right?

The hardest part wasn't the individual pieces - it was connecting them all. Documents → Embeddings → Storage → Search → Chat. Each step seemed simple until I tried linking them together.
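
For anyone curious what that chain looks like end to end, here is a minimal sketch of it, not the project's actual code. Everything in it is a stand-in I'm assuming for illustration: `embed` is a toy word-count "embedding" rather than a real model, the store is a plain Python list rather than MongoDB, and `build_prompt` only assembles the text you would hand to a chat model instead of calling one.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index(articles):
    """Documents -> Embeddings -> Storage (a list here, a database in practice)."""
    return [
        {"title": a["title"], "text": a["text"], "embedding": embed(a["text"])}
        for a in articles
    ]


def search(store, question, k=2):
    """Search: embed the question the same way and rank stored documents."""
    q = embed(question)
    ranked = sorted(store, key=lambda d: cosine(q, d["embedding"]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Chat: number the retrieved articles so the answer can cite them."""
    context = "\n".join(
        f"[{i + 1}] {h['title']}: {h['text']}" for i, h in enumerate(hits)
    )
    return (
        "Answer using only these sources and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Two made-up articles, purely for demonstration.
articles = [
    {"title": "Election night",
     "text": "Voters went to the polls in the 2016 presidential election."},
    {"title": "Transfer window",
     "text": "The club signed a new striker before the season opener."},
]

store = index(articles)
question = "Who won the presidential election?"
hits = search(store, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Each function is trivial on its own. The linking is where things go wrong: the question has to be embedded exactly the same way as the documents, and the search results have to carry enough metadata (here, the title) for the chat step to cite them.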

The Struggles

  • GitLab CI/CD took three attempts (authentication is tricky)
  • I spent a weekend debugging why search returned nothing (OpenAI and Vertex AI embeddings don't mix: vectors from different models aren't comparable)
  • Almost committed MongoDB credentials to GitHub (close call!)
  • Frontend confusion: GET vs POST still gets me sometimes
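
The embeddings mix-up is the kind of bug a small guard can catch early. Below is a rough sketch of one way to do it, assuming you store the model name next to each vector. The field name `embedding_model`, the model labels `provider-a-embed` and `provider-b-embed`, and the helper `assert_same_model` are all hypothetical names made up for this example, not anything from a real SDK.

```python
class EmbeddingModelMismatch(ValueError):
    """Raised when a query vector comes from a different model than the index."""


def assert_same_model(stored_doc, query_model, query_vector):
    """Refuse to compare vectors that were produced by different models."""
    if stored_doc["embedding_model"] != query_model:
        raise EmbeddingModelMismatch(
            f"index uses {stored_doc['embedding_model']!r}, "
            f"query uses {query_model!r}"
        )
    # Same model name but different length usually means a config drift.
    if len(stored_doc["embedding"]) != len(query_vector):
        raise EmbeddingModelMismatch(
            f"index vectors have {len(stored_doc['embedding'])} dimensions, "
            f"query has {len(query_vector)}"
        )


# A made-up stored document; real vectors have hundreds of dimensions.
stored_doc = {
    "title": "Election night",
    "embedding_model": "provider-a-embed",  # hypothetical model label
    "embedding": [0.1, 0.2, 0.3, 0.4],
}

# Same model on both sides: passes silently.
assert_same_model(stored_doc, "provider-a-embed", [0.4, 0.3, 0.2, 0.1])

# Different model: fail loudly instead of returning empty search results.
try:
    assert_same_model(stored_doc, "provider-b-embed", [0.5, 0.5])
except EmbeddingModelMismatch as err:
    print(f"Refusing to search: {err}")
```

The model-name check matters more than the dimension check. Two models can produce vectors of the same length that still live in unrelated vector spaces, so similarity scores between them are meaningless.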

The Victory 🏆

The best moment? When I asked the chat interface "What were the main political stories in 2016?" and it gave me a coherent answer with proper citations. That's when I realized I'd built something that could actually understand information.

Built with curiosity, caffeine, and Stack Overflow.
