Inspiration
I’ve always felt like most AI assistants are impressive but kind of fake. They don’t actually know you. They don’t know how you talk, what you care about, or how you respond to things. They just autocomplete in a vacuum.
I wanted something that actually understands me — but without shipping my private conversations to the cloud.
So Jarvis started as a simple question: what if I could build a local, privacy-first AI assistant that actually learns from my real conversations and retrieves relevant context before generating anything?
Also honestly I just wanted to try building something hard. I didn’t fully know how to do it. That was part of the point.
What it does
Jarvis is a local AI assistant that:
Analyzes past conversations
Retrieves relevant context using hybrid search
Reranks results with a cross-encoder
Generates responses that match tone and intent
Runs locally instead of defaulting to cloud APIs
Instead of just “LLM generate reply,” it’s:
retrieve → rerank → inject context → generate
It tries to feel less generic and more aligned with how I actually communicate.
How we built it
Jarvis is built as a structured pipeline.
Messages are segmented into structured units
Each message is labeled (question, commitment, reaction, etc.)
Embeddings are generated locally
Hybrid retrieval runs (BM25 + vector similarity)
A cross-encoder reranks the candidates
The best context is injected into a local model for generation
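A minimal sketch of the first three steps (segment, label, embed locally), assuming sentence-transformers for the embeddings; MessageUnit, the keyword-based label_message, and the model choice here are simplified stand-ins for illustration, not the actual Jarvis code:

```python
from dataclasses import dataclass

from sentence_transformers import SentenceTransformer


@dataclass
class MessageUnit:
    text: str
    label: str        # e.g. "question", "commitment", "reaction"
    embedding: list   # generated locally, never leaves the machine


def label_message(text: str) -> str:
    # Toy keyword labeler; the real pipeline uses a proper intent classifier.
    lowered = text.lower()
    if text.rstrip().endswith("?"):
        return "question"
    if any(phrase in lowered for phrase in ("i'll", "i will", "i can do")):
        return "commitment"
    return "reaction"


# Small embedding model so it fits comfortably alongside the LLM in 8GB of RAM.
embedder = SentenceTransformer("all-MiniLM-L6-v2")


def build_units(messages: list[str]) -> list[MessageUnit]:
    embeddings = embedder.encode(messages, normalize_embeddings=True)
    return [
        MessageUnit(text=m, label=label_message(m), embedding=emb.tolist())
        for m, emb in zip(messages, embeddings)
    ]
```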
Retrieval scoring is roughly:
Score(c) = α·BM25(c) + β·EmbeddingSim(c)
The top-scoring candidates are then reranked by the cross-encoder for semantic precision.
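Here is a sketch of that scoring and reranking step, assuming rank_bm25 and sentence-transformers; the model names, α/β weights, and pool sizes are illustrative placeholders, not the exact ones Jarvis uses:

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def hybrid_scores(query: str, candidates: list[str],
                  alpha: float = 0.5, beta: float = 0.5) -> np.ndarray:
    # Lexical term: BM25 over whitespace-tokenized candidates.
    bm25 = BM25Okapi([c.lower().split() for c in candidates])
    bm25_scores = np.asarray(bm25.get_scores(query.lower().split()))

    # Semantic term: cosine similarity (embeddings are normalized).
    q_vec = embedder.encode(query, normalize_embeddings=True)
    c_vecs = embedder.encode(candidates, normalize_embeddings=True)
    sim_scores = c_vecs @ q_vec

    # Score(c) = alpha * BM25(c) + beta * EmbeddingSim(c)
    # (BM25 and cosine live on different scales; a real system normalizes them first.)
    return alpha * bm25_scores + beta * sim_scores


def retrieve_and_rerank(query: str, candidates: list[str],
                        pool_size: int = 20, top_k: int = 5) -> list[str]:
    # The hybrid score narrows the pool, then the cross-encoder reranks it.
    order = np.argsort(-hybrid_scores(query, candidates))[:pool_size]
    pool = [candidates[i] for i in order]
    ce_scores = reranker.predict([(query, passage) for passage in pool])
    best = np.argsort(-np.asarray(ce_scores))[:top_k]
    return [pool[i] for i in best]
```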
It’s less about fancy prompts and more about designing the system correctly.
Everything was built under hardware constraints (8GB RAM), so model size, routing, and latency actually mattered.
Challenges we ran into
A lot.
Tried models too large for my machine
Built retrieval pipelines that didn’t actually improve metrics
Fought latency constantly
Realized reranking isn’t magic
Had to rethink architecture multiple times
Working with limited RAM forced discipline. Every model choice mattered. Every token mattered. There’s no hiding behind a giant 70B model.
Also evaluation is harder than generation. Measuring whether you’re actually improving something is way more annoying than just getting output.
Accomplishments that we're proud of
Fully local-first architecture
Hybrid retrieval working reliably
Cross-encoder reranking integrated and measurable
Structured labeling pipeline for message intent
Real evaluation metrics instead of vibes
Jarvis isn’t just a chatbot wrapper. It’s a system with measurable retrieval and generation improvements.
And I built it from scratch while figuring things out as I went.
What we learned
Retrieval quality > prompt cleverness
Smaller models + better routing can outperform brute-force scaling
Evaluation frameworks are non-optional
Hardware constraints make you a better system designer
Also building something when you don’t fully know what you’re doing is frustrating but extremely educational.
What's next for Jarvis
Stronger long-term memory abstraction
Cleaner model routing
Lower latency generation
More formal benchmarking
Expanding beyond messaging use cases
Jarvis is still evolving. The goal isn’t just a chatbot — it’s building deeply personal, privacy-aligned AI systems that actually understand context instead of pretending to.