Inspiration

I’ve always believed that the richer the context you give an LLM, the better the answers you get. Many times, I’ve thought: “The LLM could solve this if only it knew what I was seeing or doing.” That idea pushed me to build a system that provides the right context automatically—by letting the model “see” what I see.

What it does

Omni Chat creates a private, continuously growing knowledge base from your computer activity. Using Retrieval-Augmented Generation (RAG), it answers context-aware questions about what you were actually working on. It runs fully locally (model and OCR): it captures screenshots intelligently, extracts their text, indexes it into a vector store, and delivers chat responses grounded in your real activity.
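
The capture client itself is the least visible piece, so here is a rough illustration of what "capturing screenshots intelligently" can mean on macOS. This is a minimal sketch, not the actual client: the `screencapture` CLI is real, but the interval, change threshold, and paths are all assumptions.

```python
# Minimal sketch of a macOS capture loop (hypothetical; not the actual client).
# Takes a screenshot with the built-in `screencapture` CLI and skips frames
# that barely differ from the previous one, so the index isn't flooded
# with near-duplicate screens.
import subprocess
import time
from pathlib import Path

from PIL import Image, ImageChops  # pip install Pillow

CAPTURE_DIR = Path("captures")    # assumed output directory
INTERVAL_SECONDS = 30             # assumed capture cadence
CHANGE_THRESHOLD = 8.0            # assumed mean pixel delta that counts as "new"

def grab_screen(path: Path) -> None:
    """Capture the main display silently (-x suppresses the shutter sound)."""
    subprocess.run(["screencapture", "-x", str(path)], check=True)

def mean_diff(a: Image.Image, b: Image.Image) -> float:
    """Average per-pixel difference between two downscaled grayscale frames."""
    small_a = a.convert("L").resize((64, 64))
    small_b = b.convert("L").resize((64, 64))
    diff = ImageChops.difference(small_a, small_b)
    pixels = list(diff.getdata())
    return sum(pixels) / len(pixels)

def capture_loop() -> None:
    CAPTURE_DIR.mkdir(exist_ok=True)
    previous: Image.Image | None = None
    while True:
        path = CAPTURE_DIR / f"{int(time.time())}.png"
        grab_screen(path)
        current = Image.open(path)
        if previous is not None and mean_diff(previous, current) < CHANGE_THRESHOLD:
            path.unlink()  # screen barely changed; drop the duplicate
        else:
            previous = current
        time.sleep(INTERVAL_SECONDS)
```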

How we built it

Omni Chat combines a local LLM with OCR to intelligently process screenshots of user activity. Extracted text is stored in a vector database (ChromaDB) via LlamaIndex, enabling RAG-powered conversations. The system includes a responsive chat interface for natural interactions.
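
In code terms, the ingestion path looks roughly like this. It's a minimal sketch assuming the current llama-index package layout; the collection name, paths, and metadata keys are placeholders, not the project's actual configuration.

```python
# Minimal sketch of the OCR -> vector store ingestion path (assumptions noted inline).
from pathlib import Path

import chromadb
import pytesseract  # pip install pytesseract (the Tesseract binary must be installed)
from PIL import Image
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent local Chroma collection; the name "activity" is a placeholder.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("activity")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Same embedding model as listed in the tech stack below.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Build an (initially empty) index bound to the Chroma collection.
index = VectorStoreIndex.from_documents(
    [], storage_context=storage_context, embed_model=embed_model
)

def ingest_screenshot(image_path: Path) -> None:
    """OCR a screenshot with Tesseract and store the raw text, tagged with its source."""
    text = pytesseract.image_to_string(Image.open(image_path))
    if text.strip():
        index.insert(Document(text=text, metadata={"source": image_path.name}))
```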

Tech Stack Highlights

  • Backend: Python, FastAPI
  • Frontend: Next.js, ChatUI
  • Local AI Model: gpt-oss-20b via Ollama
  • OCR: Tesseract
  • Vector Database: ChromaDB
  • Embeddings: HuggingFace BAAI/bge-small-en-v1.5
  • Containerization: Docker (optional)
  • OS Monitoring: macOS activity client
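
Tying the stack together on the query side, a chat endpoint over the activity index might look roughly like the following. This is a sketch: the `/chat` route, request shape, and timeout are illustrative choices, not the real API.

```python
# Sketch of the RAG chat endpoint (route name and request shape are assumptions).
import chromadb
from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.chroma import ChromaVectorStore

app = FastAPI()

# Reopen the Chroma collection written during ingestion (same path/name as above).
client = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("activity")
)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)

# gpt-oss-20b served locally through Ollama; long timeout for a 20B model on a laptop.
llm = Ollama(model="gpt-oss:20b", request_timeout=300.0)
chat_engine = index.as_chat_engine(llm=llm, chat_mode="context")

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    """Answer a question grounded in snippets retrieved from the activity index."""
    return {"answer": str(chat_engine.chat(req.message))}
```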

Challenges we ran into

  • The first version passed screenshots directly to the model without OCR. It was painfully slow and drained laptop resources; switching to OCR-extracted text fixed this.
  • Queries requiring large context (e.g., “Summarize what I did yesterday”) don’t work well yet, since they need the entire day’s activity log. I’ll need custom summarization logic for that (see the sketch after this list).
  • At one point, I tried sending OCR results to gpt-oss-20b for summarization before storage, but it drained the battery and lost context. The final approach stores raw OCR text directly in the vector store.
  • Even on a MacBook Pro with 36 GB of RAM, running the local model consumes significant resources.
  • Multi-monitor support is still unreliable.
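
For the large-context problem in particular, one plausible direction is a map-reduce pass over a day's snippets instead of plain retrieval: summarize each chunk separately, then condense the partial summaries, so the whole day never has to fit into one context window. A hypothetical sketch; the prompts and truncation limit are assumptions.

```python
# Hypothetical map-reduce summarizer for "what did I do yesterday?"-style queries.
def summarize_day(day_texts: list[str], llm) -> str:
    """`day_texts` is one day's OCR snippets; `llm` is the Ollama client from above."""
    partials = []
    for text in day_texts:
        # Map step: compress each snippet independently (4000-char cap is arbitrary).
        prompt = f"Summarize this screen activity in 2-3 sentences:\n\n{text[:4000]}"
        partials.append(llm.complete(prompt).text)
    # Reduce step: condense the partial summaries into one narrative.
    combined = "\n".join(partials)
    final_prompt = f"Combine these notes into a coherent daily summary:\n\n{combined}"
    return llm.complete(final_prompt).text
```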

Accomplishments we’re proud of

  • It’s genuinely useful: it answers many queries correctly and already makes daily work easier.
  • I managed to add a full UI layer just a day before the hackathon—thanks to reusing the LlamaIndex chat interface.

What we learned

  • A working understanding of RAG, beyond the theory.
  • Hands-on experience with hosting and managing Docker containers.
  • How to orchestrate pipelines effectively using the LlamaIndex framework.

What’s next for Omni Chat

  • Improving RAG to handle larger, more complex queries and deliver better context-aware responses.
  • Launching as a real product beyond the hackathon prototype.
  • Aggregate queries, e.g., “Summarize everything I did this week,” spanning multiple days and activities.
  • Voice modality: enabling persistent listening through your phone or laptop and feeding that audio context into the vector store.
