Intro

A cross-platform, on-device multimodal chatbot with RAG that can process six document types. It features a built-in data analyst agent and OCR, and deploys to any local machine with a single script. Note: KnowFlow supports both online and offline modes, runs on macOS, Windows, and Linux, and installs via the included setup.sh script.
In the demo video, the model's audio responses are inaudible because the screen recorder did not capture system audio.

Inspiration

The idea for KnowFlow came from a simple but important belief: AI should be accessible, private, and usable even without internet. With the rise of large language models, most tools rely on cloud access, which raises privacy concerns and limits use on low-resource or offline systems. I wanted to build something different — an AI assistant that runs fully offline, supports multimodal input, and works across platforms. That is how KnowFlow was born.

What it does

KnowFlow is a cross-platform, on-device AI assistant that provides the following capabilities:

  • Chat with both cloud and local large language models
  • Document-based question answering using retrieval-augmented generation
  • CSV file analysis and visualization using a built-in data analyst agent
  • OCR to extract and understand text from images
  • Image generation from text prompts
  • Visual question answering using the device camera
  • Live drawing assistant with voice interaction
  • Web search using either Tavily or Playwright depending on mode
  • Works in both online and offline modes
  • Runs on macOS, Windows, and Linux

How I built it

Model Integration

  • Used Google's Gemini Flash model for online chat
  • Used Meta's Llama 3.2 3B model (4-bit) via Ollama for local offline inference
  • Responses are formatted in Markdown with code highlighting
  • Chat history is stored locally in SQLite with semantic search
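A minimal sketch of the online/offline routing and the local history store (the function names, table schema, and model tags below are illustrative assumptions, not KnowFlow's actual code):

```python
import sqlite3

def pick_backend(online: bool) -> dict:
    """Route chat to Gemini Flash when online, else to local Llama 3.2 via Ollama."""
    if online:
        return {"provider": "gemini", "model": "gemini-flash"}
    # Ollama's llama3.2:3b ships 4-bit quantized by default
    return {"provider": "ollama", "model": "llama3.2:3b"}

def save_turn(con: sqlite3.Connection, role: str, content: str) -> None:
    """Append one chat turn to the local SQLite history table."""
    con.execute("CREATE TABLE IF NOT EXISTS history (role TEXT, content TEXT)")
    con.execute("INSERT INTO history (role, content) VALUES (?, ?)", (role, content))
    con.commit()
```

In the real app, the stored turns are additionally embedded so history can be searched semantically rather than by keyword.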

RAG Pipeline

  • Used LlamaIndex to parse and chunk files
  • Generated embeddings using all-MiniLM-L6-v2 from Hugging Face
  • Stored embeddings in ChromaDB for fast similarity search
  • Supported file types include PDF, DOCX, TXT, CSV, HTML, and JSON
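The chunking step can be sketched as a sliding window with overlap (a simplified whitespace-based version; the real pipeline uses LlamaIndex's parsers, so sizes and boundaries here are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word windows of `chunk_size`, overlapping by `overlap`
    so sentences that straddle a boundary appear in both neighboring chunks."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the document
    return chunks
```

Each chunk is then embedded with all-MiniLM-L6-v2 and stored in a ChromaDB collection, so a query embedding can retrieve the nearest chunks at answer time.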

Modular Agent System

Each AI feature is implemented as a separate agent:

  • TextAgent for general chat
  • WebAgent for information retrieval
    • Two options: Playwright to scrape relevant content or Tavily API for fast, clean search results
  • RagAgent for document-based answering
  • ImageGenAgent for text-to-image creation
  • LocalAgent for Llama model inference
  • LiveAgent for drawing and voice interaction
  • ObjectDetectionAgent for interpreting visual inputs
  • DataFrameAgent for analyzing and plotting data from CSV files
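The dispatch pattern behind this agent system can be sketched as a small registry (class and function names, and the synchronous `run` signature, are simplifications; the real agents wrap model calls):

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Common interface every agent implements."""

    @abstractmethod
    def run(self, query: str) -> str:
        ...

AGENTS: dict[str, Agent] = {}

def register(name: str):
    """Class decorator: instantiate the agent and add it to the registry."""
    def deco(cls):
        AGENTS[name] = cls()
        return cls
    return deco

@register("text")
class TextAgent(Agent):
    def run(self, query: str) -> str:
        return f"chat: {query}"  # stand-in for a real model call

def dispatch(name: str, query: str) -> str:
    """Route a query to the agent registered under `name`."""
    return AGENTS[name].run(query)
```

Keeping each capability behind the same interface is what makes it cheap to add a new modality: write one class, register it, and the router picks it up.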

Backend and Infrastructure

  • FastAPI powers all REST and WebSocket endpoints
  • Python 3.11 is used throughout the backend
  • WebSockets drive real-time interactions such as drawing and voice feedback
  • Async request handlers keep the server responsive under concurrent load
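The async pattern behind the real-time endpoints can be illustrated without FastAPI itself (handler names and payload shape are assumptions):

```python
import asyncio

async def handle_stroke(stroke: dict) -> dict:
    """Handle one drawing stroke; awaiting yields the event loop to other
    clients the way a real model or TTS call would."""
    await asyncio.sleep(0)
    return {"ack": stroke["id"]}

async def session(strokes: list[dict]) -> list[dict]:
    """Process a batch of strokes concurrently, preserving result order."""
    return list(await asyncio.gather(*(handle_stroke(s) for s in strokes)))
```

In the app, the same coroutine shape sits behind a FastAPI `@app.websocket(...)` route, receiving strokes or audio frames and streaming feedback back over the socket.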

Frontend and Integrations

  • TailwindCSS and JavaScript were used for the frontend
  • Google Cloud TTS was used for voice feedback
  • Hugging Face was used for embedding and model access
  • Camera and microphone integration supports live visual Q&A
  • Drawing canvas supports real-time sketching with AI guidance

Challenges I ran into

  • Managing memory for local inference of Llama 3.2 while maintaining speed
  • Chunking documents in a way that preserved meaning without bloating token size
  • Creating fallbacks for file parsing errors
  • Ensuring one-click install across macOS, Windows, and Linux
  • Building an intuitive and unified UI across multiple modalities

Accomplishments that I am proud of

  • Built a working on-device AI assistant with no internet dependency
  • Achieved local RAG across six file types
  • Integrated real-time drawing, OCR, and visual Q&A into one platform
  • Designed a single-script installation method for all major operating systems
  • Combined multiple GenAI features into one extensible tool

What I learned

  • Combining online and offline AI models provides real flexibility
  • Local RAG can be efficient and accurate with proper chunking and embeddings
  • Modular architecture is essential when scaling a multimodal app
  • Real-time interfaces like voice and drawing make AI more usable
  • Offline tools can still match enterprise-level performance with the right stack

What's next for KnowFlow

  • Add local speech recognition and text-to-speech using open-source models
  • Support new file types like XML and EPUB
  • Build a lightweight desktop GUI using Electron or Tauri
  • Improve dynamic switching between online and offline models
  • Add user login for multi-profile local usage
  • Open source the CSV analyst agent as a standalone library

Built With

chromadb · fastapi · gemini · google-cloud-tts · huggingface · javascript · llama-3.2 · llamaindex · ollama · playwright · python · sqlite · tailwindcss · tavily
