About Jarvis — Local Agent

Main task: explore the capabilities and trade-offs of the GPT-OSS 20B / 120B models inside a practical, privacy-first local agent.

Inspiration

Build an assistant that can work with your files locally, surface insights, and act as a trustworthy helper—useful for individuals and for teams that must keep internal data private. We also wanted a clean way to switch providers (OpenRouter, HF Router, Ollama) without rewriting code, and to learn where big OSS models shine or struggle.

What it does

  • Folder sandbox: choose an Action Folder; all operations are constrained to it.
  • Six actions: Explore folder, Analyze documents, Translate, Extract data, Generate new document (e.g., README), and Batch actions (shell).
  • Smart tools (when supported): list/search files, read txt/pdf/docx/xlsx/csv, count by type, write files, optional shell, OCR via Tesseract (with Poppler for PDFs).
  • Safe mode & step limit: shell off by default; slider caps agent steps.
  • Multi-provider switch: OpenRouter / HF Router / local Ollama (OpenAI-compatible /v1) with per-backend model mapping and .env keys.
  • When tools aren’t supported: Jarvis still works in paste-in mode—you paste a file list, document text, or OCR output into chat, and Jarvis summarizes, analyzes, translates, or structures it. Use the Save action to write results.
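
The folder-sandbox constraint above can be sketched as a small path check. This is a minimal illustration, not the project's actual code; the helper name is hypothetical:

```python
from pathlib import Path

def resolve_in_sandbox(action_folder: str, user_path: str) -> Path:
    """Resolve user_path and make sure it stays inside the Action Folder."""
    root = Path(action_folder).resolve()
    target = (root / user_path).resolve()
    # Path.relative_to raises ValueError when target escapes the sandbox,
    # including "../" traversal and absolute paths
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{user_path!r} escapes the Action Folder")
    return target
```

Every file tool (read, write, list) would route its paths through a check like this, so the agent can never touch anything outside the chosen folder.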

How we built it

  • UI: Python + Gradio (dark theme), file explorer, “Apply settings,” presets for the six actions, warnings if tool-calling isn’t available.
  • Agent: Hugging Face smolagents CodeAgent with LiteLLMModel, plus an OpenAI-compatible client for HF Router.

  • Providers:
    • OpenRouter (https://openrouter.ai/api/v1)
    • HF Router (https://router.huggingface.co/v1, provider-suffixed model IDs like …:fireworks-ai)
    • Ollama via OpenAI shim (http://localhost:11434/v1)
  • Config: single MODEL_BY_BACKEND map; TOOLS_CAPABLE toggles tool-calls per backend; agent cache keyed by backend/model/safety.
  • Docs & data: resilient encoding (UTF-8, CP1251/866, etc.), delimiter detection for CSV, quick previews for Excel/PDF, OCR for scans.
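
The centralized config could look roughly like the sketch below. The base URLs come from the list above; the model IDs and env-var names are illustrative assumptions, not the project's real values:

```python
# Hypothetical shape of the single MODEL_BY_BACKEND map described above.
MODEL_BY_BACKEND = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "openai/gpt-oss-20b",               # assumed ID format
        "api_key_env": "OPENROUTER_API_KEY",
    },
    "hf_router": {
        "base_url": "https://router.huggingface.co/v1",
        "model": "openai/gpt-oss-20b:fireworks-ai",  # provider-suffixed ID
        "api_key_env": "HF_TOKEN",
    },
    "ollama": {
        "base_url": "http://localhost:11434/v1",     # OpenAI-compatible shim
        "model": "gpt-oss:20b",
        "api_key_env": None,                         # local, no key needed
    },
}

# Which backends reliably emit structured tool_calls (example values)
TOOLS_CAPABLE = {"openrouter": True, "hf_router": True, "ollama": False}

def backend_config(name: str) -> dict:
    """Merge the model mapping with the tool-capability flag."""
    cfg = MODEL_BY_BACKEND[name]
    return {**cfg, "tools": TOOLS_CAPABLE.get(name, False)}
```

Keeping everything in one map is what lets the UI switch providers without touching agent code: the agent cache is keyed by backend/model/safety, so a switch just builds (or reuses) a different entry.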

Challenges we ran into

  • Tool-calling variance: many OSS models (incl. gpt-oss-20b) chat well but don’t emit structured OpenAI tool_calls. Mitigation: clearly document paste-in mode when tools aren’t available.
  • Endpoint gotchas: Ollama native vs /v1 OpenAI path; HF Router provider suffix; credentials and usage fields.
  • Framework quirks: smolagents API changes, Gradio state deep-copy rules, file-explorer/global state fixes.
  • Windows specifics: code pages & Cyrillic encoding, Tesseract & Poppler paths.
  • Hardware limits: running the 120B model locally was impractical on our hardware, so we used hosted APIs at that scale.
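
The Windows encoding issue above is handled by the resilient reader mentioned in "How we built it". A minimal sketch of that cascade (constant and function names are hypothetical):

```python
# Try a cascade of encodings: UTF-8 first, then the Windows Cyrillic
# code pages (CP1251, CP866), with latin-1 as a never-failing last resort.
FALLBACK_ENCODINGS = ("utf-8", "cp1251", "cp866", "latin-1")

def read_text_resilient(path: str) -> str:
    with open(path, "rb") as f:
        raw = f.read()
    for enc in FALLBACK_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 never raises, so this line is only reached if the tuple changes
    return raw.decode("utf-8", errors="replace")
```

The order matters: UTF-8 is strict enough to reject most legacy-encoded bytes, so trying it first rarely produces false positives, while latin-1 at the end guarantees the reader never crashes.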

Accomplishments that we’re proud of

  • A working local agent with a clean UI, cross-provider support, and a safe sandbox.
  • OCR integration, resilient readers, and practical presets that deliver value fast.
  • Clear step-limit and logs so advanced users can inspect behavior.

What we learned

  • Provider differences matter: the same model name can require different IDs/adapters.
  • Designing for graceful fallback (to paste-in mode) keeps the app useful across providers.
  • Small UX choices (safe defaults, previews, explicit outputs like reports/*.md) increase trust.
  • Centralized config prevents cross-provider “default” mixups.
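
The graceful-fallback lesson above boils down to a small dispatch. This is a toy sketch with hypothetical names; the real paths would invoke the tool-using agent or the plain chat flow:

```python
def run_action(backend: str, prompt: str, tools_capable: dict) -> str:
    """Use agent mode when the backend emits structured tool_calls,
    otherwise degrade to paste-in mode instead of failing."""
    if tools_capable.get(backend, False):
        return f"[agent] {prompt}"     # tool-using CodeAgent path
    return f"[paste-in] {prompt}"      # user pastes text; model analyzes it
```

The key design choice is that the fallback is a first-class mode with its own UI hints, not an error state.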

What’s next for Jarvis — Local Agent

  • UI polish: action wizards, richer previews, per-action options (glob/regex/lang).
  • Docs & blog: write up the architecture and provider nuances; public demo.
  • Plugins: table extraction from PDFs, diff/review helpers, one-click exports.
  • RAG option: local embeddings + vector search for large folders (privacy-first).
  • Packaging: installer, thumbnails/icons, and an offline bundle.
  • Model ops: optional tool-friendly models (Functionary/Qwen-Instruct) for users who want full agentic workflows locally.

Built With

  • gradio
  • litellm
  • python
  • smolagents