About Jarvis — Local Agent
Main task: explore the capabilities and trade-offs of the GPT-OSS 20B / 120B models inside a practical, privacy-first local agent.
Inspiration
Build an assistant that can work with your files locally, surface insights, and act as a trustworthy helper—useful for individuals and for teams that must keep internal data private. We also wanted a clean way to switch providers (OpenRouter, HF Router, Ollama) without rewriting code, and to learn where big OSS models shine or struggle.
What it does
- Folder sandbox: choose an Action Folder; all operations are constrained to it.
- Six actions: Explore folder, Analyze documents, Translate, Extract data, Generate new document (e.g., README), and Batch actions (shell).
- Smart tools (when supported): list/search files, read txt/pdf/docx/xlsx/csv, count by type, write files, optional shell, OCR via Tesseract (with Poppler for PDFs).
- Safe mode & step limit: shell off by default; slider caps agent steps.
- Multi-provider switch: OpenRouter / HF Router / local Ollama (OpenAI-compatible `/v1`) with per-backend model mapping and `.env` keys.
- When tools aren’t supported: Jarvis still works in paste-in mode. You paste a file list, document text, or OCR output into chat, and Jarvis summarizes, analyzes, translates, or structures it. Use the Save action to write results.
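The folder sandbox above can be illustrated with a minimal sketch (hypothetical helper, not the project's actual code) of how a user-supplied path might be resolved and checked against the chosen Action Folder:

```python
from pathlib import Path

def resolve_in_sandbox(action_folder: str, user_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything outside the Action Folder."""
    root = Path(action_folder).resolve()
    target = (root / user_path).resolve()
    # The resolved target must be the root itself or live under it;
    # this blocks "../" tricks and absolute-path escapes.
    if target != root and root not in target.parents:
        raise PermissionError(f"{user_path!r} escapes the Action Folder")
    return target
```

Funneling every tool (read, write, shell) through a check like this keeps all operations constrained to the selected folder.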
How we built it
- UI: Python + Gradio (dark theme), file explorer, “Apply settings,” presets for the six actions, warnings if tool-calling isn’t available.
- Agent: Hugging Face smolagents `CodeAgent` with `LiteLLMModel`, plus an OpenAI-compatible client for HF Router.
- Providers:
  - OpenRouter (https://openrouter.ai/api/v1)
  - HF Router (https://router.huggingface.co/v1, provider-suffixed model IDs like `…:fireworks-ai`)
  - Ollama via OpenAI shim (http://localhost:11434/v1)
- Config: a single `MODEL_BY_BACKEND` map; `TOOLS_CAPABLE` toggles tool calls per backend; agent cache keyed by backend/model/safety.
- Docs & data: resilient encoding (UTF-8, CP1251/866, etc.), delimiter detection for CSV, quick previews for Excel/PDF, OCR for scans.
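A minimal sketch of how such a centralized backend map might look (model IDs, keys, and the cache-key helper are illustrative assumptions, not the project's exact code):

```python
# Hypothetical per-backend configuration; one place to change models or URLs.
MODEL_BY_BACKEND = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1",
                   "model": "openai/gpt-oss-20b"},
    "hf_router":  {"base_url": "https://router.huggingface.co/v1",
                   "model": "openai/gpt-oss-20b:fireworks-ai"},
    "ollama":     {"base_url": "http://localhost:11434/v1",
                   "model": "gpt-oss:20b"},
}

# Whether each backend reliably emits structured tool calls.
TOOLS_CAPABLE = {"openrouter": True, "hf_router": True, "ollama": False}

def agent_cache_key(backend: str, safe_mode: bool) -> str:
    """Key the agent cache by backend/model/safety so a settings change rebuilds it."""
    return f"{backend}:{MODEL_BY_BACKEND[backend]['model']}:{safe_mode}"
```

Keeping all three backends in one dict is what lets the UI switch providers without any code changes elsewhere.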
Challenges we ran into
- Tool-calling variance: many OSS models (incl. gpt-oss-20b) chat well but don’t emit structured OpenAI `tool_calls`. Mitigation: clearly document paste-in mode when tools aren’t available.
- Endpoint gotchas: Ollama native vs `/v1` OpenAI path; HF Router provider suffix; credentials and `usage` fields.
- Framework quirks: smolagents API changes, Gradio state deep-copy rules, file-explorer/global state fixes.
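The Ollama endpoint gotcha, for instance, reduces to pointing OpenAI-compatible clients at the `/v1` shim rather than the native API. A small normalizer (a hypothetical helper, not from the project) avoids the mix-up:

```python
def openai_base_for_ollama(url: str) -> str:
    """Ollama's native API lives under /api, but its OpenAI-compatible
    shim is served at /v1; ensure clients always get the /v1 base URL."""
    url = url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"
```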
- Windows specifics: code pages & Cyrillic encoding, Tesseract & Poppler paths.
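The code-page problem comes down to trying a short ladder of encodings before giving up. A hedged sketch (the exact encoding order in the project may differ):

```python
def read_text_resilient(data: bytes) -> str:
    """Try common encodings in order (UTF-8, then Cyrillic code pages);
    fall back to UTF-8 with replacement characters rather than crashing."""
    for enc in ("utf-8", "cp1251", "cp866"):
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")
```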
- Hardware limits: local 120B is impractical on our machine; used APIs for that scale.
Accomplishments that we’re proud of
- A working local agent with a clean UI, cross-provider support, and a safe sandbox.
- OCR integration, resilient readers, and practical presets that deliver value fast.
- Clear step-limit and logs so advanced users can inspect behavior.
What we learned
- Provider differences matter: the same model name can require different IDs/adapters.
- Designing for graceful fallback (to paste-in mode) keeps the app useful across providers.
- Small UX choices (safe defaults, previews, explicit outputs like `reports/*.md`) increase trust.
- Centralized config prevents cross-provider “default” mix-ups.
What’s next for Local Agent Jarvis
- UI polish: action wizards, richer previews, per-action options (glob/regex/lang).
- Docs & blog: write up the architecture and provider nuances; public demo.
- Plugins: table extraction from PDFs, diff/review helpers, one-click exports.
- RAG option: local embeddings + vector search for large folders (privacy-first).
- Packaging: installer, thumbnails/icons, and an offline bundle.
- Model ops: optional tool-friendly models (Functionary/Qwen-Instruct) for users who want full agentic workflows locally.
Built With
- gradio
- litellm
- python
- smolagents