About Jarvis — Local Agent

Main task: explore the capabilities and trade-offs of the GPT-OSS 20B / 120B models inside a practical, privacy-first local agent.

Inspiration

Build an assistant that can work with your files locally, surface insights, and act as a trustworthy helper—useful for individuals and for teams that must keep internal data private. We also wanted a clean way to switch providers (OpenRouter, HF Router, Ollama) without rewriting code, and to learn where big OSS models shine or struggle.

What it does

  • Folder sandbox: choose an Action Folder; all operations are constrained to it.
  • Six actions: Explore folder, Analyze documents, Translate, Extract data, Generate new document (e.g., README), and Batch actions (shell).
  • Smart tools (when supported): list/search files, read txt/pdf/docx/xlsx/csv, count by type, write files, optional shell, OCR via Tesseract (with Poppler for PDFs).
  • Safe mode & step limit: shell off by default; slider caps agent steps.
  • Multi-provider switch: OpenRouter / HF Router / local Ollama (OpenAI-compatible /v1) with per-backend model mapping and .env keys.
  • When tools aren’t supported: Jarvis still works in paste-in mode—you paste a file list, document text, or OCR output into chat, and Jarvis summarizes, analyzes, translates, or structures it. Use the Save action to write results.
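
The folder-sandbox constraint above can be sketched as a small path check. This is a minimal illustration, not the project's actual code; the helper name is hypothetical:

```python
from pathlib import Path

def resolve_in_sandbox(action_folder: str, user_path: str) -> Path:
    """Resolve user_path and make sure it stays inside the Action Folder."""
    root = Path(action_folder).resolve()
    target = (root / user_path).resolve()
    # Path.relative_to raises ValueError when target escapes the sandbox,
    # including "../" traversal and absolute paths
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{user_path!r} escapes the Action Folder")
    return target
```

Every file tool (read, write, list) would route its paths through a check like this, so the agent can never touch anything outside the chosen folder.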

How we built it

  • UI: Python + Gradio (dark theme), file explorer, “Apply settings,” presets for the six actions, warnings if tool-calling isn’t available.
  • Agent: Hugging Face smolagents CodeAgent with LiteLLMModel, plus an OpenAI-compatible client for HF Router.

  • Providers:
    • OpenRouter (https://openrouter.ai/api/v1)
    • HF Router (https://router.huggingface.co/v1, provider-suffixed model IDs like …:fireworks-ai)
    • Ollama via OpenAI shim (http://localhost:11434/v1)
  • Config: single MODEL_BY_BACKEND map; TOOLS_CAPABLE toggles tool-calls per backend; agent cache keyed by backend/model/safety.
  • Docs & data: resilient encoding (UTF-8, CP1251/866, etc.), delimiter detection for CSV, quick previews for Excel/PDF, OCR for scans.
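
The centralized config could look roughly like the sketch below. The base URLs come from the list above; the model IDs and env-var names are illustrative assumptions, not the project's real values:

```python
# Hypothetical shape of the single MODEL_BY_BACKEND map described above.
MODEL_BY_BACKEND = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "model": "openai/gpt-oss-20b",               # assumed ID format
        "api_key_env": "OPENROUTER_API_KEY",
    },
    "hf_router": {
        "base_url": "https://router.huggingface.co/v1",
        "model": "openai/gpt-oss-20b:fireworks-ai",  # provider-suffixed ID
        "api_key_env": "HF_TOKEN",
    },
    "ollama": {
        "base_url": "http://localhost:11434/v1",     # OpenAI-compatible shim
        "model": "gpt-oss:20b",
        "api_key_env": None,                         # local, no key needed
    },
}

# Which backends reliably emit structured tool_calls (example values)
TOOLS_CAPABLE = {"openrouter": True, "hf_router": True, "ollama": False}

def backend_config(name: str) -> dict:
    """Merge the model mapping with the tool-capability flag."""
    cfg = MODEL_BY_BACKEND[name]
    return {**cfg, "tools": TOOLS_CAPABLE.get(name, False)}
```

Keeping everything in one map is what lets the UI switch providers without touching agent code: the agent cache is keyed by backend/model/safety, so a switch just builds (or reuses) a different entry.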

Challenges we ran into

  • Tool-calling variance: many OSS models (incl. gpt-oss-20b) chat well but don’t emit structured OpenAI tool_calls. Mitigation: clearly document paste-in mode when tools aren’t available.
  • Endpoint gotchas: Ollama native vs /v1 OpenAI path; HF Router provider suffix; credentials and usage fields.
  • Framework quirks: smolagents API changes, Gradio state deep-copy rules, file-explorer/global state fixes.
  • Windows specifics: code pages & Cyrillic encoding, Tesseract & Poppler paths.
  • Hardware limits: running the 120B model locally was impractical on our hardware, so we used hosted APIs at that scale.
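
The Windows encoding issue above is handled by the resilient reader mentioned in "How we built it". A minimal sketch of that cascade (constant and function names are hypothetical):

```python
# Try a cascade of encodings: UTF-8 first, then the Windows Cyrillic
# code pages (CP1251, CP866), with latin-1 as a never-failing last resort.
FALLBACK_ENCODINGS = ("utf-8", "cp1251", "cp866", "latin-1")

def read_text_resilient(path: str) -> str:
    with open(path, "rb") as f:
        raw = f.read()
    for enc in FALLBACK_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 never raises, so this line is only reached if the tuple changes
    return raw.decode("utf-8", errors="replace")
```

The order matters: UTF-8 is strict enough to reject most legacy-encoded bytes, so trying it first rarely produces false positives, while latin-1 at the end guarantees the reader never crashes.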

Accomplishments that we’re proud of

  • A working local agent with a clean UI, cross-provider support, and a safe sandbox.
  • OCR integration, resilient readers, and practical presets that deliver value fast.
  • Clear step-limit and logs so advanced users can inspect behavior.

What we learned

  • Provider differences matter: the same model name can require different IDs/adapters.
  • Designing for graceful fallback (to paste-in mode) keeps the app useful across providers.
  • Small UX choices (safe defaults, previews, explicit outputs like reports/*.md) increase trust.
  • Centralized config prevents cross-provider “default” mixups.
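
The graceful-fallback lesson above boils down to a small dispatch. This is a toy sketch with hypothetical names; the real paths would invoke the tool-using agent or the plain chat flow:

```python
def run_action(backend: str, prompt: str, tools_capable: dict) -> str:
    """Use agent mode when the backend emits structured tool_calls,
    otherwise degrade to paste-in mode instead of failing."""
    if tools_capable.get(backend, False):
        return f"[agent] {prompt}"     # tool-using CodeAgent path
    return f"[paste-in] {prompt}"      # user pastes text; model analyzes it
```

The key design choice is that the fallback is a first-class mode with its own UI hints, not an error state.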

What’s next for Jarvis — Local Agent

  • UI polish: action wizards, richer previews, per-action options (glob/regex/lang).
  • Docs & blog: write up the architecture and provider nuances; public demo.
  • Plugins: table extraction from PDFs, diff/review helpers, one-click exports.
  • RAG option: local embeddings + vector search for large folders (privacy-first).
  • Packaging: installer, thumbnails/icons, and an offline bundle.
  • Model ops: optional tool-friendly models (Functionary/Qwen-Instruct) for users who want full agentic workflows locally.

Built With

  • gradio
  • litellm
  • python
  • smolagents