OSS at Night

Inspiration

The moment gpt-oss-20b was released, I had it running locally. The power was undeniable, but so was the reality: even on powerful hardware, large local models are slow. A single complex query could lock up my terminal for minutes, making interactive work frustrating.

Instead of seeing this as a limitation, I saw an opportunity. What if the slowness didn't matter? What if you could decouple the creative act of assigning a task from the computational act of executing it?

This led to the core idea behind OSS at Night: a personal, private, overnight AI assistant that turns your machine's downtime into its most productive time. It was born directly from the experience of using gpt-oss and designing a workflow that embraces its strengths (powerful reasoning, 100% privacy) while completely neutralizing its main weakness (speed).

What it does

OSS at Night is an asynchronous AI workload manager that lets you queue up complex tasks from any device during the day and have your local gpt-oss model process them overnight. You wake up to a folder full of completed work—research reports, generated code, summarized articles, and more.

  • Queue & Forget: Add tasks from any phone, tablet, or laptop—via a powerful CLI for scripting and automation, or a simple, mobile-friendly web GUI.

  • Process Overnight: When you're done for the day, a single command starts the processor. It works through the queue task-by-task, with no timeouts.

  • Agentic Workflows: Using simple YAML files, it can execute complex, multi-step tasks where the output of one step (like a web search) becomes the input for the next (like a summary), all powered by the reasoning of gpt-oss.

  • 100% Local & Private: Your data and tasks never leave your local network, ensuring complete privacy.
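As a rough illustration of the multi-step YAML workflows described above, a task file might look something like this (the field names and step types here are a sketch, not the project's actual schema):

```yaml
# Hypothetical task definition -- field names are illustrative only
name: ai-news-report
steps:
  - id: search
    type: web_search
    query: "latest developments in open-weight LLMs"
  - id: summarize
    type: llm
    model: gpt-oss-20b
    prompt: "Write a report from these search results: {{search.output}}"
output: results/ai-news-report.md
```

The key idea is that each step can reference the output of an earlier step, so a search result flows into a summary without any human in the loop.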

How I built it

The project was built in three logical layers, with gpt-oss being a core component of both the architecture and the development process itself.

The Foundation (CLI): I started with a robust Python CLI (obp-CLI.py) to handle the core logic: task parsing, queuing, and processing. The heart of the system is the customizable YAML-based task engine, which allows for creating sophisticated, multi-step agentic workflows. State is managed reliably in an SQLite database, making the system fault-tolerant.

The Accessibility Layer (GUI): To make the tool accessible to everyone, I built a clean web interface using Flask (obp-GUI.py). It runs a local server, making the queue accessible from any device on the network. A key feature here is the AI-powered task interpreter: I use gpt-oss itself to parse natural language requests from the user (e.g., "research the latest in AI and write a report") and convert them into structured JSON tasks with the correct type and metadata.

The Deployment Layer (Docker): To ensure easy and reliable deployment for others, the entire application was containerized using Docker and Docker Compose. This encapsulates all dependencies and makes setup a one-command process.

Challenges we ran into

Structured Output from LLMs: Getting gpt-oss to reliably return clean JSON for the task interpreter was tricky. I solved this by developing a highly-specific system prompt that constrains the model's output, along with fallback logic in the Python backend to handle cases where it still makes a mistake.

Long-Running, Unstable Processes: An overnight job can't just crash. To ensure reliability, I designed the system to persist its state: every task's progress is committed to an SQLite database after each step, so if the process fails, it can be restarted and will pick up exactly where it left off.

Network Accessibility: Making the GUI truly "just work" on any device was a challenge due to firewalls and network configurations. I solved this by building in a network diagnostic script (network_test.py) and providing clear startup instructions and troubleshooting steps for users.

Accomplishments that we're proud of

Turning a Bug into a Feature: My proudest accomplishment is the core concept itself—transforming the "slowness" of local LLMs from a frustrating bug into a powerful feature for asynchronous, deep work.

The AI Interpreter: Using gpt-oss to power its own task delegation is a perfect demonstration of the model's reasoning capabilities. It's not just executing tasks; it's helping to define them. This is a huge win for the "Application of gpt-oss" criterion.

The Dual Interface: Building a system designed to serve power users in the terminal and casual users on their phones equally well. It shows a commitment to flexible and thoughtful design.

True Local Agent: This isn't just a script; it's a self-contained, agentic system that can perform research, write content, and generate code entirely on a local machine, delivering finished products without supervision.

What we learned

Local Models Demand New UX Paradigms: You can't just wrap a local model in a standard chatbot UI and expect a good experience. Their unique properties (slowness, privacy, free inference) require entirely new workflows. Asynchronous processing is a perfect fit.

The Power of Decoupling: Separating task definition from task execution is incredibly liberating. It allows for more thoughtful and ambitious requests, as you're not waiting for an immediate response.

The "Last Mile" is UX: A powerful backend is only useful if people can access it. The effort to build the GUI, make it mobile-friendly, and containerize it with Docker was crucial to making the project truly useful.

What's next for OSS at Night

This project has incredible potential as a platform for a fully autonomous, local AI agent.

Task Dependencies & Chaining: I plan to allow tasks in the queue to depend on one another, creating complex Directed Acyclic Graphs (DAGs) of work (e.g., "do Task A, then, if it succeeds, run Tasks B and C in parallel").

Scheduled & Triggered Jobs: Add support for running the queue on a schedule (e.g., every night at 2 AM) or based on triggers (e.g., "whenever a new file is added to this folder").

Enhanced Results Visualization: Build more powerful tools into the /gallery view to analyze, compare, and synthesize the results of multiple tasks over time.

Built With

python, flask, sqlite, docker, gpt-oss
