Inspiration

Most AI tools trap you in a "vibe-based" chat box. We wanted to build a deterministic compute engine where the user is the Prime Director, not just a prompter. We moved away from the traditional chatbot UI toward a Research Cockpit—a structured dashboard designed to oversee autonomous agents testing real-world hypotheses.

What it does

Research Cockpit is a stateless, database-driven pipeline that automates research execution:

Orchestrator (The Dispatcher): A master agent that evaluates the Project DB, dispatches parallel tasks, and reflects on findings via a Reflexion loop.
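A minimal sketch of one dispatcher cycle, assuming hypothetical `tasks` and `trajectory` tables (the real schema isn't shown here): read pending work from the Project DB, mark it dispatched, and record a reflection note.

```python
import sqlite3

def run_cycle(conn: sqlite3.Connection) -> list[int]:
    """One stateless Orchestrator cycle: evaluate the Project DB,
    dispatch pending tasks, and log a reflection for the next cycle.
    Table and column names are illustrative placeholders."""
    pending = conn.execute(
        "SELECT id FROM tasks WHERE status = 'pending'"
    ).fetchall()
    dispatched = []
    for (task_id,) in pending:
        conn.execute(
            "UPDATE tasks SET status = 'dispatched' WHERE id = ?", (task_id,)
        )
        dispatched.append(task_id)
    # Reflexion-style note: the next cycle reads this back from the DB.
    conn.execute(
        "INSERT INTO trajectory (note) VALUES (?)",
        (f"dispatched {len(dispatched)} task(s)",),
    )
    conn.commit()
    return dispatched
```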

Workers (The Muscle): Sandboxed compute nodes that run experiments in isolated tmux sessions, submitting statistical reports back to the system.
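Launching a worker in a detached tmux session can be sketched like this; the session name and command are placeholders, and the spawn helper assumes tmux is installed on the host.

```python
import subprocess

def tmux_spawn_cmd(session: str, command: str) -> list[str]:
    """Build the tmux invocation for a detached worker session.
    `-d` detaches immediately so the dispatcher never blocks; the
    session name lets us target this worker later (e.g. to kill it)."""
    return ["tmux", "new-session", "-d", "-s", session, command]

def spawn_worker(session: str, command: str) -> None:
    # Illustrative only -- requires tmux on the host.
    subprocess.run(tmux_spawn_cmd(session, command), check=True)
```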

Ingest Pipeline: A non-agentic system using MinerU and Bedrock to ingest PDFs and Markdown files, turning them into summaries the agents can actually use.


How we built it

FastAPI & SQLite: The backbone consists of two separate databases—Project DB for research state and Direction DB for the Orchestrator's internal reasoning.
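The two-store split can be sketched as two independent SQLite connections; the paths and schemas below are placeholders, not the real layout.

```python
import sqlite3

def open_databases(project_path: str = ":memory:",
                   direction_path: str = ":memory:"):
    """Open the two stores separately: the Project DB holds research
    state; the Direction DB holds the Orchestrator's internal
    reasoning. Schemas here are illustrative."""
    project = sqlite3.connect(project_path)
    direction = sqlite3.connect(direction_path)
    project.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, status TEXT)"
    )
    direction.execute(
        "CREATE TABLE IF NOT EXISTS trajectory (id INTEGER PRIMARY KEY, note TEXT)"
    )
    return project, direction
```

Keeping the stores separate means the Orchestrator's reasoning can be wiped or overwritten without touching research state, and vice versa.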

AWS Bedrock (Claude Sonnet): Powers the high-level reasoning and statistical interpretation.

Process Isolation: Workers are decoupled using tmux sessions, allowing the system to scale and manage compute load without blocking the main loop.

MinerU: Handles complex document extraction, including tables and formulas, through a multi-mode (flash, precision, or local) pipeline.

Challenges we ran into

Zombie Workers: When a user pivots a project mid-execution, we had to ensure background workers didn't keep wasting compute on irrelevant tasks. We solved this with a "Hard Terminate" tool that kills tmux sessions instantly.
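The kill path reduces to targeting the worker's named tmux session; a sketch, assuming tmux on the host:

```python
import subprocess

def tmux_kill_cmd(session: str) -> list[str]:
    """Build the tmux invocation that terminates a worker's session
    outright, so a pivoted project stops burning compute immediately."""
    return ["tmux", "kill-session", "-t", session]

def hard_terminate(session: str) -> bool:
    # Illustrative only -- returns True if tmux killed the session.
    result = subprocess.run(tmux_kill_cmd(session), capture_output=True)
    return result.returncode == 0
```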

Statelessness: Because the Orchestrator’s context is wiped every 30 seconds, we had to build a system that perfectly reconstructs its "memory" from the Trajectory and Experience tables in the Direction DB on every cycle.
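Reconstruction amounts to replaying those tables into a fresh context at the top of every cycle; a sketch with illustrative column names:

```python
import sqlite3

def reconstruct_context(direction: sqlite3.Connection) -> str:
    """Rebuild the Orchestrator's working memory from the Direction DB.
    Every cycle starts from these tables, never from in-process state;
    table/column names here are placeholders."""
    steps = [row[0] for row in direction.execute(
        "SELECT note FROM trajectory ORDER BY id")]
    lessons = [row[0] for row in direction.execute(
        "SELECT lesson FROM experience ORDER BY id")]
    return "\n".join(
        ["## Trajectory"] + steps + ["## Experience"] + lessons)
```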

Race Conditions: Managing a live state tree shared by both users and autonomous agents required strict pessimistic locking and clear role division to prevent "thrashing".
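With SQLite as the shared store, pessimistic locking can be sketched with `BEGIN IMMEDIATE`, which takes the write lock up front so a concurrent writer waits instead of interleaving:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def write_lock(conn: sqlite3.Connection):
    """Pessimistic write transaction. BEGIN IMMEDIATE acquires SQLite's
    reserved lock before any change, so user edits and agent edits to
    the state tree serialize instead of thrashing. The connection
    should use isolation_level=None (autocommit) so the explicit
    BEGIN is not nested inside an implicit transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```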

Accomplishments that we're proud of

The Experience System: Our agents don't just work; they learn. They append lessons and heuristics (e.g., "Polars is 3x faster than pandas for this dataset") to an append-only list that guides future reasoning.
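The append-only discipline is simple to enforce at the storage layer: lessons are only ever inserted, never updated or deleted. A sketch with a hypothetical `experience` table:

```python
import sqlite3

def add_lesson(direction: sqlite3.Connection, lesson: str) -> None:
    """Append one heuristic to the append-only experience list.
    Rows are never updated or removed, so every past lesson remains
    available to guide future reasoning. Schema is illustrative."""
    direction.execute(
        "CREATE TABLE IF NOT EXISTS experience "
        "(id INTEGER PRIMARY KEY, lesson TEXT)"
    )
    direction.execute(
        "INSERT INTO experience (lesson) VALUES (?)", (lesson,)
    )
    direction.commit()
```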

Parallel Dispatch: The Orchestrator can fire off multiple non-blocking worker requests simultaneously, significantly speeding up the research lifecycle.
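Non-blocking fan-out can be sketched with a thread pool; `run_task` stands in for whatever call actually starts a worker.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_parallel(tasks, run_task):
    """Fire off all worker requests at once and collect results as
    they complete; the main loop never waits on any single worker.
    `run_task` is a placeholder for the real worker launcher."""
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as pool:
        futures = [pool.submit(run_task, task) for task in tasks]
        return [future.result() for future in futures]
```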

Clean Separation: Agents never perform literature review; the ingest pipeline handles the "reading" so the agents can focus purely on statistical execution.

What we learned

Chat is a Bottleneck: For complex research, manipulating a database directly is far more efficient than trying to steer an agent through a conversational thread.

Persistence is Key: By making the agent stateless and the database the "source of truth," the user can manually overwrite the agent's internal reasoning at any time to fix a bad logic loop.

What's next for Research Cockpit

Endless Autonomy: Fully enabling the is_auto toggle so the system can generate and test its own hypotheses indefinitely.

Advanced Statistical Deep-Dives: Expanding the Worker Report Contract to include more complex confidence intervals and automated visualization generation.

Local GPU Scaling: Optimizing the local MinerU extraction mode to handle massive document libraries without relying on external APIs.
