Inspiration

In hospitals handling patient records, military operations with classified data, and financial institutions managing sensitive transactions, AI assistance is desperately needed but cloud-based solutions pose unacceptable security risks. I wanted to build a terminal agent that runs is able to run on edge-servers or on-device using small language models (SLMs) that we have full control over, eliminating data exfiltration while providing intelligent automation for critical, privacy-sensitive environments where every keystroke matters. Not to mention consumer markets that prioritize time-to-first-token and latency, where SLMs have an inherent advantage over cloud-based LLMs.

What it does

TerminalAnts is a privacy-first terminal agent built on the Terminus framework from TerminalBench that handles file operations, code assistance, and system automation without any secondary model providers. It provides intelligent command suggestions, automates complex workflows, and offers contextual help for sensitive environments—hospitals managing HIPAA-compliant data, military networks with classified information, and any organization where data sovereignty is non-negotiable.

How I built it

I used the Terminus agent architecture from TerminalBench as the foundation, implementing optimized SLMs with efficient inference engines that maximize performance on local hardware. I deployed a SGLang server for each SLM I planned to integrate into the system on Modal for scalable development and deployment.

Challenges I ran into

Balancing model capability with strict resource constraints, where I needed to find SLMs that could match LLM performance on common terminal tasks while fitting on a single A100 GPU. Multi-round reasoning, integrating search results, model response structuring, extracting keystrokes and bash commands.

Accomplishments that I'm proud of

I was able to build a system where SLMs can provide end-to-end terminal automation without compromising security, achieving competitive performance against the SOTA while maintaining complete data isolation.

What's next for TerminalAnts

Complete evaluation on TerminalBench to benchmark against SOTA solutions like Warp using fully local models, proving that privacy-preserving agents can achieve competitive performance. I would like to explore improvements in search capabilities, enhanced problem decomposition, task-specific agent architectures, and targeted model fine-tuning to establish new benchmarks. Ultimately, I want to beach TerminalBench with SLMs fully running on edge/locally goal.

Built With

Share this project:

Updates