MATIH: Level-3 Hybrid Terminal Coding Agent
A production-ready autonomous code agent that reads entire codebases, generates structured plans, and executes code modifications with full user control and safety guarantees.
Features
ποΈ Core Architecture
- Strict Separation of Concerns: Planner (reasoning) and Executor (implementation) are completely separate
- Local-Only RAG: Self-contained semantic search using Jina embeddings + FAISS
- Safe Sandbox: Filesystem boundary enforcement, atomic writes, automatic backups
- User-Controlled: Full confirmation workflows before any modification
- Deterministic & Auditable: All operations logged, all plans validated against JSON schemas
π§ Components
Planner LLM (o3-mini-high)
- Analyzes user requests + codebase context from RAG
- Generates structured JSON plans (never executes code)
- Validates against
planner_schema.json - Outputs: multi-step modification plans with dependencies
Executor LLM (grok-code-fast)
- Implements specific plan steps with actual code changes
- Produces only diffs/patches in unified format
- Validates against
executor_schema.json - Outputs: file patches, file creations, command results
RAG Layer
- Chunks Python, JavaScript, JSON, YAML, Markdown files (200-400 tokens)
- Embeds with Jina embeddings API
- Indexes with FAISS for O(log n) retrieval
- Supports incremental index updates
Sandbox
- Enforces workspace boundaries
- Atomic file writes (temp + rename)
- Automatic backups before modifications
- Command whitelisting
- Policy-driven security
Terminal UI
- Plan review before execution
- Diff previews with stats
- Inline edit prompts
- Execution progress tracking
- Color-coded output
Installation
Requirements
- Python 3.10+
- FAISS (CPU or GPU):
pip install faiss-cpu - OpenAI API key for planner
- Grok API key for executor
- Jina API key for embeddings
Setup
# Clone repository
git clone https://github.com/your-repo/matih.git
cd matih
# Install dependencies
pip install -r requirements.txt
# Set API keys
export OPENAI_API_KEY="sk-..."
export GROK_API_KEY="xai-..."
export JINA_API_KEY="jina_..."
# Build RAG index from workspace
python scripts/build_index.py /path/to/your/workspace
# Run agent
python matih.py --workspace /path/to/your/workspace
Configuration
System Configuration (configs/system.yaml)
agent:
name: "MATIH"
level: 3
models:
planner:
name: "o3-mini-high"
temperature: 0.2
executor:
name: "grok-code-fast"
temperature: 0.3
rag:
enabled: true
chunk_size_tokens: 300
top_k: 5
sandbox:
enabled: true
enforce_workspace_boundary: true
atomic_writes: true
backup_before_modify: true
RAG Configuration (configs/rag_config.yaml)
embedding:
model_name: "jina-embeddings-v3"
chunking:
target_chunk_size_tokens: 300
overlap_tokens: 50
split_by_function: true
indexing:
backend: "faiss"
metric: "cosine"
Sandbox Policy (configs/sandbox_whitelist.yaml)
commands:
allowed:
- cmd: "python"
pattern: "-c|script\\.py"
- cmd: "git"
pattern: "status|log|diff"
forbidden:
- "rm -rf"
- "sudo"
- "chmod 777"
Usage
Interactive Mode
python matih.py --workspace /path/to/workspace
Then enter requests:
> Add error handling to the authentication module
> Refactor database connection pooling
> Fix the NoneType error in user.py line 45
Single Request Mode
python matih.py --request "Add caching decorator to utils module"
Build/Upgrade RAG Index
# Build from scratch
python scripts/build_index.py /path/to/workspace
# Incrementally upgrade
python scripts/upgrade_index.py /path/to/workspace
Workflow
βββββββββββββββββββββββββββββββββββββββ
β User Request β
β "Add error handling to auth" β
βββββββββββββββββββ¬ββββββββββββββββββββ
β
βΌ
βββββββββββββββββββ
β RAG Retrieval β ββββ Fetches relevant code context
ββββββββββ¬βββββββββ using semantic search
β
βΌ
ββββββββββββββββββββββββ
β Planner LLM β ββββ Generates structured plan
β (o3-mini-high) β (JSON, no execution)
ββββββββββ¬ββββββββββββββ
β
βΌ
ββββββββββββββββββββββββ
β Plan Validation β ββββ Validates against
β JSON Schema Check β planner_schema.json
ββββββββββ¬ββββββββββββββ
β
βΌ
ββββββββββββββββββββββββ
β Show Plan to User β ββββ Confidence, effort,
β Request Confirmationβ risks, detailed steps
ββββββββββ¬ββββββββββββββ
β
[User Confirms]
β
βΌ
ββββββββββββββββββββββββ
β For Each Step: β
β β
β ββββββββββββββββββ β
β β Executor LLM β β ββββ Generate patches/diffs
β β(grok-code-fast)β β
β ββββββββββββββββββ β
β β β
β βΌ β
β ββββββββββββββββββ β
β β Validate Patch β β
β β JSON Schema β β
β ββββββββββββββββββ β
β β β
β βΌ β
β ββββββββββββββββββ β
β β Show Diff β β
β β Request OK β β ββββ Preview with +/- lines
β ββββββββββββββββββ β
β β β
β [User OK's] β
β β β
β βΌ β
β ββββββββββββββββββ β
β β Sandbox Apply β β ββββ Safe application:
β β BackupβApply β β β’ Boundary check
β β Atomic Rename β β β’ Backup before
β ββββββββββββββββββ β β’ Atomic rename
ββββββββββ¬ββββββββββββββ
β
βΌ
ββββββββββββββββββββββββ
β Execution Summary β
β Success/Failure β
β Backup Locations β
ββββββββββββββββββββββββ
JSON Schemas
Planner Output (configs/planner_schema.json)
{
"plan_id": "uuid",
"analysis": "reasoning about the approach",
"steps": [
{
"step_id": 1,
"action": "create_file|modify_file|delete_file|run_command",
"target_path": "path/to/file.py",
"description": "What this step does",
"dependencies": [0]
}
],
"confidence": 0.85,
"estimated_effort": "small|medium|large",
"risks": ["potential issue"]
}
Executor Output (configs/executor_schema.json)
{
"execution_id": "uuid",
"step_id": 1,
"status": "success|failed",
"result": {
"action": "file_patch",
"target_path": "path/to/file.py",
"diff": "unified diff content",
"original_hash": "sha256",
"new_hash": "sha256"
}
}
Supported File Types
- Python (
.py): Split at function/class boundaries - JavaScript (
.js,.ts): Token-based chunking - JSON (
.json): Object-based chunking - YAML (
.yaml,.yml): Key-based chunking - Markdown (
.md): Heading-based chunking
Safety Features
Boundary Enforcement
- All file operations restricted to workspace directory
- Protected paths:
/etc,/sys,/proc, Windows system directories
Atomic Writes
- Write to temp file
- Verify content
- Atomic rename (no partial writes)
Automatic Backups
- Backup before any modification
- Kept for 7 days (configurable)
- Location:
.matih_backups/
Command Whitelisting
Allowed:
β’ python (with patterns)
β’ node
β’ git (read-only)
β’ cat, ls, find, grep
Forbidden:
β’ rm -rf
β’ sudo, su
β’ chmod 777
β’ mkfs, dd, format
User Confirmation
- Plan review before execution
- Diff preview before application
- Optional inline editing
- Confirmation after each step if errors occur
API Stubs
The following LLM calls are stubs that developers must implement:
Planner (stub in core/planner.py)
def _call_planner_llm(self, user_request: str, context: str):
# TODO: Replace with actual OpenAI API call
# Expected: https://api.openai.com/v1/chat/completions
# Model: "o3-mini-high"
# Return: JSON plan matching planner_schema.json
pass
Executor (stub in core/executor.py)
def _call_executor_llm(self, step: PlanStep, context: str):
# TODO: Replace with actual Grok API call
# Expected: https://api.xai.com/v1/chat/completions
# Model: "grok-2-code-fast"
# Return: JSON result matching executor_schema.json
pass
Embeddings (stub in rag/embedder.py)
def embed_text(self, text: str):
# TODO: Replace with actual Jina API call
# Expected: https://api.jina.ai/v1/embeddings
# Model: "jina-embeddings-v3"
# Return: numpy array [1024] (default dimension)
pass
Testing
Run All Tests
python -m pytest tests/ -v
Run Unit Tests
python -m pytest tests/unit/ -v
Run Integration Tests
python -m pytest tests/integration/ -v
Run Specific Test
python -m pytest tests/unit/test_chunker.py -v
python -m pytest tests/unit/test_sandbox_policy.py -v
Project Structure
matih/
configs/
system.yaml # Main configuration
planner_schema.json # Planner output validation
executor_schema.json # Executor output validation
rag_config.yaml # RAG settings
sandbox_whitelist.yaml # Security policy
core/
agent.py # Main agent orchestrator
planner.py # Plan generation
executor.py # Plan execution
controller.py # Execution loop control
logging_config.py # Logging setup
ui/
tui.py # Terminal UI
diff_renderer.py # Diff display
rag/
embedder.py # Jina embeddings (stub)
chunker.py # File chunking
indexer.py # FAISS indexing
retriever.py # Semantic search
store/ # Index storage
sandbox/
sandbox_fs.py # Safe filesystem
command_runner.py # Subprocess execution
policy.py # Security policy
sandbox_tests.py # Sandbox unit tests
utils/
types.py # Shared data types
file_utils.py # File operations
diff_utils.py # Diff utilities
json_schema.py # Schema validation
token_counter.py # Token estimation
prompts/
planner_prompts.md # Planner system prompts
executor_prompts.md # Executor system prompts
user_messages.md # UI messages
scripts/
build_index.py # Index builder
upgrade_index.py # Incremental indexing
run_agent.sh # Launch script
tests/
unit/ # Unit tests
integration/ # Integration tests
matih.py # Main entry point
Architecture Decisions
Why Separate Planner and Executor?
- Planner: Optimized for reasoning and planning (o3-mini-high, lower temperature)
- Executor: Optimized for code generation (grok-2-code-fast, fast inference)
- Safety: Prevents "execution hallucination" - planner can't accidentally execute code
Why Local RAG?
- Privacy: Entire codebase stays on local machine
- Cost: No API calls for retrieval, minimal embedding costs
- Speed: Millisecond-level retrieval vs. API latency
- Control: Can customize chunking and indexing strategy
Why FAISS?
- Scalability: Handles millions of chunks
- Speed: O(log n) retrieval with IVF
- Simplicity: No separate service needed
- Flexibility: CPU or GPU backend
Why Atomic Writes?
- Consistency: No partial file modifications on disk
- Recovery: If write fails, original file unchanged
- Auditability: Clear before/after states
Why JSON Schemas?
- Validation: Catch LLM errors early
- Type Safety: Structured, validated outputs
- Documentation: Schema serves as API contract
- Tooling: Standard JSON schema validators available
Limitations & Future Work
Current Limitations
- Planner and Executor LLM calls are stubs (developers must implement)
- Patch application is simplified (use
patchcommand for robustness) - No multi-workspace support
- No async execution
- Single-threaded only
Future Enhancements
- [ ] Implement actual LLM API calls (OpenAI, Grok, Jina)
- [ ] Robust patch application using
patchcommand - [ ] Concurrent step execution with dependency resolution
- [ ] Web UI dashboard
- [ ] Multi-repo support
- [ ] Custom prompt templates
- [ ] Telemetry and analytics
- [ ] Performance profiling
- [ ] Advanced diff strategies (smarter context)
Contributing
Contributions welcome! Please:
- Follow existing code style
- Add tests for new features
- Update documentation
- Validate against schemas
License
MIT License - See LICENSE file
Support
- Issues: GitHub Issues
- Documentation: See
docs/directory - Architecture: See
docs/architecture.md - Security: See
docs/security.md - Testing: See
docs/testing_strategy.md
MATIH: Making code modification reliable, auditable, and safe.
Log in or sign up for Devpost to join the conversation.