MATIH: Level-3 Hybrid Terminal Coding Agent

An autonomous coding agent that reads entire codebases, generates structured plans, and executes code modifications under full user control and sandbox safety checks. The LLM and embedding calls currently ship as stubs (see API Stubs).

Features

🏗️ Core Architecture

Strict Separation of Concerns: Planner (reasoning) and Executor (implementation) are completely separate
Local RAG: Semantic search over a local FAISS index; chunk embeddings come from the Jina embeddings API
Safe Sandbox: Filesystem boundary enforcement, atomic writes, automatic backups
User-Controlled: Full confirmation workflows before any modification
Deterministic & Auditable: All operations logged, all plans validated against JSON schemas

🧠 Components

Planner LLM (o3-mini-high)

Analyzes user requests + codebase context from RAG
Generates structured JSON plans (never executes code)
Validates against planner_schema.json
Outputs: multi-step modification plans with dependencies
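
The dependency lists in a plan imply an execution order. A minimal sketch of resolving that order with the standard library's `graphlib`, assuming each entry in `dependencies` is the `step_id` of another step in the same plan (the source does not state this). The helper name `execution_order` is hypothetical and not part of MATIH:

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list[dict]) -> list[int]:
    """Return step_ids in an order that respects each step's dependencies.

    Hypothetical helper: raises ValueError on unknown or cyclic dependencies.
    """
    known = {step["step_id"] for step in steps}
    graph: dict[int, set[int]] = {}
    for step in steps:
        deps = set(step.get("dependencies", []))
        missing = deps - known
        if missing:
            raise ValueError(
                f"step {step['step_id']} depends on unknown steps {sorted(missing)}"
            )
        # TopologicalSorter expects a mapping of node -> predecessors.
        graph[step["step_id"]] = deps
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc
```

Rejecting unknown and cyclic dependencies before execution would keep a malformed plan from being partially applied.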

Executor LLM (grok-code-fast)

Implements specific plan steps with actual code changes
Produces only diffs/patches in unified format
Validates against executor_schema.json
Outputs: file patches, file creations, command results

RAG Layer

Chunks Python, JavaScript, JSON, YAML, Markdown files (200-400 tokens)
Embeds with Jina embeddings API
Indexes with FAISS for nearest-neighbor retrieval (exact linear scan with a flat index, sublinear approximate search with IVF)
Supports incremental index updates
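
To make concrete what the retrieval step computes, here is a brute-force sketch of cosine top-k search in plain Python. This is not FAISS and not the project's retriever; it is a linear scan that a FAISS index would replace, and the names `cosine` and `top_k` are hypothetical:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the k best (linear scan)."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The default `k=5` mirrors the `top_k: 5` setting in `configs/system.yaml`.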

Sandbox

Enforces workspace boundaries
Atomic file writes (temp + rename)
Automatic backups before modifications
Command whitelisting
Policy-driven security

Terminal UI

Plan review before execution
Diff previews with stats
Inline edit prompts
Execution progress tracking
Color-coded output
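
A diff preview with stats could be produced with the standard library's `difflib`. The sketch below is an illustration under that assumption, not the code in `diff_renderer.py`; `diff_with_stats` is a hypothetical name:

```python
import difflib


def diff_with_stats(old: str, new: str, path: str) -> tuple[str, int, int]:
    """Return (unified diff text, lines added, lines removed) for one file."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    # The first two lines are the '---' / '+++' file headers, not changes.
    body = lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return "".join(lines), added, removed
```

When the two texts are identical, `unified_diff` yields nothing, so the function returns an empty diff with zero counts.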

Installation

Requirements

Python 3.10+
FAISS (CPU or GPU): pip install faiss-cpu
OpenAI API key for planner
Grok API key for executor
Jina API key for embeddings

Setup

# Clone repository
git clone https://github.com/your-repo/matih.git
cd matih

# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY="sk-..."
export GROK_API_KEY="xai-..."
export JINA_API_KEY="jina_..."

# Build RAG index from workspace
python scripts/build_index.py /path/to/your/workspace

# Run agent
python matih.py --workspace /path/to/your/workspace

Configuration

System Configuration (`configs/system.yaml`)

agent:
  name: "MATIH"
  level: 3

models:
  planner:
    name: "o3-mini-high"
    temperature: 0.2
  executor:
    name: "grok-code-fast"
    temperature: 0.3

rag:
  enabled: true
  chunk_size_tokens: 300
  top_k: 5

sandbox:
  enabled: true
  enforce_workspace_boundary: true
  atomic_writes: true
  backup_before_modify: true

RAG Configuration (`configs/rag_config.yaml`)

embedding:
  model_name: "jina-embeddings-v3"

chunking:
  target_chunk_size_tokens: 300
  overlap_tokens: 50
  split_by_function: true

indexing:
  backend: "faiss"
  metric: "cosine"
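
The chunking settings above describe a sliding window. A minimal sketch of that windowing, assuming whitespace-separated words as a stand-in for real model tokens (the project's `token_counter.py` may count differently); `chunk_tokens` is a hypothetical name:

```python
def chunk_tokens(
    tokens: list[str], size: int = 300, overlap: int = 50
) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens
    with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

For example, `chunk_tokens(text.split())` on a 700-word file yields three chunks of 300, 300, and 200 words, where each chunk repeats the last 50 words of the one before it.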

Sandbox Policy (`configs/sandbox_whitelist.yaml`)

commands:
  allowed:
    - cmd: "python"
      pattern: "-c|script\\.py"
    - cmd: "git"
      pattern: "status|log|diff"
  forbidden:
    - "rm -rf"
    - "sudo"
    - "chmod 777"
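
One way such a policy could be evaluated is sketched below. How MATIH actually applies `pattern` is not stated in this README; the sketch assumes it is a regular expression searched against the command's arguments, and that `forbidden` entries are matched as substrings. `POLICY` and `is_allowed` are hypothetical names:

```python
import re
import shlex

# Mirrors the YAML policy above as a plain dict (assumed shape).
POLICY = {
    "allowed": [
        {"cmd": "python", "pattern": r"-c|script\.py"},
        {"cmd": "git", "pattern": r"status|log|diff"},
    ],
    "forbidden": ["rm -rf", "sudo", "chmod 777"],
}


def is_allowed(command: str, policy: dict = POLICY) -> bool:
    """Return True only if the command avoids the forbidden list and
    matches at least one allowed rule."""
    if any(bad in command for bad in policy["forbidden"]):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    if not argv:
        return False
    args = " ".join(argv[1:])
    return any(
        rule["cmd"] == argv[0] and re.search(rule["pattern"], args)
        for rule in policy["allowed"]
    )
```

Deny-by-default is the important property here: a command that matches no allowed rule is rejected even if it is not on the forbidden list.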

Usage

Interactive Mode

python matih.py --workspace /path/to/workspace

Then enter requests:

> Add error handling to the authentication module
> Refactor database connection pooling
> Fix the NoneType error in user.py line 45

Single Request Mode

python matih.py --request "Add caching decorator to utils module"

Build/Upgrade RAG Index

# Build from scratch
python scripts/build_index.py /path/to/workspace

# Incrementally upgrade
python scripts/upgrade_index.py /path/to/workspace

Workflow

┌─────────────────────────────────────┐
│   User Request                      │
│   "Add error handling to auth"      │
└─────────────────┬───────────────────┘
                  │
                  ▼
        ┌─────────────────┐
        │  RAG Retrieval  │  ◄─── Fetches relevant code context
        └────────┬────────┘       using semantic search
                 │
                 ▼
        ┌──────────────────────┐
        │  Planner LLM         │  ◄─── Generates structured plan
        │  (o3-mini-high)      │       (JSON, no execution)
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Plan Validation     │  ◄─── Validates against
        │  JSON Schema Check   │       planner_schema.json
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Show Plan to User   │  ◄─── Confidence, effort,
        │  Request Confirmation│       risks, detailed steps
        └────────┬─────────────┘
                 │
            [User Confirms]
                 │
                 ▼
        ┌──────────────────────┐
        │  For Each Step:      │
        │                      │
        │  ┌────────────────┐  │
        │  │ Executor LLM   │  │  ◄─── Generate patches/diffs
        │  │(grok-code-fast)│  │
        │  └────────────────┘  │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Validate Patch │  │
        │  │ JSON Schema    │  │
        │  └────────────────┘  │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Show Diff      │  │
        │  │ Request OK     │  │  ◄─── Preview with +/- lines
        │  └────────────────┘  │
        │         │            │
        │    [User OKs]        │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Sandbox Apply  │  │  ◄─── Safe application:
        │  │ Backup→Apply   │  │       • Boundary check
        │  │ Atomic Rename  │  │       • Backup before
        │  └────────────────┘  │       • Atomic rename
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Execution Summary   │
        │  Success/Failure     │
        │  Backup Locations    │
        └──────────────────────┘

JSON Schemas

Planner Output (`configs/planner_schema.json`)

{
  "plan_id": "uuid",
  "analysis": "reasoning about the approach",
  "steps": [
    {
      "step_id": 1,
      "action": "create_file|modify_file|delete_file|run_command",
      "target_path": "path/to/file.py",
      "description": "What this step does",
      "dependencies": [0]
    }
  ],
  "confidence": 0.85,
  "estimated_effort": "small|medium|large",
  "risks": ["potential issue"]
}
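
The project validates plans against `planner_schema.json` with a JSON Schema validator. As a stdlib-only illustration of the kind of checks involved, the sketch below inspects the same fields by hand. It is a swapped-in technique, not the project's validator; `plan_errors` is a hypothetical name, and the `[0, 1]` range for `confidence` is an assumption:

```python
ACTIONS = {"create_file", "modify_file", "delete_file", "run_command"}
EFFORTS = {"small", "medium", "large"}


def plan_errors(plan: dict) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks valid."""
    errors = []
    for key, kind in [("plan_id", str), ("analysis", str), ("steps", list), ("risks", list)]:
        if not isinstance(plan.get(key), kind):
            errors.append(f"{key}: expected {kind.__name__}")

    confidence = plan.get("confidence")
    # Assumption: confidence is a probability in [0, 1]; bool is excluded
    # because it is a subclass of int.
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        errors.append("confidence: expected number in [0, 1]")

    effort = plan.get("estimated_effort")
    if not isinstance(effort, str) or effort not in EFFORTS:
        errors.append("estimated_effort: expected small, medium, or large")

    steps = plan.get("steps")
    for i, step in enumerate(steps if isinstance(steps, list) else []):
        if not isinstance(step, dict):
            errors.append(f"steps[{i}]: expected object")
            continue
        action = step.get("action")
        if not isinstance(action, str) or action not in ACTIONS:
            errors.append(f"steps[{i}].action: unknown action {action!r}")
        if not isinstance(step.get("target_path"), str):
            errors.append(f"steps[{i}].target_path: expected str")
    return errors
```

Collecting every problem instead of stopping at the first one gives the planner a complete error report if the plan is sent back for regeneration.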

Executor Output (`configs/executor_schema.json`)

{
  "execution_id": "uuid",
  "step_id": 1,
  "status": "success|failed",
  "result": {
    "action": "file_patch",
    "target_path": "path/to/file.py",
    "diff": "unified diff content",
    "original_hash": "sha256",
    "new_hash": "sha256"
  }
}
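
The `original_hash` and `new_hash` fields make it possible to detect a file that changed between patch generation and application. A minimal sketch, assuming both fields are hex SHA-256 digests of the raw file bytes (the README does not specify the encoding); the function names are hypothetical:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def check_patch_hashes(current: bytes, patched: bytes, result: dict) -> None:
    """Raise ValueError if file contents disagree with an executor result.

    `current` is the file as it is on disk now; `patched` is the content
    produced by applying the diff.
    """
    if sha256_hex(current) != result["original_hash"]:
        raise ValueError(
            "file changed since the patch was generated (original_hash mismatch)"
        )
    if sha256_hex(patched) != result["new_hash"]:
        raise ValueError("patched content does not match new_hash")
```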

Supported File Types

Python (.py): Split at function/class boundaries
JavaScript (.js, .ts): Token-based chunking
JSON (.json): Object-based chunking
YAML (.yaml, .yml): Key-based chunking
Markdown (.md): Heading-based chunking
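
Splitting Python at function and class boundaries can be done with the standard `ast` module. The sketch below is one possible approach, not necessarily what `chunker.py` does; it handles only top-level definitions, and `split_top_level` is a hypothetical name:

```python
import ast


def split_top_level(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) for each top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            text = "\n".join(lines[start - 1:node.end_lineno])
            chunks.append((node.name, text))
    return chunks
```

A chunk produced this way can exceed the 200-400 token target for large classes, so a real chunker would likely fall back to token windows inside oversized definitions.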

Safety Features

Boundary Enforcement

All file operations restricted to workspace directory
Protected paths: /etc, /sys, /proc, Windows system directories
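
A workspace boundary check can be sketched with `pathlib`. This illustrates the idea only and is not the code in `sandbox_fs.py`; `resolve_in_workspace` is a hypothetical name, and the protected-path list is left out:

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, candidate: str) -> Path:
    """Resolve `candidate` against the workspace and reject anything outside it."""
    root = Path(workspace).resolve()
    # resolve() collapses '..' segments and follows symlinks, so the check
    # below sees the real destination rather than the spelled path.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{candidate!r} is outside the workspace")
    return target
```

Checking the resolved path matters: a plain string-prefix test on the unresolved path would accept `src/../../etc/passwd`.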

Atomic Writes

Write to temp file
Verify content
Atomic rename (no partial writes)
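
The three steps above can be sketched with `tempfile.mkstemp` and `os.replace`. This is an illustration under the assumption that files are written as UTF-8 text; `atomic_write` is a hypothetical name:

```python
import os
import tempfile


def atomic_write(path: str, content: str) -> None:
    """Write content so readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory because a rename is only
    # atomic within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        with open(tmp_path, encoding="utf-8") as check:
            if check.read() != content:  # the "verify content" step
                raise OSError("temp file content does not match")
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```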

Automatic Backups

Backup before any modification
Kept for 7 days (configurable)
Location: .matih_backups/
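
A backup-and-prune routine along these lines might look as follows. The file naming scheme is an assumption (the README only gives the directory and the retention period), and `backup_file` and `prune_backups` are hypothetical names:

```python
import shutil
import time
from pathlib import Path


def backup_file(workspace: Path, target: Path) -> Path:
    """Copy target into <workspace>/.matih_backups/ with a timestamp suffix."""
    backup_dir = workspace / ".matih_backups"
    backup_dir.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = backup_dir / f"{target.name}.{stamp}.bak"
    # shutil.copy (not copy2) so the backup's mtime is the backup time,
    # which is what the pruning below relies on.
    shutil.copy(target, dest)
    return dest


def prune_backups(workspace: Path, max_age_days: float = 7) -> int:
    """Delete backups older than max_age_days; return how many were removed."""
    backup_dir = workspace / ".matih_backups"
    if not backup_dir.is_dir():
        return 0
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for item in backup_dir.glob("*.bak"):
        if item.stat().st_mtime < cutoff:
            item.unlink()
            removed += 1
    return removed
```

This sketch keys backups by file name only, so two files with the same name in different directories, or two backups within the same second, would collide; a fuller version would encode the relative path.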

Command Whitelisting

Allowed:
  • python (with patterns)
  • node
  • git (read-only)
  • cat, ls, find, grep

Forbidden:
  • rm -rf
  • sudo, su
  • chmod 777
  • mkfs, dd, format

User Confirmation

Plan review before execution
Diff preview before application
Optional inline editing
Confirmation after each step if errors occur

API Stubs

The following LLM calls are stubs that developers must implement:

Planner (stub in `core/planner.py`)

def _call_planner_llm(self, user_request: str, context: str):
    # TODO: Replace with actual OpenAI API call
    # Expected: https://api.openai.com/v1/chat/completions
    # Model: "o3-mini-high" (in the OpenAI API: model "o3-mini" with reasoning_effort="high")
    # Return: JSON plan matching planner_schema.json
    raise NotImplementedError("planner LLM call is a stub")

Executor (stub in `core/executor.py`)

def _call_executor_llm(self, step: PlanStep, context: str):
    # TODO: Replace with actual Grok API call
    # Expected: https://api.x.ai/v1/chat/completions
    # Model: "grok-code-fast" (as set in configs/system.yaml)
    # Return: JSON result matching executor_schema.json
    raise NotImplementedError("executor LLM call is a stub")

Embeddings (stub in `rag/embedder.py`)

def embed_text(self, text: str):
    # TODO: Replace with actual Jina API call
    # Expected: https://api.jina.ai/v1/embeddings
    # Model: "jina-embeddings-v3"
    # Return: numpy array of shape (1024,) (default dimension)
    raise NotImplementedError("embedding call is a stub")

Testing

Run All Tests

python -m pytest tests/ -v

Run Unit Tests

python -m pytest tests/unit/ -v

Run Integration Tests

python -m pytest tests/integration/ -v

Run Specific Test

python -m pytest tests/unit/test_chunker.py -v
python -m pytest tests/unit/test_sandbox_policy.py -v

Project Structure

matih/
  configs/
    system.yaml                 # Main configuration
    planner_schema.json         # Planner output validation
    executor_schema.json        # Executor output validation
    rag_config.yaml            # RAG settings
    sandbox_whitelist.yaml     # Security policy

  core/
    agent.py                   # Main agent orchestrator
    planner.py                 # Plan generation
    executor.py                # Plan execution
    controller.py              # Execution loop control
    logging_config.py          # Logging setup
    ui/
      tui.py                   # Terminal UI
      diff_renderer.py         # Diff display

  rag/
    embedder.py                # Jina embeddings (stub)
    chunker.py                 # File chunking
    indexer.py                 # FAISS indexing
    retriever.py               # Semantic search
    store/                      # Index storage

  sandbox/
    sandbox_fs.py              # Safe filesystem
    command_runner.py          # Subprocess execution
    policy.py                  # Security policy
    sandbox_tests.py           # Sandbox unit tests

  utils/
    types.py                   # Shared data types
    file_utils.py              # File operations
    diff_utils.py              # Diff utilities
    json_schema.py             # Schema validation
    token_counter.py           # Token estimation

  prompts/
    planner_prompts.md         # Planner system prompts
    executor_prompts.md        # Executor system prompts
    user_messages.md           # UI messages

  scripts/
    build_index.py             # Index builder
    upgrade_index.py           # Incremental indexing
    run_agent.sh               # Launch script

  tests/
    unit/                      # Unit tests
    integration/               # Integration tests

  matih.py                      # Main entry point

Architecture Decisions

Why Separate Planner and Executor?

Planner: Optimized for reasoning and planning (o3-mini-high, lower temperature)
Executor: Optimized for code generation (grok-code-fast, fast inference)
Safety: Prevents "execution hallucination" - planner can't accidentally execute code

Why Local RAG?

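Privacy: The index and retrieval stay on the local machine; chunk text leaves it only for the Jina embedding call
Cost: No hosted vector database; the only retrieval-related API cost is embedding
Speed: Millisecond-level index lookup once the query is embedded, with no round trip to a remote vector store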
Control: Can customize chunking and indexing strategy

Why FAISS?

Scalability: Handles millions of chunks
Speed: Sublinear approximate search with IVF indexes (a flat index scans every vector)
Simplicity: No separate service needed
Flexibility: CPU or GPU backend

Why Atomic Writes?

Consistency: No partial file modifications on disk
Recovery: If write fails, original file unchanged
Auditability: Clear before/after states

Why JSON Schemas?

Validation: Catch LLM errors early
Type Safety: Structured, validated outputs
Documentation: Schema serves as API contract
Tooling: Standard JSON schema validators available

Limitations & Future Work

Current Limitations

Planner and Executor LLM calls are stubs (developers must implement)
Patch application is simplified (use patch command for robustness)
No multi-workspace support
No async execution
Single-threaded only

Future Enhancements

[ ] Implement actual LLM API calls (OpenAI, Grok, Jina)
[ ] Robust patch application using patch command
[ ] Concurrent step execution with dependency resolution
[ ] Web UI dashboard
[ ] Multi-repo support
[ ] Custom prompt templates
[ ] Telemetry and analytics
[ ] Performance profiling
[ ] Advanced diff strategies (smarter context)

Contributing

Contributions welcome! Please:

Follow existing code style
Add tests for new features
Update documentation
Validate against schemas

License

MIT License - See LICENSE file

Support

Issues: GitHub Issues
Documentation: See docs/ directory
Architecture: See docs/architecture.md
Security: See docs/security.md
Testing: See docs/testing_strategy.md

MATIH: Making code modification reliable, auditable, and safe.

Updates

MOHITH CHANDRA started this project — Feb 25, 2026 03:49 AM EST