MATIH: Level-3 Hybrid Terminal Coding Agent

An autonomous coding agent for the terminal that indexes a codebase for retrieval, generates structured plans, and applies code modifications only after user confirmation, with sandboxed writes and automatic backups.

Features

🏗️ Core Architecture

  • Strict Separation of Concerns: Planner (reasoning) and Executor (implementation) are completely separate
  • Local RAG Index: Semantic search over a local FAISS index, with embeddings from the Jina API
  • Safe Sandbox: Filesystem boundary enforcement, atomic writes, automatic backups
  • User-Controlled: Full confirmation workflows before any modification
  • Deterministic & Auditable: All operations logged, all plans validated against JSON schemas

🧠 Components

Planner LLM (o3-mini-high)

  • Analyzes user requests + codebase context from RAG
  • Generates structured JSON plans (never executes code)
  • Validates against planner_schema.json
  • Outputs: multi-step modification plans with dependencies
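
Because plan steps carry dependencies, something has to order them before execution. The sketch below is one possible way to do that with the standard library's `graphlib`; the function name `execution_order` is hypothetical, and it assumes each entry in `dependencies` is the `step_id` of another step in the same plan.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps):
    """Return step_ids in an order that respects each step's dependencies.

    Assumption: "dependencies" lists the step_ids that must finish first.
    """
    known = {step["step_id"] for step in steps}
    graph = {}
    for step in steps:
        deps = set(step.get("dependencies", []))
        missing = deps - known
        if missing:
            raise ValueError(
                f"step {step['step_id']} depends on unknown steps {sorted(missing)}"
            )
        graph[step["step_id"]] = deps
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc


plan_steps = [
    {"step_id": 3, "dependencies": [1, 2]},
    {"step_id": 1, "dependencies": []},
    {"step_id": 2, "dependencies": [1]},
]
print(execution_order(plan_steps))  # [1, 2, 3]
```

Rejecting unknown or cyclic dependencies before any step runs keeps a malformed plan from being partially applied.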

Executor LLM (grok-code-fast)

  • Implements specific plan steps with actual code changes
  • Produces only diffs/patches in unified format
  • Validates against executor_schema.json
  • Outputs: file patches, file creations, command results

RAG Layer

  • Splits Python, JavaScript, JSON, YAML, and Markdown files into chunks of 200-400 tokens
  • Embeds with Jina embeddings API
  • Indexes with FAISS for fast nearest-neighbor retrieval (sub-linear with an IVF index)
  • Supports incremental index updates
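
As an illustration of fixed-size chunking with overlap, here is a minimal sketch. It is not the project's chunker: the function name `chunk_tokens` is hypothetical, the tokens are a plain list standing in for real tokenizer output, and the defaults mirror the `target_chunk_size_tokens` and `overlap_tokens` values shown in the RAG configuration.

```python
def chunk_tokens(tokens, target_size=300, overlap=50):
    """Split a token list into windows of target_size that share `overlap` tokens."""
    if target_size <= 0 or not 0 <= overlap < target_size:
        raise ValueError("need target_size > 0 and 0 <= overlap < target_size")
    stride = target_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + target_size])
        if start + target_size >= len(tokens):
            break  # the last window already reaches the end of the input
        start += stride
    return chunks


# Stand-in tokens; a real chunker would use the tokenizer's output instead.
words = [f"w{i}" for i in range(700)]
chunks = chunk_tokens(words)
print([len(chunk) for chunk in chunks])  # [300, 300, 200]
```

The overlap means text near a chunk boundary appears in both neighbors, which reduces the chance that a relevant passage is split across two retrieval results.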

Sandbox

  • Enforces workspace boundaries
  • Atomic file writes (temp + rename)
  • Automatic backups before modifications
  • Command whitelisting
  • Policy-driven security

Terminal UI

  • Plan review before execution
  • Diff previews with stats
  • Inline edit prompts
  • Execution progress tracking
  • Color-coded output

Installation

Requirements

  • Python 3.10+
  • FAISS (CPU or GPU): pip install faiss-cpu
  • OpenAI API key for planner
  • Grok API key for executor
  • Jina API key for embeddings

Setup

# Clone repository
git clone https://github.com/your-repo/matih.git
cd matih

# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY="sk-..."
export GROK_API_KEY="xai-..."
export JINA_API_KEY="jina_..."

# Build RAG index from workspace
python scripts/build_index.py /path/to/your/workspace

# Run agent
python matih.py --workspace /path/to/your/workspace

Configuration

System Configuration (configs/system.yaml)

agent:
  name: "MATIH"
  level: 3

models:
  planner:
    name: "o3-mini-high"
    temperature: 0.2
  executor:
    name: "grok-code-fast"
    temperature: 0.3

rag:
  enabled: true
  chunk_size_tokens: 300
  top_k: 5

sandbox:
  enabled: true
  enforce_workspace_boundary: true
  atomic_writes: true
  backup_before_modify: true

RAG Configuration (configs/rag_config.yaml)

embedding:
  model_name: "jina-embeddings-v3"

chunking:
  target_chunk_size_tokens: 300
  overlap_tokens: 50
  split_by_function: true

indexing:
  backend: "faiss"
  metric: "cosine"
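
A note on the cosine metric: with FAISS, cosine similarity is commonly obtained by L2-normalizing vectors and searching an inner-product index. The pure-Python sketch below only demonstrates that equivalence with a brute-force scan; it does not use FAISS, and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, vectors, k=5):
    """Return (index, cosine score) pairs for the k most similar vectors.

    The inner product of two unit vectors equals their cosine similarity.
    """
    q = l2_normalize(query)
    scored = []
    for idx, vec in enumerate(vectors):
        v = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k([2.0, 0.1], vectors, k=2)
print([idx for idx, _ in results])  # [0, 2]
```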

Sandbox Policy (configs/sandbox_whitelist.yaml)

commands:
  allowed:
    - cmd: "python"
      pattern: "-c|script\\.py"
    - cmd: "git"
      pattern: "status|log|diff"
  forbidden:
    - "rm -rf"
    - "sudo"
    - "chmod 777"
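
One way such a policy could be evaluated is sketched below. How MATIH actually matches `pattern` is not stated here, so this assumes it is a regular expression searched against the command's arguments, and that `forbidden` entries are plain substrings; the function name `is_command_allowed` is hypothetical.

```python
import re
import shlex

# Mirrors the policy above as a Python dict (loading the YAML file is omitted).
POLICY = {
    "allowed": [
        {"cmd": "python", "pattern": r"-c|script\.py"},
        {"cmd": "git", "pattern": r"status|log|diff"},
    ],
    "forbidden": ["rm -rf", "sudo", "chmod 777"],
}


def is_command_allowed(command_line, policy=POLICY):
    """Deny by default: forbidden substrings win, then an allow rule must match."""
    normalized = " ".join(command_line.split())
    if any(bad in normalized for bad in policy["forbidden"]):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors
    if not argv:
        return False
    args = " ".join(argv[1:])
    for rule in policy["allowed"]:
        if argv[0] == rule["cmd"] and re.search(rule["pattern"], args):
            return True
    return False


for candidate in ["git status", "git push origin main", "git log; rm -rf /"]:
    print(candidate, "->", is_command_allowed(candidate))
```

Checking the forbidden list first and denying anything that matches no allow rule keeps the policy fail-closed. Substring matching is crude, so a production policy would likely also run commands without a shell.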

Usage

Interactive Mode

python matih.py --workspace /path/to/workspace

Then enter requests:

> Add error handling to the authentication module
> Refactor database connection pooling
> Fix the NoneType error in user.py line 45

Single Request Mode

python matih.py --request "Add caching decorator to utils module"

Build/Update RAG Index

# Build from scratch
python scripts/build_index.py /path/to/workspace

# Incrementally update an existing index
python scripts/upgrade_index.py /path/to/workspace

Workflow

┌─────────────────────────────────────┐
│   User Request                      │
│   "Add error handling to auth"      │
└─────────────────┬───────────────────┘
                  │
                  ▼
        ┌─────────────────┐
        │  RAG Retrieval  │  ◄─── Fetches relevant code context
        └────────┬────────┘       using semantic search
                 │
                 ▼
        ┌──────────────────────┐
        │  Planner LLM         │  ◄─── Generates structured plan
        │  (o3-mini-high)      │       (JSON, no execution)
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Plan Validation     │  ◄─── Validates against
        │  JSON Schema Check   │       planner_schema.json
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Show Plan to User   │  ◄─── Confidence, effort,
        │  Request Confirmation│       risks, detailed steps
        └────────┬─────────────┘
                 │
            [User Confirms]
                 │
                 ▼
        ┌──────────────────────┐
        │  For Each Step:      │
        │                      │
        │  ┌────────────────┐  │
        │  │ Executor LLM   │  │  ◄─── Generate patches/diffs
        │  │(grok-code-fast)│  │
        │  └────────────────┘  │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Validate Patch │  │
        │  │ JSON Schema    │  │
        │  └────────────────┘  │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Show Diff      │  │
        │  │ Request OK     │  │  ◄─── Preview with +/- lines
        │  └────────────────┘  │
        │         │            │
        │    [User OK's]       │
        │         │            │
        │         ▼            │
        │  ┌────────────────┐  │
        │  │ Sandbox Apply  │  │  ◄─── Safe application:
        │  │ Backup→Apply   │  │       • Boundary check
        │  │ Atomic Rename  │  │       • Backup before
        │  └────────────────┘  │       • Atomic rename
        └────────┬─────────────┘
                 │
                 ▼
        ┌──────────────────────┐
        │  Execution Summary   │
        │  Success/Failure     │
        │  Backup Locations    │
        └──────────────────────┘
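
The confirmation flow in the diagram can be sketched as a small control loop. This is only an outline under stated assumptions: `run_plan` and its three callables (`generate_patch`, `confirm`, `apply_patch`) are hypothetical stand-ins for the executor, the terminal UI, and the sandbox, and schema validation is left out for brevity.

```python
def run_plan(plan, generate_patch, confirm, apply_patch):
    """Ask once for the plan, then once per step before anything is applied."""
    prompt = f"Execute plan {plan['plan_id']} with {len(plan['steps'])} step(s)?"
    if not confirm(prompt):
        return {"status": "cancelled", "applied": [], "skipped": []}
    applied, skipped = [], []
    for step in plan["steps"]:
        patch = generate_patch(step)  # executor output, shown as a diff preview
        if confirm(f"Apply step {step['step_id']}?\n{patch}"):
            apply_patch(step, patch)  # sandboxed write with backup
            applied.append(step["step_id"])
        else:
            skipped.append(step["step_id"])
    return {"status": "done", "applied": applied, "skipped": skipped}


# Demo with stand-in callables: approve the plan and step 1, reject step 2.
demo_plan = {"plan_id": "demo", "steps": [{"step_id": 1}, {"step_id": 2}]}
applied_log = []
summary = run_plan(
    demo_plan,
    generate_patch=lambda step: f"patch for step {step['step_id']}",
    confirm=lambda prompt: "step 2" not in prompt,
    apply_patch=lambda step, patch: applied_log.append(step["step_id"]),
)
print(summary)
```

Passing the UI and sandbox in as callables keeps the loop easy to test without a terminal or a real filesystem.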

JSON Schemas

Planner Output (configs/planner_schema.json)

{
  "plan_id": "uuid",
  "analysis": "reasoning about the approach",
  "steps": [
    {
      "step_id": 1,
      "action": "create_file|modify_file|delete_file|run_command",
      "target_path": "path/to/file.py",
      "description": "What this step does",
      "dependencies": []
    }
  ],
  "confidence": 0.85,
  "estimated_effort": "small|medium|large",
  "risks": ["potential issue"]
}
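
To show the kind of check the schema enables, here is a minimal structural validator written with the standard library only. It is a sketch, not the contents of planner_schema.json: the function name `validate_plan` is hypothetical, and treating every listed field as required and `confidence` as a number in [0, 1] are assumptions drawn from the example above.

```python
ACTIONS = ("create_file", "modify_file", "delete_file", "run_command")
EFFORTS = ("small", "medium", "large")
REQUIRED = ("plan_id", "analysis", "steps", "confidence", "estimated_effort", "risks")
STEP_REQUIRED = ("step_id", "action", "target_path", "description", "dependencies")


def validate_plan(plan):
    """Return a list of problems; an empty list means the plan looks well-formed."""
    if not isinstance(plan, dict):
        return ["plan must be a JSON object"]
    errors = [f"missing field: {key}" for key in REQUIRED if key not in plan]
    if errors:
        return errors
    confidence = plan["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        errors.append("confidence must be a number in [0, 1]")
    if plan["estimated_effort"] not in EFFORTS:
        errors.append("estimated_effort must be small, medium, or large")
    steps = plan["steps"]
    if not isinstance(steps, list) or not steps:
        return errors + ["steps must be a non-empty list"]
    for position, step in enumerate(steps):
        if not isinstance(step, dict):
            errors.append(f"steps[{position}] must be an object")
            continue
        for key in STEP_REQUIRED:
            if key not in step:
                errors.append(f"steps[{position}] missing field: {key}")
        if "action" in step and step["action"] not in ACTIONS:
            errors.append(f"steps[{position}] has unknown action {step['action']!r}")
    return errors


good_plan = {
    "plan_id": "demo",
    "analysis": "add a cache",
    "steps": [{"step_id": 1, "action": "modify_file", "target_path": "utils.py",
               "description": "add decorator", "dependencies": []}],
    "confidence": 0.85,
    "estimated_effort": "small",
    "risks": [],
}
bad_plan = dict(good_plan, confidence=1.7, steps=[{"step_id": 1, "action": "format_disk"}])
print(validate_plan(good_plan))
print(validate_plan(bad_plan))
```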

Executor Output (configs/executor_schema.json)

{
  "execution_id": "uuid",
  "step_id": 1,
  "status": "success|failed",
  "result": {
    "action": "file_patch",
    "target_path": "path/to/file.py",
    "diff": "unified diff content",
    "original_hash": "sha256",
    "new_hash": "sha256"
  }
}
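
A result of this shape can be assembled with `difflib` and `hashlib` from the standard library. The sketch below assumes that `original_hash` and `new_hash` are SHA-256 digests of the file content before and after the change, which the example above does not spell out; `build_patch_result` and `is_stale` are hypothetical helper names.

```python
import difflib
import hashlib
import uuid


def sha256_text(text):
    """Hex SHA-256 digest of UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_patch_result(step_id, target_path, original, updated):
    """Build an executor-style result containing a unified diff and both hashes."""
    diff = "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{target_path}",
        tofile=f"b/{target_path}",
    ))
    return {
        "execution_id": str(uuid.uuid4()),
        "step_id": step_id,
        "status": "success",
        "result": {
            "action": "file_patch",
            "target_path": target_path,
            "diff": diff,
            "original_hash": sha256_text(original),
            "new_hash": sha256_text(updated),
        },
    }


def is_stale(result, current_text):
    """True if the file changed since the patch was generated."""
    return sha256_text(current_text) != result["result"]["original_hash"]


patch_result = build_patch_result(1, "utils/cache.py", "x = 1\n", "x = 2\n")
print(patch_result["result"]["diff"])
```

Comparing `original_hash` with the file on disk just before applying is one way to avoid patching a file that was edited in the meantime.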

Supported File Types

  • Python (.py): Split at function/class boundaries
  • JavaScript/TypeScript (.js, .ts): Token-based chunking
  • JSON (.json): Object-based chunking
  • YAML (.yaml, .yml): Key-based chunking
  • Markdown (.md): Heading-based chunking
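
For the Python case, splitting at function and class boundaries can be done with the standard `ast` module. The following is a minimal sketch rather than the project's chunker: `chunk_python_source` is a hypothetical name, it only looks at top-level definitions, and it ignores module-level statements between them.

```python
import ast


def chunk_python_source(source):
    """Return one chunk per top-level function or class, with its line range."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [dec.lineno for dec in node.decorator_list])
            chunks.append({
                "name": node.name,
                "start_line": start,
                "end_line": node.end_lineno,
                "text": "\n".join(lines[start - 1:node.end_lineno]),
            })
    return chunks


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Cache:
    def get(self, key):
        return None
'''

py_chunks = chunk_python_source(SAMPLE)
print([(c["name"], c["start_line"], c["end_line"]) for c in py_chunks])
# [('load', 3, 4), ('Cache', 6, 8)]
```

A fuller chunker would also need a fallback for definitions larger than the target chunk size, for example by splitting a long class into its methods.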

Safety Features

Boundary Enforcement

  • All file operations restricted to workspace directory
  • Protected paths: /etc, /sys, /proc, Windows system directories
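
A workspace boundary check can be sketched with `pathlib`. This is an illustration under assumptions, not the code in sandbox_fs.py: the function name `resolve_in_workspace` is hypothetical, and the check relies on `Path.is_relative_to`, available from Python 3.9.

```python
import tempfile
from pathlib import Path


def resolve_in_workspace(workspace, candidate):
    """Resolve candidate against the workspace and reject anything outside it.

    resolve() collapses ".." segments and follows symlinks, so the comparison
    is made on the real target path.
    """
    root = Path(workspace).resolve()
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"{candidate!r} escapes workspace {str(root)!r}")
    return target


demo_workspace = tempfile.mkdtemp()
inside = resolve_in_workspace(demo_workspace, "src/app.py")
try:
    resolve_in_workspace(demo_workspace, "../outside.txt")
    escaped = True
except PermissionError:
    escaped = False
print(inside, escaped)
```

Resolving before comparing matters: a plain string prefix check would accept paths such as `src/../../outside.txt`.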

Atomic Writes

  • Write to temp file
  • Verify content
  • Atomic rename (no partial writes)
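
The three steps above can be sketched with `tempfile.mkstemp` and `os.replace`. The function name `atomic_write` and the temp-file naming are hypothetical; the sketch places the temp file in the target's own directory because a rename is only atomic within a single filesystem.

```python
import os
import tempfile


def atomic_write(path, content, encoding="utf-8"):
    """Write content to a temp file, verify it, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".matih")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        with open(tmp_path, "r", encoding=encoding, newline="") as check:
            if check.read() != content:
                raise IOError("temp file content does not match what was written")
        os.replace(tmp_path, path)  # replaces the target in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # leave no stray temp file behind
        raise


with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "config.txt")
    atomic_write(target, "version = 1\n")
    atomic_write(target, "version = 2\n")
    with open(target, encoding="utf-8") as handle:
        final_content = handle.read()
    leftovers = [name for name in os.listdir(demo_dir) if name != "config.txt"]
print(repr(final_content), leftovers)
```

If any step before the rename fails, the original file is untouched and the temp file is removed.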

Automatic Backups

  • Backup before any modification
  • Kept for 7 days (configurable)
  • Location: .matih_backups/
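
A backup-and-prune routine along these lines might look as follows. The file naming scheme, the helper names `backup_file` and `prune_backups`, and the use of file modification time for retention are all assumptions made for this sketch.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_file(workspace, relative_path, backup_dir=".matih_backups"):
    """Copy a workspace file into the backup directory under a timestamped name."""
    root = Path(workspace)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    destination = root / backup_dir / f"{relative_path}.{stamp}.bak"
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copyfile (not copy2) so the backup's mtime is the time of the backup.
    shutil.copyfile(root / relative_path, destination)
    return destination


def prune_backups(workspace, max_age_days=7, backup_dir=".matih_backups", now=None):
    """Delete backups older than max_age_days and return the removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    backup_root = Path(workspace) / backup_dir
    removed = []
    if not backup_root.is_dir():
        return removed
    for path in sorted(backup_root.rglob("*.bak")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as demo_ws:
    source = Path(demo_ws) / "src" / "app.py"
    source.parent.mkdir(parents=True)
    source.write_text("print('v1')\n", encoding="utf-8")
    backup = backup_file(demo_ws, "src/app.py")
    backup_matches = backup.read_text(encoding="utf-8") == "print('v1')\n"
    removed_now = prune_backups(demo_ws)
    removed_later = prune_backups(demo_ws, now=time.time() + 8 * 86400)
print(backup_matches, removed_now, len(removed_later))
```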

Command Whitelisting

Allowed:
  • python (with patterns)
  • node
  • git (read-only)
  • cat, ls, find, grep

Forbidden:
  • rm -rf
  • sudo, su
  • chmod 777
  • mkfs, dd, format

User Confirmation

  • Plan review before execution
  • Diff preview before application
  • Optional inline editing
  • Confirmation before continuing whenever a step reports an error

API Stubs

The following LLM calls are stubs that developers must implement:

Planner (stub in core/planner.py)

def _call_planner_llm(self, user_request: str, context: str):
    # TODO: Replace with actual OpenAI API call
    # Expected: https://api.openai.com/v1/chat/completions
    # Model: "o3-mini-high"
    # Return: JSON plan matching planner_schema.json
    pass

Executor (stub in core/executor.py)

def _call_executor_llm(self, step: PlanStep, context: str):
    # TODO: Replace with actual Grok API call
    # Expected: https://api.x.ai/v1/chat/completions
    # Model: "grok-code-fast"
    # Return: JSON result matching executor_schema.json
    pass

Embeddings (stub in rag/embedder.py)

def embed_text(self, text: str):
    # TODO: Replace with actual Jina API call
    # Expected: https://api.jina.ai/v1/embeddings
    # Model: "jina-embeddings-v3"
    # Return: numpy array of shape (1024,), the default embedding dimension
    pass
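
As a starting point for the embeddings stub, the sketch below builds the HTTP request and parses the response with the standard library. The request body (`model`, `input`) and the response shape (`data[0].embedding`) are assumptions based on the OpenAI-style format and should be checked against Jina's API reference. The helper names are hypothetical, and the result is a plain list of floats rather than a numpy array, to keep the example dependency-free.

```python
import json
import os
import urllib.request

JINA_URL = "https://api.jina.ai/v1/embeddings"


def build_embedding_request(text, api_key, model="jina-embeddings-v3"):
    """Build (but do not send) a POST request for a single text."""
    payload = json.dumps({"model": model, "input": [text]}).encode("utf-8")
    return urllib.request.Request(
        JINA_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_embedding_response(body):
    """Extract the single embedding vector from a response body (assumed shape)."""
    data = json.loads(body)["data"]
    if len(data) != 1:
        raise ValueError(f"expected 1 embedding, got {len(data)}")
    return [float(x) for x in data[0]["embedding"]]


def embed_text(text, api_key=None):
    """Send the request; needs network access and JINA_API_KEY to be set."""
    api_key = api_key or os.environ["JINA_API_KEY"]
    request = build_embedding_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embedding_response(response.read())


# Offline demo: inspect the request and parse a hand-written response.
demo_request = build_embedding_request("def add(a, b): return a + b", "jina_test")
demo_vector = parse_embedding_response(b'{"data": [{"embedding": [0.1, 0.2]}]}')
print(demo_request.get_method(), demo_request.full_url, demo_vector)
```

Keeping request building and response parsing separate from the network call makes both easy to unit test without an API key.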

Testing

Run All Tests

python -m pytest tests/ -v

Run Unit Tests

python -m pytest tests/unit/ -v

Run Integration Tests

python -m pytest tests/integration/ -v

Run Specific Test

python -m pytest tests/unit/test_chunker.py -v
python -m pytest tests/unit/test_sandbox_policy.py -v

Project Structure

matih/
  configs/
    system.yaml                 # Main configuration
    planner_schema.json         # Planner output validation
    executor_schema.json        # Executor output validation
    rag_config.yaml            # RAG settings
    sandbox_whitelist.yaml     # Security policy

  core/
    agent.py                   # Main agent orchestrator
    planner.py                 # Plan generation
    executor.py                # Plan execution
    controller.py              # Execution loop control
    logging_config.py          # Logging setup
    ui/
      tui.py                   # Terminal UI
      diff_renderer.py         # Diff display

  rag/
    embedder.py                # Jina embeddings (stub)
    chunker.py                 # File chunking
    indexer.py                 # FAISS indexing
    retriever.py               # Semantic search
    store/                      # Index storage

  sandbox/
    sandbox_fs.py              # Safe filesystem
    command_runner.py          # Subprocess execution
    policy.py                  # Security policy
    sandbox_tests.py           # Sandbox unit tests

  utils/
    types.py                   # Shared data types
    file_utils.py              # File operations
    diff_utils.py              # Diff utilities
    json_schema.py             # Schema validation
    token_counter.py           # Token estimation

  prompts/
    planner_prompts.md         # Planner system prompts
    executor_prompts.md        # Executor system prompts
    user_messages.md           # UI messages

  scripts/
    build_index.py             # Index builder
    upgrade_index.py           # Incremental indexing
    run_agent.sh               # Launch script

  tests/
    unit/                      # Unit tests
    integration/               # Integration tests

  matih.py                      # Main entry point

Architecture Decisions

Why Separate Planner and Executor?

  • Planner: Optimized for reasoning and planning (o3-mini-high, lower temperature)
  • Executor: Optimized for code generation (grok-code-fast, fast inference)
  • Safety: Prevents "execution hallucination" - planner can't accidentally execute code

Why Local RAG?

  • Privacy: The index and all retrieval stay on the local machine; only text sent to the embedding API leaves it
  • Cost: Index lookups need no API calls, so the only recurring cost is embedding
  • Speed: Millisecond-level retrieval vs. API latency
  • Control: Can customize chunking and indexing strategy

Why FAISS?

  • Scalability: Handles millions of chunks
  • Speed: Sub-linear approximate retrieval with an IVF index
  • Simplicity: No separate service needed
  • Flexibility: CPU or GPU backend

Why Atomic Writes?

  • Consistency: No partial file modifications on disk
  • Recovery: If write fails, original file unchanged
  • Auditability: Clear before/after states

Why JSON Schemas?

  • Validation: Catch LLM errors early
  • Type Safety: Structured, validated outputs
  • Documentation: Schema serves as API contract
  • Tooling: Standard JSON schema validators available

Limitations & Future Work

Current Limitations

  • Planner and Executor LLM calls are stubs (developers must implement)
  • Patch application is simplified; for robust handling, apply diffs with the patch command instead
  • No multi-workspace support
  • No async execution
  • Single-threaded only

Future Enhancements

  • [ ] Implement actual LLM API calls (OpenAI, Grok, Jina)
  • [ ] Robust patch application using patch command
  • [ ] Concurrent step execution with dependency resolution
  • [ ] Web UI dashboard
  • [ ] Multi-repo support
  • [ ] Custom prompt templates
  • [ ] Telemetry and analytics
  • [ ] Performance profiling
  • [ ] Advanced diff strategies (smarter context)

Contributing

Contributions welcome! Please:

  1. Follow existing code style
  2. Add tests for new features
  3. Update documentation
  4. Validate planner and executor outputs against the JSON schemas

License

MIT License - See LICENSE file

Support

  • Issues: GitHub Issues
  • Documentation: See docs/ directory
  • Architecture: See docs/architecture.md
  • Security: See docs/security.md
  • Testing: See docs/testing_strategy.md

MATIH: Making code modification reliable, auditable, and safe.
