The Problem: Lost Reasoning in the AI Building Era

I've been vibecoding with LLMs since January 2025. By July, I had my first working builds. By September, I was deep in AWS infrastructure.

But I kept hitting the same wall: brilliant conversations with Claude disappeared into chat history. I'd solve a bug at 2am, close the tab, and forget the reasoning by morning. I'd make a design decision, implement it, then weeks later wonder "why did I do it this way?"

Every builder using AI assistants faces this: we're generating reasoning faster than we can preserve it.

The Inspiration

After winning 2nd place at CCC AI Camp (Cal Poly SLO/AWS, August 2025), I knew I could build something meaningful. The AWS Agent Hackathon gave me the perfect excuse to tackle reasoning preservation.

The insight: What if an AI agent could analyze my AI-assisted work sessions and extract the signal from the noise?

What I Built

Ariadne Clew is an autonomous reasoning extraction agent built on Amazon Bedrock AgentCore Runtime. It:

  • Analyzes chat transcripts from any LLM conversation (Claude, ChatGPT, etc.)
  • Extracts structured insights: aha moments, MVP changes, design tradeoffs, code snippets, scope creep
  • Generates human-readable summaries alongside machine-readable JSON
  • Validates code snippets using AgentCore Code Interpreter
  • Assigns quality scores to flag empty sessions vs. substantive work
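AgentCore Code Interpreter runs code in a managed sandbox. The sketch below is a cheaper local pre-check that confirms extracted Python snippets at least parse before anything is executed. It is a swapped-in technique, not the project's method, and `triage_snippets` is a hypothetical helper name.

```python
import ast


def triage_snippets(snippets: list[str]) -> dict[str, list[str]]:
    """Split Python snippets into ones that parse and ones with syntax errors."""
    result: dict[str, list[str]] = {"parses": [], "syntax_error": []}
    for snippet in snippets:
        try:
            # ast.parse only checks syntax; the snippet is never executed.
            ast.parse(snippet)
        except (SyntaxError, ValueError):  # ValueError: raised for null bytes on some Python versions
            result["syntax_error"].append(snippet)
        else:
            result["parses"].append(snippet)
    return result
```

Snippets that fail here could be reported without spending a sandbox call. Snippets that pass still need real execution to confirm they run.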

Tech Stack:

  • Amazon Bedrock AgentCore Runtime - Core agent orchestration primitive
  • Amazon Bedrock (Claude Sonnet 4) - Content analysis and classification
  • Strands Agents SDK - Agent implementation framework
  • AWS Lambda - Container execution environment
  • AWS CodeBuild - Containerization pipeline
  • Amazon ECR - Container image registry
  • Python + Flask - Bridge server connecting frontend to AgentCore
  • Pydantic - Schema validation and type safety
  • HTML/CSS/JS - Polished frontend with drag-drop file upload

How I Built It

Phase 1: Foundation (September)

  • First Bedrock connection established
  • Built classification logic for identifying technical content
  • Created schema for structured recap output

Phase 2: AgentCore Integration (September)

  • Wrapped logic in Strands Agent framework
  • Connected to BedrockAgentCoreApp runtime
  • Implemented proper error handling and fallbacks

Phase 3: Production Hardening (Late September/October)

  • Solved persistent AgentCore container caching issue
  • Built bridge server to connect frontend to deployed agent
  • Wrote a comprehensive test suite
  • Added quality flags and validation logic

Architecture:

User uploads chat transcript
    ↓
Frontend (http://localhost:5000)
    ↓
Bridge Server (Flask API)
    ↓
AgentCore Runtime (AWS Lambda + ECR)
    ↓
Strands Agent (agent.py)
    ↓
Bedrock Claude Sonnet 4
    ↓
Structured JSON + Readable Summary

Deployment Pipeline:
agent.py → CodeBuild → ECR → Lambda

Challenges I Faced

1. AgentCore Container Caching Hell

The biggest technical challenge was AgentCore's persistent caching. Updated code wouldn't deploy despite:

  • Multiple agentcore launch attempts
  • Complete destroy/rebuild cycles
  • Deleting config files
  • Using --force-rebuild flags

Solution: Nuclear reset approach

rm .bedrock_agentcore.yaml
agentcore configure --entrypoint backend/agent.py
agentcore launch --auto-update-on-conflict

The --auto-update-on-conflict flag was the key to forcing cache invalidation.

2. Strands Response Format Wrestling

AgentCore returns Bedrock's output wrapped in a nested message format, with the JSON I actually needed embedded as a fenced string:

{
  "content": [{"text": "```json\n{actual_data}\n```"}],
  "role": "assistant"
}

I had to build extraction logic to unwrap this, parse the JSON string, and validate the structure before generating human-readable summaries.
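Here is a minimal sketch of that unwrapping step, using only the standard library. It assumes the message shape shown above and treats the five keys from the sample output in the testing section as required. The names `unwrap_agent_response`, `REQUIRED_KEYS`, and `FENCE_RE` are illustrative and do not come from the repository, which uses Pydantic for validation.

```python
import json
import re
from typing import Any

# Required keys taken from the sample output in this write-up (an assumption,
# not the project's actual Pydantic schema).
REQUIRED_KEYS = {"aha_moments", "design_tradeoffs", "code_snippets", "mvp_changes", "quality_flags"}

# Captures the body of a ```json ... ``` fence. Greedy, so backticks inside JSON
# string values (e.g. extracted code snippets) don't end the match early.
FENCE_RE = re.compile(r"```(?:json)?\s*\n(.*)```", re.DOTALL)


def unwrap_agent_response(message: dict[str, Any]) -> dict[str, Any]:
    """Extract and validate the JSON recap from a {"content": [{"text": ...}]} message."""
    # Concatenate all text blocks in case the output is split across several.
    text = "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if isinstance(block, dict)
    )
    # Use the fenced body if present; otherwise assume the text is bare JSON.
    match = FENCE_RE.search(text)
    raw = match.group(1) if match else text
    data = json.loads(raw.strip())  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("recap must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"recap is missing keys: {sorted(missing)}")
    return data
```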

3. First AWS Production Deployment

This was my first time deploying to AWS AgentCore infrastructure. The learning curve included:

  • Understanding Lambda cold starts
  • Configuring ECR image pushes
  • Managing IAM permissions for Bedrock access
  • Debugging CloudWatch logs for silent failures

4. Balancing Autonomous Processing with Quality

Early versions would "analyze" any text and generate meaningless recaps. I implemented quality flags to detect:

  • Empty or minimal transcripts
  • No technical content
  • Sessions too short to extract insights

This ensures the agent provides honest feedback when there's nothing substantive to extract.
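One way to compute flags like these is a cheap heuristic gate that runs before any Bedrock call. This sketch rests on assumptions: the thresholds, keyword pattern, and flag names are hypothetical, and the real agent may rely on Claude's classification instead.

```python
import re

# Hypothetical thresholds and flag names, for illustration only.
MIN_CHARS = 80
MIN_TURNS = 2

# Crude keyword signal for technical content (an assumed list, easy to extend).
TECH_PATTERN = re.compile(
    r"```|\b(?:def|class|import|api|bug|error|deploy\w*|schema|function|auth\w*|token\w*)\b",
    re.IGNORECASE,
)

# Speaker turns written as "User:" or "Assistant:" at the start of a line.
TURN_PATTERN = re.compile(r"^(?:User|Assistant):", re.MULTILINE)


def quality_flags(transcript: str) -> list[str]:
    """Return flags for transcripts too thin to produce a meaningful recap."""
    text = transcript.strip()
    if not text:
        return ["empty_transcript"]
    flags = []
    # Transcripts without "User:"/"Assistant:" prefixes count as zero turns here;
    # other chat export formats would need their own parsing.
    turns = len(TURN_PATTERN.findall(text))
    if len(text) < MIN_CHARS or turns < MIN_TURNS:
        flags.append("session_too_short")
    if not TECH_PATTERN.search(text):
        flags.append("no_technical_content")
    return flags
```

With these values, the four-turn JWT transcript from the testing instructions passes with no flags. A two-line greeting is flagged as both too short and non-technical.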

What I Learned

Technical Skills:

  • AWS AgentCore Runtime architecture and deployment
  • Bedrock model invocation and prompt engineering
  • Strands Agents SDK for building production agents
  • Lambda + ECR containerization
  • Real-time debugging of distributed systems

Product Judgment:

  • When to ship MVP vs. perfect solution
  • How to scope a hackathon project (froze scope in MVP_ROADMAP.md)
  • Importance of quality signals over naive extraction
  • Value of comprehensive error handling and fallbacks

Meta-Learning:

  • LLMs are brilliant assistants but require human architecture
  • Persistent debugging pays off (the cache issue took 2+ days)
  • Good documentation helps future-you and judges
  • Test suites catch regressions during rapid iteration

Why This Matters

For indie builders: We're entering an era where reasoning happens in conversations with AI. Those insights disappear unless we preserve them systematically.

For teams: Shared context is the foundation of collaboration. Ariadne Clew turns ephemeral chat into durable memory artifacts.

For the AI ecosystem: As agents become more sophisticated, reasoning transparency matters. This project demonstrates how agents can make their own thinking auditable and preservable.

Future Vision

Post-MVP enhancements:

  • Multi-session reasoning chains (track decisions across weeks)
  • AgentCore Memory with chunking for longer chat extracts
  • Team collaboration features (shared context pools)
  • Pattern detection (identify recurring design tradeoffs)
  • Integration with project management tools (auto-populate docs)

The big idea: Every builder deserves a ghost cofounder who remembers everything and explains it clearly.


🏗️ Built With

Languages:

  • Python 3.12
  • JavaScript (ES6+)
  • HTML5
  • CSS3

Frameworks & Libraries:

  • Flask (Python web framework)
  • Pydantic (Schema validation)
  • Strands Agents SDK (Agent implementation)
  • boto3 (AWS SDK for Python)

AWS Services:

  • Amazon Bedrock (Claude Sonnet 4)
  • AWS AgentCore Runtime (BedrockAgentCoreApp)
  • AWS Lambda (Agent execution)
  • AWS CodeBuild (Containerization)
  • Amazon ECR (Container registry)
  • Amazon CloudWatch (Logging)
  • AWS X-Ray (Tracing)

Development Tools:

  • pytest (Test framework)
  • Black (Code formatting)
  • mypy (Type checking)
  • GitHub Actions (CI/CD)

🔧 Amazon Tools Used

  • Amazon Bedrock AgentCore - Runtime primitive for agent orchestration ✅
  • Amazon Bedrock - Claude Sonnet 4 for content analysis ✅
  • Strands Agents SDK - Agent implementation framework ✅


🔗 Project Links

Code Repository: https://github.com/earlgreyhot1701D/Ariadne-Clew

Live Demo: Local deployment (instructions below)


🧪 Testing Instructions

Prerequisites

  • Python 3.12+
  • AWS CLI configured with Bedrock access
  • AgentCore CLI installed

Local Setup

  1. Clone the repository

    git clone https://github.com/earlgreyhot1701D/Ariadne-Clew.git
    cd Ariadne-Clew
    
  2. Install dependencies

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    
  3. Configure and launch AgentCore

    agentcore configure --entrypoint backend/agent.py
    agentcore launch --auto-update-on-conflict
    
  4. Start the bridge server

    python backend/bridge_server.py
    
  5. Open the frontend

    http://localhost:5000
    

Testing the Agent

Option 1: Via Frontend

  1. Open http://localhost:5000 in your browser
  2. Paste a chat transcript into the input area (or drag and drop a transcript file)
  3. Click "Generate Recap"
  4. View structured insights in the Session Insights panel
  5. View raw JSON in the Structured Data panel

Option 2: Direct AgentCore Invocation

agentcore invoke '{"prompt":"User: We need authentication\nAssistant: Use JWT tokens\nUser: Why JWT?\nAssistant: Stateless and scalable"}' --session-id "test-session-12345678-1234-1234-1234-123456789012"

Expected Output (abridged):

{
  "status": "success",
  "result": {
    "human_readable": "## Session Recap: Authentication Discussion\n\n### 💡 Key Insights\n- Decided on JWT tokens for authentication...",
    "structured_data": {
      "aha_moments": ["JWT tokens chosen for stateless authentication"],
      "design_tradeoffs": ["JWT: stateless scaling vs. session: simpler revocation"],
      "code_snippets": [],
      "mvp_changes": ["Added JWT authentication"],
      "quality_flags": []
    }
  }
}
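The agent parses the JSON before generating the summary, so one plausible way to produce human_readable is to render it from structured_data. This sketch assumes the heading layout in the sample above. Only "Session Recap" and "💡 Key Insights" come from that sample. The other section titles, the `title` parameter, and `render_recap` itself are hypothetical.

```python
# Section titles for each structured_data key. "💡 Key Insights" mirrors the
# sample output; the remaining titles are assumptions for illustration.
SECTIONS = [
    ("aha_moments", "💡 Key Insights"),
    ("design_tradeoffs", "Design Tradeoffs"),
    ("mvp_changes", "MVP Changes"),
    ("code_snippets", "Code Snippets"),
]


def render_recap(title: str, data: dict[str, list[str]]) -> str:
    """Render structured recap data as a Markdown-style summary."""
    lines = [f"## Session Recap: {title}"]
    for key, heading in SECTIONS:
        items = data.get(key) or []
        if not items:
            continue  # skip empty sections rather than printing bare headings
        lines += ["", f"### {heading}"]
        lines += [f"- {item}" for item in items]
    flags = data.get("quality_flags") or []
    if flags:
        # Surface quality flags so thin sessions are reported honestly.
        lines += ["", "### Quality Flags"]
        lines += [f"- {flag}" for flag in flags]
    return "\n".join(lines)
```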

Sample Test Files

Use Demo_Test.txt included in the repository for a realistic test case. This file contains:

  • Technical discussion about auth strategies
  • Code snippets (Python JWT implementation)
  • Design tradeoffs (JWT vs sessions)
  • MVP scope decisions

Running the Test Suite

pytest tests/ -v

📊 Architecture Diagram

See: ARCHITECTURE_DIAGRAM.png (uploaded to image gallery)

Components:

  1. Frontend Layer: HTML/CSS/JS with drag-drop file upload
  2. Bridge Server: Flask API handling CORS and request routing
  3. AgentCore Runtime: AWS Lambda + ECR container orchestration
  4. Strands Agent: agent.py implementing reasoning extraction logic
  5. Bedrock Claude: LLM performing content analysis and classification
  6. Data Flow: User input → API → AgentCore → Bedrock → Structured output

Key Interactions:

  • Frontend sends POST request with chat_log to bridge server
  • Bridge server invokes AgentCore with properly formatted payload
  • AgentCore executes Strands agent.py
  • Agent sends classification prompt to Bedrock Claude
  • Claude analyzes transcript and returns structured JSON
  • Agent extracts and validates response
  • Bridge server returns human_readable + structured_data to frontend
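The bridge server's two data-shaping steps are sketched below as plain functions, so they can be tested without Flask or AWS. The `{"prompt": ...}` payload matches the direct-invocation example above, and the `status`/`result` fields match the sample output. The length limit, the `error` field, and both function names are assumptions.

```python
import json
from typing import Any

MAX_CHAT_LOG_CHARS = 200_000  # hypothetical limit, not from the project


def build_agent_payload(body: dict[str, Any]) -> bytes:
    """Turn the frontend's {"chat_log": ...} body into a {"prompt": ...} payload."""
    chat_log = body.get("chat_log")
    if not isinstance(chat_log, str) or not chat_log.strip():
        raise ValueError("chat_log must be a non-empty string")
    if len(chat_log) > MAX_CHAT_LOG_CHARS:
        raise ValueError("chat_log is too long")
    return json.dumps({"prompt": chat_log}).encode("utf-8")


def shape_frontend_response(agent_result: dict[str, Any]) -> dict[str, Any]:
    """Reduce the agent's reply to the two fields the frontend renders."""
    if agent_result.get("status") != "success":
        # The "error" field is an assumed convention for failed invocations.
        return {"error": agent_result.get("error", "agent invocation failed")}
    result = agent_result.get("result", {})
    return {
        "human_readable": result.get("human_readable", ""),
        "structured_data": result.get("structured_data", {}),
    }
```

In the Flask app, a route would call `build_agent_payload` on the request body and pass the bytes to the AgentCore invocation. It would then return the output of `shape_frontend_response` as JSON. The invocation itself is omitted here because it needs AWS credentials.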

🏆 Why This Project Stands Out

Technical Achievement:

  • Real AWS AgentCore Runtime integration (not simulated)
  • Production-grade error handling and fallbacks
  • Comprehensive test coverage with quality signals
  • Solved complex container caching issues

Product Value:

  • Addresses genuine pain point for AI-assisted builders
  • Autonomous processing (no human-in-the-loop required)
  • Structured output enables downstream automation
  • Quality flags ensure honest feedback

Developer Experience:

  • Clear documentation and testing instructions
  • Frozen MVP scope demonstrates shipping discipline
  • Comprehensive error messages for debugging
  • Polished frontend shows attention to UX

Built by: La Shara Cordero
Contact: lsjcordero@gmail.com
Timeline: July 2025 (first Bedrock connection) → October 2025 (production deployment)
Previous Win: 2nd Place, CCC AI Camp (Cal Poly SLO/AWS, August 2025)


"Don't commit without context. Don't build without reason. Don't lose the thread." 🧶
