About the Project

The Spark of Inspiration 💡

The idea for Shipyard came from a frustrating pattern I kept witnessing: brilliant engineers with amazing product ideas getting stuck at the infrastructure planning stage. Some would spend weeks researching cloud services only to feel overwhelmed by the endless options and configurations. Others would jump straight into implementation without proper planning, leading to costly architectural mistakes down the road.

I realized there was a gap between knowing what you want to build and knowing how to build it reliably at scale. Existing options were either too simplistic (basic templates) or too complex (enterprise consulting). What if an AI assistant could hold the kind of conversation a senior infrastructure engineer has with a colleague, adapting to that person's expertise level and progressively building a comprehensive plan?

The Learning Journey 🎓

Building Shipyard taught me several crucial lessons about AI system design:

1. Context is Everything

Initially, I planned simple sequential agents, but I quickly discovered that context sharing between agents was the make-or-break feature. A user would mention in the first conversation that they're using Railway, only to have the business agent ask about cloud providers later. The fix was an enhanced context management system in which each agent receives the complete conversation history from the previous agents, along with LLM-generated summaries tailored to our use case and to that specific agent.
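A minimal sketch of that hand-off, assuming each completed agent leaves behind its raw chat history and an already-computed summary dict. The helper `build_agent_context` and the prompt layout are hypothetical, and the LLM summarization call itself is omitted.

```python
from typing import Dict, List

Message = Dict[str, str]

def build_agent_context(
    agent: str,
    completed: List[str],
    chat_history: Dict[str, List[Message]],
    summaries: Dict[str, Dict[str, str]],
) -> str:
    """Assemble the context block handed to `agent` (hypothetical helper)."""
    sections = [f"You are the {agent} agent. Do not re-ask anything covered below."]
    for pillar in completed:
        # The summary comes first so the distilled facts are easy to find.
        facts = "; ".join(f"{k}: {v}" for k, v in summaries.get(pillar, {}).items())
        sections.append(f"[{pillar} summary] {facts}")
        # The full transcript follows, so nothing is lost to summarization.
        for msg in chat_history.get(pillar, []):
            sections.append(f"[{pillar}] {msg['role']}: {msg['content']}")
    return "\n".join(sections)


context = build_agent_context(
    agent="business",
    completed=["profiler"],
    chat_history={"profiler": [{"role": "user", "content": "We deploy on Railway."}]},
    summaries={"profiler": {"hosting": "Railway"}},
)
```

With the Railway fact present in both the summary and the transcript, the business agent has no reason to ask about cloud providers again.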

2. The Expertise Adaptation Challenge

One of the most interesting technical challenges was creating truly adaptive questioning. The system needed to:

  • Assess both stated expertise ("I'm intermediate") and demonstrated expertise (how they actually describe their project)
  • Dynamically adjust question complexity in real-time
  • Provide gentle explanations without being condescending

This led to a dual-model approach: o3 for nuanced conversation understanding and GPT-4o for fast operations like follow-up detection.

How I Built It 🔧

Architecture Decisions

Model Selection Strategy: I chose a hybrid approach that splits work across OpenAI models by task complexity:

  • Primary Operations (o3 and o3-mini): Main agent conversations, document generation, comprehensive summarization
  • Fast Operations (GPT-4o): Follow-up detection, skip detection, expertise assessment

The reasoning? Complex conversations benefit from o3's enhanced reasoning capabilities, while simple binary decisions (like "does this need a follow-up?") can be handled quickly and cost-effectively by GPT-4o.
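The split can be expressed as a single routing table, so call sites request an operation rather than hard-coding a model name. This is a sketch: the `Operation` names and `pick_model` are hypothetical, and which primary operations go to o3 versus o3-mini is an assumption, since the text only groups them together.

```python
from enum import Enum

class Operation(Enum):
    # Primary operations: multi-turn reasoning and long-form output
    AGENT_CONVERSATION = "agent_conversation"
    DOCUMENT_GENERATION = "document_generation"
    SUMMARIZATION = "summarization"
    # Fast operations: short, near-binary classification calls
    FOLLOW_UP_DETECTION = "follow_up_detection"
    SKIP_DETECTION = "skip_detection"
    EXPERTISE_ASSESSMENT = "expertise_assessment"

# Hypothetical assignment; the o3 / o3-mini split among primary steps is assumed.
MODEL_FOR = {
    Operation.AGENT_CONVERSATION: "o3",
    Operation.DOCUMENT_GENERATION: "o3",
    Operation.SUMMARIZATION: "o3-mini",
    Operation.FOLLOW_UP_DETECTION: "gpt-4o",
    Operation.SKIP_DETECTION: "gpt-4o",
    Operation.EXPERTISE_ASSESSMENT: "gpt-4o",
}

def pick_model(op: Operation) -> str:
    """Return the model name to request for a given operation."""
    return MODEL_FOR[op]
```

Keeping the mapping in one place makes it cheap to re-benchmark a step and move it between tiers.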

State Management Architecture

The trickiest part was designing a state management system that could handle three different types of data:

$$\text{State} = \{\text{Chat History}, \text{Application State}, \text{Summaries}\}$$

Where:

  • Chat History: Natural conversation flow within each agent
  • Application State: Cross-agent context and incremental document building
  • Summaries: AI-extracted key information after each pillar completes

```python
state = {
    "chat_history": {
        "business": [{"role": "assistant", "content": "How many users?"}]
    },
    "state": {
        "user_profile": {"expertise_level": "intermediate"},
        "current_document": {"architecture": "..."},
        "previous_pillars_completed": ["profiler", "business"]
    },
    "summaries": {
        "profiler": {"expertise": "intermediate", "project_type": "startup"}
    }
}
```
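The three layers change at different moments. A minimal sketch of the transition when a pillar finishes, assuming the dictionary shape above: the pillar's summary lands in `summaries` and the pillar is appended to `previous_pillars_completed`, while chat history is left untouched. `complete_pillar` is a hypothetical helper.

```python
import copy
from typing import Any, Dict

def complete_pillar(state: Dict[str, Any], pillar: str, summary: Dict[str, str]) -> Dict[str, Any]:
    """Return a new state with `pillar` marked complete and its summary stored."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    new_state.setdefault("summaries", {})[pillar] = summary
    app = new_state.setdefault("state", {})
    completed = app.setdefault("previous_pillars_completed", [])
    if pillar not in completed:  # idempotent if a pillar is re-run
        completed.append(pillar)
    return new_state


state = {
    "chat_history": {"business": [{"role": "assistant", "content": "How many users?"}]},
    "state": {"previous_pillars_completed": ["profiler"]},
    "summaries": {"profiler": {"expertise": "intermediate"}},
}
state = complete_pillar(state, "business", {"expected_users": "10k"})
```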

The Agent Flow Design

I implemented a pillar-based architecture where specialized agents handle different aspects:

```mermaid
graph TD
    A[Profiler Agent] --> B[Business Needs Agent]
    B --> C[App Needs Agent]
    C --> D[Tribal Knowledge Agent]
    D --> E[Best Practices Agent]
    E --> F[Document Generator]
    F --> G[Review Loop]
```

Each agent builds upon the previous ones' context, preventing duplicate questions and ensuring consistency.
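The ordering can be sketched as a plain sequential loop in which each agent receives every earlier pillar's summary. This is a simplified, synchronous stand-in: agents are modeled as plain functions, the pillar keys are hypothetical, and the Document Generator and Review Loop stages are left out.

```python
from typing import Callable, Dict, List

Summaries = Dict[str, Dict[str, str]]
# An agent reads the summaries produced so far and returns its own summary.
Agent = Callable[[Summaries], Dict[str, str]]

# Hypothetical keys for the five interview pillars, in the order shown above.
PILLARS: List[str] = ["profiler", "business", "app", "tribal_knowledge", "best_practices"]

def run_pillars(agents: Dict[str, Agent]) -> Summaries:
    """Run each pillar in order, handing every earlier summary forward."""
    summaries: Summaries = {}
    for pillar in PILLARS:
        # A shallow copy stops an agent from adding or dropping earlier entries.
        summaries[pillar] = agents[pillar](dict(summaries))
    return summaries


def stub_agent(seen: Summaries) -> Dict[str, str]:
    # Records which earlier pillars were visible when this agent ran.
    return {"saw": ",".join(seen) or "none"}

result = run_pillars({pillar: stub_agent for pillar in PILLARS})
```

Running it with the stub shows the accumulation: the profiler sees nothing, while the best-practices agent sees all four earlier pillars.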

The Challenges Faced 🚧

1. The Follow-up Detection Problem

Challenge: How do you know when a user's answer needs clarification without over-engineering?

Solution: I built a fast operation using GPT-4o that analyzes user responses for indicators of confusion, uncertainty, or incompleteness:

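```python
from openai import AsyncOpenAI  # assumes the official openai SDK (v1+)

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def needs_follow_up_fast_operation(user_answer: str, agent_question: str) -> bool:
    prompt = f"""
    Analyze if this user response indicates confusion or need for clarification:

    Agent Question: {agent_question}
    User Response: {user_answer}

    Respond with only "YES" if follow-up needed, "NO" if clear.
    """
    # Uses GPT-4o for a sub-second response on this binary decision
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content or ""
    return answer.strip().upper().startswith("YES")
```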

2. The Expertise Calibration Problem

Challenge: Users often mis-assess their own expertise level, leading to poorly adapted questions.

Solution: Dual assessment system:

  • Stated Level: What the user tells us ("I'm intermediate")
  • Gauged Complexity: What we observe from their responses ("higher than stated")

This allows the system to gracefully adjust question complexity without contradicting the user.
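One way to combine the two signals is to start from the stated label and move at most one step in the observed direction, so the adjustment stays gentle. The three level names, the "lower than stated" counterpart, and `effective_level` are assumptions for illustration; only the "higher than stated" phrasing comes from the text above.

```python
LEVELS = ["beginner", "intermediate", "advanced"]  # assumed scale

def effective_level(stated: str, gauged: str) -> str:
    """Shift the stated level by at most one step toward the gauged signal."""
    idx = LEVELS.index(stated)
    if gauged == "higher than stated":
        idx = min(idx + 1, len(LEVELS) - 1)
    elif gauged == "lower than stated":
        idx = max(idx - 1, 0)
    # Any other signal (e.g. "matches stated") keeps the user's own label.
    return LEVELS[idx]
```

Because the user's label is only nudged, never overwritten, questions can get harder or simpler without the system ever telling the user they misjudged themselves.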

3. Document Generation Quality

Challenge: Early document outputs were generic and didn't reflect the nuanced requirements gathered during interviews.

Solution: Enhanced the Document Generator with:

  • Complete conversation logs (not just summaries)
  • Clear attribution of user vs. AI recommendations
  • Section-by-section building with cross-references
  • Industry-specific templates based on detected domain
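Attribution and cross-references are easiest to enforce when each section is built from tagged items rather than free text. A minimal sketch, assuming a hypothetical `Section`/`Item` model that renders to Markdown; the example requirements are illustrative, and domain-specific templates are not shown.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    text: str
    source: str  # "user" for stated requirements, "ai" for recommendations

@dataclass
class Section:
    title: str
    items: List[Item] = field(default_factory=list)
    see_also: List[str] = field(default_factory=list)  # titles of related sections

    def render(self) -> str:
        lines = [f"## {self.title}"]
        for item in self.items:
            # Every line states who it came from, so nothing is silently merged.
            tag = "User requirement" if item.source == "user" else "AI recommendation"
            lines.append(f"- [{tag}] {item.text}")
        if self.see_also:
            lines.append("See also: " + ", ".join(self.see_also))
        return "\n".join(lines)


hosting = Section(
    "Hosting",
    [Item("Deploy on Railway", "user"), Item("Add a managed Postgres instance", "ai")],
    see_also=["Data Storage"],
)
doc = hosting.render()
```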

Technical Innovations 🔬

Adaptive Prompting Algorithm

I developed a dynamic prompting system that adjusts not just content but conversation style:

$$\text{Prompt Complexity} = f(\text{Stated Expertise}, \text{Gauged Expertise}, \text{Domain Context})$$
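To illustrate the shape of $f$, here is a sketch that weights observed expertise 2:1 over the stated label and then adds domain framing. Here the gauged expertise is expressed as a level label. The three-level scale, the weighting, and the style strings are assumptions chosen for illustration, and `prompt_style` is hypothetical.

```python
LEVELS = {"beginner": 0, "intermediate": 1, "advanced": 2}  # assumed scale

STYLE = {
    0: "Avoid jargon, define each term you use, and offer sensible defaults.",
    1: "Use standard terminology and explain trade-offs briefly.",
    2: "Be concise, assume familiarity with cloud primitives, and probe edge cases.",
}

def prompt_style(stated: str, gauged: str, domain: str) -> str:
    """Hypothetical f(stated, gauged, domain) returning a system-prompt fragment."""
    # Weighted mean of the two levels (observed counts double),
    # rounded to the nearest level; the result always stays in 0..2.
    level = (LEVELS[stated] + 2 * LEVELS[gauged] + 1) // 3
    return f"{STYLE[level]} Frame examples for a {domain} project."
```

A user who says "intermediate" but writes like an advanced engineer gets the advanced style, while a self-described expert who struggles is eased down only partway.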

Follow-up Limiting with Context

Rather than relying on a single fixed cutoff, I combined a hard ceiling with contextual signals:

  • Maximum 3 follow-ups per topic
  • Early termination if user shows confidence
  • Question complexity increases with follow-up count
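The three rules fit in a small per-topic tracker. A minimal sketch, assuming `needs_follow_up` comes from the fast detection call and `user_confident` from a similar classifier; `FollowUpTracker` and the depth labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FOLLOW_UPS = 3  # per-topic ceiling from the text above
# Hypothetical escalation: each successive follow-up digs one level deeper.
DEPTH = ["clarify the basics", "probe specifics", "explore edge cases"]

@dataclass
class FollowUpTracker:
    count: int = 0
    closed: bool = False

    def next_follow_up(self, needs_follow_up: bool, user_confident: bool) -> Optional[str]:
        """Return a depth hint for the next follow-up, or None to move on."""
        if (self.closed or not needs_follow_up or user_confident
                or self.count >= MAX_FOLLOW_UPS):
            self.closed = True  # once a topic closes, it stays closed
            return None
        hint = DEPTH[self.count]
        self.count += 1
        return hint


tracker = FollowUpTracker()
hints = [tracker.next_follow_up(True, False) for _ in range(4)]
```

Even when the detector keeps flagging confusion, the fourth call returns `None`, so the interview moves on instead of looping.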

Zero-Repetition Context Sharing

The biggest technical achievement was keeping agents from repeating questions while maintaining a natural conversation flow. This required:

  • Conversation deduplication across agents
  • Technology stack persistence
  • Semantic similarity detection for topics
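The deduplication check can be sketched as a similarity test against every question already asked in earlier pillars. This sketch swaps in plain word-overlap (Jaccard) similarity so it runs without an API; a real semantic check would more likely compare embeddings, and the 0.5 threshold is an arbitrary assumption.

```python
import re
from typing import List, Set

def _tokens(text: str) -> Set[str]:
    # Lowercased word tokens; a crude stand-in for a sentence embedding.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets, in [0, 1]."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def is_repeat(candidate: str, asked: List[str], threshold: float = 0.5) -> bool:
    """True if `candidate` overlaps enough with any question already asked."""
    return any(similarity(candidate, q) >= threshold for q in asked)


asked = ["Which cloud provider are you using?"]
```

Word overlap catches near-verbatim repeats but misses paraphrases such as "Where do you host?", which is exactly the gap embedding-based similarity is meant to close.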

What's Next 🚀

Building Shipyard taught me that the future of technical planning tools isn't about replacing human expertise; it's about amplifying it. The next iteration will focus on:

  1. LangGraph Migration: For more sophisticated workflow management
  2. Real-time Cost Analysis: Integration with cloud provider APIs for live pricing
  3. IaC Generation: Direct Terraform/CloudFormation output from plans
  4. Collaborative Planning: Multi-stakeholder input and approval workflows
  5. Infra Validation: Using MCP (Model Context Protocol) to inspect what is actually deployed and how it is performing

The core insight remains: good infrastructure starts with good conversations. Shipyard shows that AI can facilitate those conversations at scale, adapting to each user's expertise level while building comprehensive, actionable plans.


What started as a solution to a personal frustration became a deep exploration of conversational AI, context management, and adaptive user interfaces. Every challenge taught me something new about building AI systems that truly understand and adapt to their users.


Updates


View agentic architecture: https://mermaid.live/view#pako:eNqVWM1u3DYQfhVCQW7aZL3rtdeLIoBrx27QOHFj59DGOdDSaJexVlQpybET55YCRYs2SBOgQPqTpCjQQ1sUvfVQ9GH8As0jdEjqh6Ik2_XBiIbDj8P5Ps6M89jxuA_OxJkKGs_I7vpeRPDn8mVyPUrFMdnmLEq17W4C4kYUZ-m9d6-f_6w-ifq-T3q9a2SLsui28GaQpIKmXKDX138oKzHN7-2Lq9d24NMMopTRkMQsDKkgHufCZxFNGY_u6_PKSFan0tcj2zRNQURkYUIMgG0NsC24B0nCoqneZ4ejYkSngIUgFKK-RmHSp6jo3r3-5Q3hw94cfJbNiQCa8AiB87hqIAr2_QzPxcML2G_-Lk0Xh62BKNjVOC4QX_wpvy4CVuSt2KyQdgXbp2EB9uyn3HDx4AwAfWNM67agHtIC-bVPv_9MmUlprwHP2HTWgO2keIAUZ3HMRaqXyJqhD70ryfa1ZnPHYite8Ls3__71zALIl_OD5c9ONp9TwR5Vcnjx1DC2JYdFrLqDWloNEJUA9Wa5ko0D1rm3CREo-ZVH_CDN2VyGVC6ekSi1tJWFKeslKcRkqvdUr0T-bAD4-9Q7KPX3W2nCF4oBxgLw95lEq8W1GY2mQGgch8wzT4HItxi7eXOLrNEwNF_lnQKM3I7zKBOLrNLldpxIqjCYtl3G5eRDXuPRIYhEren7qbpimlX8fHg1p0l9rgNmzNZczoyiYLPMZjs1FS5RtEAQoKI08VhbWApemglMWETD44QlLepSIHhZecCXdWsOrWVl438IxwSOUvWWWsmuCh5Cv3xaMV4tFLHnXBvoOdGsEMf_5Bqf5wbFp95Js1xVDJ--emu7kh7Z3N7tLXLzUjwM-cO78Tp4LNF0vPwqt_aymBR2HbxBe2vqDxgCSW4k0umz31U1QCMprQpHPg3kWoDHpxGz0nz9CANOWQKbNJvqJGM7K60kN2udZVhLaZJg1qV6DJQ1Po9DOGLp8Woep9TBF4adFAsK6Q4kMaYICBzSMDubFbtmDifyQaQoGrIzQ5Hhg7oF6UMuDixycq98kWj5dGw17zKj6QcswXp1rN-gNJDcoqLfBtGjqtp6BkUmNWuC5x1uC-YaCCuAsvZ0ndZ2PSVgKOBLLBUZix6A_Rh2wZvtpPI14GkYCUQeyOD-UStELZmQWzJlPEJUFMIs4iGfMjADXM906YNtAYfaGfFe_VotkGpFQe7yGEmQD_WgKjLns7U4Ias-jVN2COSjDPsmU0XKoqpwMZobqrBtY0uDK8W6WkpTq69ScbWiM46FAHNzmCh5g2-AFUdVypVYn5d2Q9L6TRxHdI73pf6DLEmtZ3EHn9wcbb4SyIagc5Bi0zWyvkjK1TxCkama65NY8JR7PGwpIzeRVi8vi2UNUUatAnpEhgRzQFJJ3sXrHj4x3Yt3ZC9u6RYWfXoCMMh78bxtVwt3ZntBwcW6psvh7VtZzWrdp1o3an5b25NNsxpKyq6neunFWp8Jhl0DKaI3imep2SvMpLQXNVs9YrwojpLZ-YVNDy66-yVdg7c1wnUN0q1utQG51cMeettPa0zCyrcx-2nvhlk51-a3Lg1uoIw7E2EPSV2ZaPerpaLdxc5Fx4HtyWh31r-tpNYSXcxOZ-WuOcqd4109SO1YS36NDvtPypKT-kAjW2cEplDt-2pUa8Q5w7U2wJzhZ88oZ7g2BxHrTsUDlTLTRmtMaPz93OlVk12nVyG6TgdDct1HNQRn15KiVRZFuLqf3Vu7ldrSSJV3sydq_6ZduXd0Pb2nY7EmHdXBrAsWctL_U5OzamtNopx8DMlJG1Sr8y1-giPgUaqmm3unP76VXUcayK7RMmtCVfukxdjZ5SapZFEGJ2febSc9DktZeyFO2OsQVJ2NoBbDySVYCEYBuEkq-AFMLvUHy8v7fv7Ze8j8dDYZxkcWSJD_gaIhgi
GMglEJsby_ENDBeRBq0i0QYByMYFwiDGDZH56LEGvllVEEGEe_xIBgyev3z8Pwc9YqENiHKhsYRTAI6iCDAsSAaijfbRZW167LbrNSuu2Di9sYPSoazShsLbo16bh2wXObZa1g1kStlS23Vp7cogy5Rrlxm2XFtfqU2-gvbr2RaHWYUVj1xq2XM7ejALj1GbKQjAlcvrZKDI7rTAXznQmOy-A6cxBzKj-dx3LjnpPOYA57zgT_6UNAcabdc_aiJ7gtptEnnM-LnYJn05kzCWiY4FcWY3SwziiOt5ULxgxijWdR6kwW-wrCmTx2jpzJcDC8sjReWeyP--Ph4mBhaeQ6x85kNL6yMl5YGY76o_HSymhp5YnrPFKH9q8sD8Yr4-XF5cXhYHF5ZTR48h_YrgwA
