Midnight at the Voss Manor - Hackathon Submission

🎭 The Inspiration

What if ghosts could argue with each other about your choices? What if AI agents had distinct personalities and debated morality in real-time?

Midnight at the Voss Manor was born from a simple question: Can we create a narrative experience where multiple AI agents feel like real characters with conflicting perspectives?

We wanted to push beyond single-agent chatbots and create a Frankenstein's monster of technologies - stitching together LLMs, text-to-speech, blockchain verification, and procedural storytelling into something hauntingly alive.

🏗️ How We Built It

This project is a chimera of seemingly incompatible technologies:

The Tech Stack

  • 5 AI Ghost Agents powered by Groq's Llama 3.3 70B - each with distinct personalities (maternal Elara, amnesiac scientist Harlan, innocent child Mira, regretful Theo, cold Selene)
  • Multi-Provider TTS Orchestra - Azure, Google, Play.ht, and Gemini for unique voice acting
  • Model Context Protocol (MCP) servers for:
    • Blockchain vow verification (checking eternal promises)
    • Memory persistence across scenes
    • Sentiment analysis of player choices
    • AI-generated imagery with Gemini
  • Next.js 14 with React for the interactive frontend
  • Suno AI for atmospheric background music
  • Gemini Nano Banana Pro for gothic-cyberpunk scene generation

The Architecture

```javascript
// Each ghost has a unique personality system
const ghostPersonalities = {
  elara: "Maternal, gentle, focuses on family harmony",
  harlan: "Scientific, amnesiac, logical but emotionally confused",
  mira: "Childlike, innocent, wants play and attention",
  theo: "Dramatic, regretful, seeks redemption",
  selene: "Cold but softening, demands truth"
}
```

The ghost debate system is the heart of the experience. When you make a choice, all 5 agents independently generate responses, then reach consensus:

```javascript
// Real-time multi-agent debate
const debate = await fetch('/api/ghost-debate', {
  method: 'POST',
  body: JSON.stringify({
    puzzleContext: currentScene,
    playerMessage: playerChoice
  })
})
```
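Server-side, the route runs the debate loop itself. Here is a minimal sketch of that orchestration, assuming a hypothetical `generateResponse` helper that wraps the Groq call (the real handler lives in the API route):

```javascript
// Sketch of the /api/ghost-debate core loop (generateResponse is a
// stand-in for the per-ghost Groq call described below).
async function runGhostDebate(ghosts, puzzleContext, playerMessage, generateResponse) {
  const debate = []
  for (const ghost of ghosts) {
    // Each ghost sees only its own personality plus the shared scene
    // context, never the other ghosts' replies, so disagreement is genuine.
    const message = await generateResponse(ghost, { puzzleContext, playerMessage })
    debate.push({ ghost: ghost.name, message })
  }
  return debate
}
```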

💀 Challenges We Faced

1. Groq API Integration & Rate Limiting

Groq's Llama 3.3 70B is incredibly fast, but coordinating 5 simultaneous agent calls was tricky:

  • Challenge: Rate limits when all ghosts speak at once, API errors breaking the narrative flow
  • Solution:
    • Implemented sequential processing instead of parallel calls
    • Added exponential backoff retry logic (3 attempts with 1s, 2s, 4s delays)
    • Created fallback personality-specific responses for each ghost
    • Used streaming responses to show "thinking" state
  • Learning: Fast inference doesn't mean unlimited throughput - proper error handling is critical for production

```javascript
// Sequential ghost debate with retry logic and exponential backoff (1s, 2s, 4s)
async function callGroqWithRetry(prompt, fallback, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await groq.chat.completions.create({
        model: "llama-3.3-70b-versatile",
        messages: [{ role: "system", content: prompt }]
      })
    } catch (error) {
      // After the final attempt, use the ghost's canned in-character response
      if (i === retries - 1) return fallback
      await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000))
    }
  }
}

// Ghosts speak one at a time to stay under the rate limit
for (const ghost of ghosts) {
  const response = await callGroqWithRetry(ghost.personality, ghost.fallbackResponse)
  debate.push({ ghost: ghost.name, message: response })
}
```

2. MCP Server Development & Integration

Building custom Model Context Protocol servers was uncharted territory:

  • Challenge:
    • No existing examples for blockchain vow verification or game state management
    • Understanding JSON-RPC 2.0 message format
    • Debugging MCP communication between Kiro and custom servers
    • Handling async operations properly
  • Solution:
    • Studied MCP specification thoroughly
    • Created 4 custom MCP servers from scratch:
      • blockchain-vows-server.js - Eternal promise verification (4 vows stored)
      • memory-server.js - Cross-scene state persistence
      • sentiment-server.js - Player choice analysis
      • image-gen-server.js - Gemini integration for visuals
    • Built test harness in Kiro IDE to validate server responses
    • Added extensive logging for debugging
  • Learning: MCP is powerful but requires deep understanding of the protocol spec and async Node.js patterns. The ability to extend the IDE with custom tools during development was game-changing.

```javascript
// Custom MCP server for vow verification
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: "check_vow",
    description: "Verify if a character kept their eternal vow",
    inputSchema: {
      type: "object",
      properties: {
        person: { type: "string", description: "Who made the vow" },
        vow: { type: "string", description: "What was promised" }
      },
      required: ["person", "vow"]
    }
  }]
}))

// Handle the actual vow checking
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { person, vow } = request.params.arguments
  const vowRecord = vowLedger.find(v => v.person === person && v.vow === vow)
  return {
    content: [{
      type: "text",
      text: vowRecord ? `${person}'s vow to "${vow}" was ${vowRecord.kept ? 'kept' : 'broken'}. ${vowRecord.reason}` : "No record found"
    }]
  }
})
```
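Under the hood, every tool call arrives as a JSON-RPC 2.0 message, the format we had to learn the hard way. An SDK-free, illustrative sketch of how a `tools/call` request maps onto the `check_vow` logic above (the ledger entry here is invented for the example):

```javascript
// Illustrative only: what the SDK does for us, written out by hand.
// Ledger entries are assumed to look like { person, vow, kept, reason }.
const vowLedger = [
  { person: 'Theo', vow: 'return to Selene', kept: true, reason: 'He came back.' }
]

function handleToolsCall(request) {
  const { person, vow } = request.params.arguments
  const record = vowLedger.find(v => v.person === person && v.vow === vow)
  // JSON-RPC 2.0 response: echo the request id, wrap the payload in result
  return {
    jsonrpc: '2.0',
    id: request.id,
    result: {
      content: [{
        type: 'text',
        text: record
          ? `${person}'s vow to "${vow}" was ${record.kept ? 'kept' : 'broken'}. ${record.reason}`
          : 'No record found'
      }]
    }
  }
}
```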

3. Agent-Driven Development with Kiro IDE

This entire project was built using Kiro's AI agent - a meta experience of AI building AI:

  • Challenge:
    • Teaching the agent about game design, narrative structure, and multi-agent systems
    • Maintaining context across multiple development sessions
    • Getting the right balance of creativity vs. technical accuracy
    • Communicating complex requirements clearly
  • Solution:
    • Created detailed steering rules in .kiro/steering/ghost-agent-rules.md and scene-structure.md
    • Used spec-driven workflow: requirements → design → tasks → implementation
    • Leveraged MCP servers during development for real-time testing
    • Iterated on prompts and provided clear acceptance criteria
    • Used agent hooks to automate repetitive tasks
  • Learning: Agent-driven development works best with:
    • Clear specifications upfront (we used EARS format for requirements)
    • Iterative refinement through conversation
    • Custom steering rules for domain-specific knowledge
    • Breaking complex features into small, testable tasks
    • Treating the agent as a pair programmer, not a magic solution
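To give a flavor of the steering rules, here is the kind of content `.kiro/steering/ghost-agent-rules.md` carried (this excerpt is illustrative, not verbatim from the repo):

```markdown
# Ghost Agent Rules (illustrative excerpt)

- Every ghost reply stays in character; never break the fourth wall.
- Mira speaks in 1-2 short sentences with simple vocabulary only.
- Selene is formal and terse; she is never warm.
- WHEN the player makes a choice, THE debate system SHALL collect one
  response per ghost before synthesizing consensus. (EARS-style requirement)
```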

4. The Audio Desync Problem

In production, audio would play before images loaded, breaking immersion:

  • Challenge: Localhost worked fine, but Vercel deployment had 1-2 second delays between scene transitions
  • Solution:
    • Implemented image preloading for all 28 scene images on app start
    • Added loading screen with progress tracking
    • Synchronized audio playback with image load events
    • Used unoptimized prop to bypass Next.js image optimization

```javascript
const [imageLoaded, setImageLoaded] = useState(false)

useEffect(() => {
  if (!imageLoaded) return
  // Only play the scene's audio after its image has loaded
  audio.play()
}, [imageLoaded])
```
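The preloading half of the fix can be sketched like this (a hypothetical `preloadImages` helper; in the real app the 28 scene URLs are loaded at startup and progress drives the loading screen):

```javascript
// Load every scene image up front, reporting progress as a 0..1 fraction
// so a loading screen can show it.
function preloadImages(urls, onProgress) {
  let loaded = 0
  return Promise.all(urls.map(src => new Promise(resolve => {
    const img = new Image()
    // Count errors too, so one broken asset can't stall the loading screen
    img.onload = img.onerror = () => {
      loaded += 1
      onProgress(loaded / urls.length)
      resolve(src)
    }
    img.src = src
  })))
}
```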

5. Ghost Personality Drift

Early versions had ghosts that sounded too similar despite different prompts:

  • Challenge: Generic LLM responses lacking character voice - all ghosts converged to "helpful AI" tone
  • Solution:
    • Explicit personality constraints in system prompts with speech pattern examples
    • Different TTS voices (Azure's Jenny, Guy, Aria, Davis, Sara)
    • Character background context injected into every API call
    • Temperature tuning per character (Mira: 0.9 for playfulness, Harlan: 0.6 for precision)
    • Added "never" constraints: "Mira never uses complex vocabulary. Selene never speaks warmly."

```javascript
const characterPrompts = {
  mira: `You are Mira, a 7-year-old ghost. You don't understand death. 
         Speak in 1-2 short sentences. Use simple words. Be playful and innocent.
         Example: "I like butterflies! Can we play?"`,
  selene: `You are Selene, cold and betrayed. You speak formally and tersely.
           You're softening but still guarded. Be direct, never warm.
           Example: "Theo returned. But trust... that takes time."`
}
```
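The per-character temperature tuning plugs into the same request path. A sketch of the wiring (Mira's 0.9 and Harlan's 0.6 are the values we tuned; the remaining entries here are illustrative):

```javascript
// Each ghost's sampling temperature travels with its prompt.
// Only mira (0.9) and harlan (0.6) are confirmed tuned values.
const characterTemperature = { mira: 0.9, harlan: 0.6, elara: 0.7, theo: 0.8, selene: 0.6 }

function buildGhostRequest(ghost, systemPrompt, playerMessage) {
  return {
    model: 'llama-3.3-70b-versatile',
    temperature: characterTemperature[ghost] ?? 0.7,  // sensible default for unknowns
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: playerMessage }
    ]
  }
}
```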

6. Multi-Provider TTS Orchestration

Coordinating 4 different TTS providers with different APIs, rate limits, and voice quality:

  • Challenge:
    • Azure has best quality but strict rate limits
    • Google is reliable but less expressive
    • Play.ht offers unique voices but slower
    • Gemini is fast but lower quality
    • Each provider has different API formats and error handling
  • Solution:
    • Created unified speechService interface abstracting provider differences
    • Implemented provider cascade: Azure → Google → Play.ht → Gemini → Browser TTS
    • Added voice caching to reduce API calls
    • Used browser TTS as final fallback for offline play
    • Mapped each ghost to specific voices across providers
```javascript
async function speak(text, character) {
  // Cascade order: best quality first, most reliable fallback last
  const providers = [
    { name: 'Azure', synthesize: azureTTS },
    { name: 'Google', synthesize: googleTTS },
    { name: 'PlayHT', synthesize: playhtTTS },
    { name: 'Gemini', synthesize: geminiTTS }
  ]

  for (const provider of providers) {
    try {
      // voiceMap holds each ghost's voice id per provider (e.g. en-US-JennyNeural on Azure)
      return await provider.synthesize(text, voiceMap[character][provider.name])
    } catch (error) {
      console.warn(`${provider.name} failed, trying next...`)
    }
  }

  // Final fallback: browser TTS
  return browserTTS.speak(text)
}
```
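The voice cache sits in front of that cascade: identical (character, text) pairs reuse the synthesized audio instead of hitting the providers again. A hypothetical helper (`synthesize` is injected so the cache stays provider-agnostic):

```javascript
// Cache synthesized audio keyed on character + line of dialogue.
const audioCache = new Map()

async function speakCached(text, character, synthesize) {
  const key = `${character}:${text}`
  if (!audioCache.has(key)) {
    // Store the promise itself so concurrent requests for the same
    // line share one in-flight synthesis call.
    audioCache.set(key, synthesize(text, character))
  }
  return audioCache.get(key)
}
```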

7. Debate Consensus Without Losing Drama

We wanted ghosts to disagree authentically but still reach meaningful conclusions:

  • Challenge:
    • Early attempts had ghosts agreeing too quickly (boring)
    • Or arguing endlessly without resolution (frustrating)
    • Forced consensus felt artificial and broke immersion
  • Solution:
    • Each ghost generates response independently (no shared context between them)
    • Consensus is generated separately as a final step, acknowledging conflicts
    • Player sees the full debate unfold, not just the conclusion
    • Added "reflection" phase where ghosts can change their minds
    • Consensus prompt explicitly asks to honor disagreements: "Acknowledge where the family disagrees, but find common ground"
  • Learning: Authentic conflict requires isolation during generation, but meaningful resolution requires synthesis
```javascript
// Generate independent responses, one at a time (see the rate-limit fix above);
// no ghost sees another's reply during generation
const debate = []
for (const ghost of ghosts) {
  debate.push(await generateResponse(ghost, context))
}

// Then synthesize consensus that honors disagreements
const consensus = await generateConsensus({
  debate,
  context,
  instruction: "Acknowledge disagreements but find common ground"
})
// Result: "While Harlan argues for logic and Mira wants play,
// the family agrees that love transcends both..."
```

🧠 What We Learned

1. Multi-Agent Orchestration is Hard

Getting 5 AI personalities to feel distinct while maintaining narrative coherence required careful prompt engineering. Each agent needed:

  • A unique voice (both literally via TTS and figuratively via personality)
  • Consistent memory of past interactions
  • The ability to disagree without derailing the story

2. TTS Provider Fallbacks are Essential

We implemented a cascade system across 4 TTS providers because:

  • Azure has the best quality but rate limits
  • Google is reliable but less expressive
  • Play.ht offers unique voices
  • Gemini provides a solid fallback

3. MCP is a Game-Changer

The Model Context Protocol let us extend Kiro IDE with custom tools during development:

  • Testing blockchain vow verification without deploying
  • Debugging agent memory persistence
  • Analyzing sentiment in real-time

4. Image Preloading Matters

Production deployment revealed a 1-second delay between scenes. We solved it by:

  • Preloading all 28 scene images on app start
  • Adding a loading screen with progress tracking
  • Using unoptimized images for instant rendering

🎃 Why "Frankenstein" Category?

This project stitches together incompatible technologies into something unexpectedly powerful:

  • 5 LLM agents arguing in real-time
  • 4 TTS providers creating a voice orchestra
  • Blockchain concepts (vow verification) in a narrative game
  • MCP protocol extending the development environment
  • AI-generated art & music creating atmosphere
  • Next.js + React for interactive storytelling

Like Frankenstein's monster, it's made of disparate parts - but together, they create something alive.

🌟 What's Next?

  • Branching narratives based on debate outcomes
  • Player memory system that remembers choices across sessions
  • More ghost interactions - what if they could possess objects?
  • Multiplayer debates - multiple players influencing the ghosts

Built with Kiro IDE, powered by Groq, voiced by Azure/Google/Play.ht/Gemini, scored by Suno AI, and visualized by Gemini.
