Midnight at the Voss Manor - Hackathon Submission

🎭 The Inspiration

What if ghosts could argue with each other about your choices? What if AI agents had distinct personalities and debated morality in real-time?

Midnight at the Voss Manor was born from a simple question: Can we create a narrative experience where multiple AI agents feel like real characters with conflicting perspectives?

We wanted to push beyond single-agent chatbots and create a Frankenstein's monster of technologies - stitching together LLMs, text-to-speech, blockchain verification, and procedural storytelling into something hauntingly alive.

🏗️ How We Built It

This project is a chimera of seemingly incompatible technologies:

The Tech Stack

  • 5 AI Ghost Agents powered by Groq's Llama 3.3 70B - each with distinct personalities (maternal Elara, amnesiac scientist Harlan, innocent child Mira, regretful Theo, cold Selene)
  • Multi-Provider TTS Orchestra - Azure, Google, Play.ht, and Gemini for unique voice acting
  • Model Context Protocol (MCP) servers for:
    • Blockchain vow verification (checking eternal promises)
    • Memory persistence across scenes
    • Sentiment analysis of player choices
    • AI-generated imagery with Gemini
  • Next.js 14 with React for the interactive frontend
  • Suno AI for atmospheric background music
  • Gemini Nano Banana Pro for gothic-cyberpunk scene generation

The Architecture

```javascript
// Each ghost has a unique personality system
const ghostPersonalities = {
  elara: "Maternal, gentle, focuses on family harmony",
  harlan: "Scientific, amnesiac, logical but emotionally confused",
  mira: "Childlike, innocent, wants play and attention",
  theo: "Dramatic, regretful, seeks redemption",
  selene: "Cold but softening, demands truth"
}
```

The ghost debate system is the heart of the experience. When you make a choice, all 5 agents independently generate responses, then reach consensus:

```javascript
// Real-time multi-agent debate
const debate = await fetch('/api/ghost-debate', {
  method: 'POST',
  body: JSON.stringify({
    puzzleContext: currentScene,
    playerMessage: playerChoice
  })
})
```
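Server-side, the route runs the debate loop itself. Here is a minimal sketch of that orchestration, assuming a hypothetical `generateResponse` helper that wraps the Groq call (the real handler lives in the API route):

```javascript
// Sketch of the /api/ghost-debate core loop (generateResponse is a
// stand-in for the per-ghost Groq call described below).
async function runGhostDebate(ghosts, puzzleContext, playerMessage, generateResponse) {
  const debate = []
  for (const ghost of ghosts) {
    // Each ghost sees only its own personality plus the shared scene
    // context, never the other ghosts' replies, so disagreement is genuine.
    const message = await generateResponse(ghost, { puzzleContext, playerMessage })
    debate.push({ ghost: ghost.name, message })
  }
  return debate
}
```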

💀 Challenges We Faced

1. Groq API Integration & Rate Limiting

Groq's Llama 3.3 70B is incredibly fast, but coordinating 5 simultaneous agent calls was tricky:

  • Challenge: Rate limits when all ghosts speak at once, API errors breaking the narrative flow
  • Solution:
    • Implemented sequential processing instead of parallel calls
    • Added exponential backoff retry logic (3 attempts with 1s, 2s, 4s delays)
    • Created fallback personality-specific responses for each ghost
    • Used streaming responses to show "thinking" state
  • Learning: Fast inference doesn't mean unlimited throughput - proper error handling is critical for production

```javascript
// Sequential ghost debate with retry logic and exponential backoff (1s, 2s, 4s)
async function callGroqWithRetry(prompt, fallback, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await groq.chat.completions.create({
        model: "llama-3.3-70b-versatile",
        messages: [{ role: "system", content: prompt }]
      })
    } catch (error) {
      // After the final attempt, use the ghost's canned in-character response
      if (i === retries - 1) return fallback
      await new Promise(r => setTimeout(r, Math.pow(2, i) * 1000))
    }
  }
}

// Ghosts speak one at a time to stay under the rate limit
for (const ghost of ghosts) {
  const response = await callGroqWithRetry(ghost.personality, ghost.fallbackResponse)
  debate.push({ ghost: ghost.name, message: response })
}
```

2. MCP Server Development & Integration

Building custom Model Context Protocol servers was uncharted territory:

  • Challenge:
    • No existing examples for blockchain vow verification or game state management
    • Understanding JSON-RPC 2.0 message format
    • Debugging MCP communication between Kiro and custom servers
    • Handling async operations properly
  • Solution:
    • Studied MCP specification thoroughly
    • Created 4 custom MCP servers from scratch:
      • blockchain-vows-server.js - Eternal promise verification (4 vows stored)
      • memory-server.js - Cross-scene state persistence
      • sentiment-server.js - Player choice analysis
      • image-gen-server.js - Gemini integration for visuals
    • Built test harness in Kiro IDE to validate server responses
    • Added extensive logging for debugging
  • Learning: MCP is powerful but requires deep understanding of the protocol spec and async Node.js patterns. The ability to extend the IDE with custom tools during development was game-changing.

```javascript
// Custom MCP server for vow verification
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: "check_vow",
    description: "Verify if a character kept their eternal vow",
    inputSchema: {
      type: "object",
      properties: {
        person: { type: "string", description: "Who made the vow" },
        vow: { type: "string", description: "What was promised" }
      },
      required: ["person", "vow"]
    }
  }]
}))

// Handle the actual vow checking
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { person, vow } = request.params.arguments
  const vowRecord = vowLedger.find(v => v.person === person && v.vow === vow)
  return {
    content: [{
      type: "text",
      text: vowRecord ? `${person}'s vow to "${vow}" was ${vowRecord.kept ? 'kept' : 'broken'}. ${vowRecord.reason}` : "No record found"
    }]
  }
})
```
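Under the hood, every tool call arrives as a JSON-RPC 2.0 message, the format we had to learn the hard way. An SDK-free, illustrative sketch of how a `tools/call` request maps onto the `check_vow` logic above (the ledger entry here is invented for the example):

```javascript
// Illustrative only: what the SDK does for us, written out by hand.
// Ledger entries are assumed to look like { person, vow, kept, reason }.
const vowLedger = [
  { person: 'Theo', vow: 'return to Selene', kept: true, reason: 'He came back.' }
]

function handleToolsCall(request) {
  const { person, vow } = request.params.arguments
  const record = vowLedger.find(v => v.person === person && v.vow === vow)
  // JSON-RPC 2.0 response: echo the request id, wrap the payload in result
  return {
    jsonrpc: '2.0',
    id: request.id,
    result: {
      content: [{
        type: 'text',
        text: record
          ? `${person}'s vow to "${vow}" was ${record.kept ? 'kept' : 'broken'}. ${record.reason}`
          : 'No record found'
      }]
    }
  }
}
```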

3. Agent-Driven Development with Kiro IDE

This entire project was built using Kiro's AI agent - a meta experience of AI building AI:

  • Challenge:
    • Teaching the agent about game design, narrative structure, and multi-agent systems
    • Maintaining context across multiple development sessions
    • Getting the right balance of creativity vs. technical accuracy
    • Communicating complex requirements clearly
  • Solution:
    • Created detailed steering rules in .kiro/steering/ghost-agent-rules.md and scene-structure.md
    • Used spec-driven workflow: requirements → design → tasks → implementation
    • Leveraged MCP servers during development for real-time testing
    • Iterated on prompts and provided clear acceptance criteria
    • Used agent hooks to automate repetitive tasks
  • Learning: Agent-driven development works best with:
    • Clear specifications upfront (we used EARS format for requirements)
    • Iterative refinement through conversation
    • Custom steering rules for domain-specific knowledge
    • Breaking complex features into small, testable tasks
    • Treating the agent as a pair programmer, not a magic solution
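To give a flavor of the steering rules, here is the kind of content `.kiro/steering/ghost-agent-rules.md` carried (this excerpt is illustrative, not verbatim from the repo):

```markdown
# Ghost Agent Rules (illustrative excerpt)

- Every ghost reply stays in character; never break the fourth wall.
- Mira speaks in 1-2 short sentences with simple vocabulary only.
- Selene is formal and terse; she is never warm.
- WHEN the player makes a choice, THE debate system SHALL collect one
  response per ghost before synthesizing consensus. (EARS-style requirement)
```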

4. The Audio Desync Problem

In production, audio would play before images loaded, breaking immersion:

  • Challenge: Localhost worked fine, but Vercel deployment had 1-2 second delays between scene transitions
  • Solution:
    • Implemented image preloading for all 28 scene images on app start
    • Added loading screen with progress tracking
    • Synchronized audio playback with image load events
    • Used unoptimized prop to bypass Next.js image optimization

```javascript
const [imageLoaded, setImageLoaded] = useState(false)

useEffect(() => {
  if (!imageLoaded) return
  // Only play the scene's audio after its image has loaded
  audio.play()
}, [imageLoaded])
```
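The preloading half of the fix can be sketched like this (a hypothetical `preloadImages` helper; in the real app the 28 scene URLs are loaded at startup and progress drives the loading screen):

```javascript
// Load every scene image up front, reporting progress as a 0..1 fraction
// so a loading screen can show it.
function preloadImages(urls, onProgress) {
  let loaded = 0
  return Promise.all(urls.map(src => new Promise(resolve => {
    const img = new Image()
    // Count errors too, so one broken asset can't stall the loading screen
    img.onload = img.onerror = () => {
      loaded += 1
      onProgress(loaded / urls.length)
      resolve(src)
    }
    img.src = src
  })))
}
```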

5. Ghost Personality Drift

Early versions had ghosts that sounded too similar despite different prompts:

  • Challenge: Generic LLM responses lacking character voice - all ghosts converged to "helpful AI" tone
  • Solution:
    • Explicit personality constraints in system prompts with speech pattern examples
    • Different TTS voices (Azure's Jenny, Guy, Aria, Davis, Sara)
    • Character background context injected into every API call
    • Temperature tuning per character (Mira: 0.9 for playfulness, Harlan: 0.6 for precision)
    • Added "never" constraints: "Mira never uses complex vocabulary. Selene never speaks warmly."

```javascript
const characterPrompts = {
  mira: `You are Mira, a 7-year-old ghost. You don't understand death. 
         Speak in 1-2 short sentences. Use simple words. Be playful and innocent.
         Example: "I like butterflies! Can we play?"`,
  selene: `You are Selene, cold and betrayed. You speak formally and tersely.
           You're softening but still guarded. Be direct, never warm.
           Example: "Theo returned. But trust... that takes time."`
}
```
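The per-character temperature tuning plugs into the same request path. A sketch of the wiring (Mira's 0.9 and Harlan's 0.6 are the values we tuned; the remaining entries here are illustrative):

```javascript
// Each ghost's sampling temperature travels with its prompt.
// Only mira (0.9) and harlan (0.6) are confirmed tuned values.
const characterTemperature = { mira: 0.9, harlan: 0.6, elara: 0.7, theo: 0.8, selene: 0.6 }

function buildGhostRequest(ghost, systemPrompt, playerMessage) {
  return {
    model: 'llama-3.3-70b-versatile',
    temperature: characterTemperature[ghost] ?? 0.7,  // sensible default for unknowns
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: playerMessage }
    ]
  }
}
```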

6. Multi-Provider TTS Orchestration

Coordinating 4 different TTS providers with different APIs, rate limits, and voice quality:

  • Challenge:
    • Azure has best quality but strict rate limits
    • Google is reliable but less expressive
    • Play.ht offers unique voices but slower
    • Gemini is fast but lower quality
    • Each provider has different API formats and error handling
  • Solution:
    • Created unified speechService interface abstracting provider differences
    • Implemented provider cascade: Azure → Google → Play.ht → Gemini → Browser TTS
    • Added voice caching to reduce API calls
    • Used browser TTS as final fallback for offline play
    • Mapped each ghost to specific voices across providers
```javascript
async function speak(text, character) {
  // Cascade order: best quality first, most reliable fallback last
  const providers = [
    { name: 'Azure', synthesize: azureTTS },
    { name: 'Google', synthesize: googleTTS },
    { name: 'PlayHT', synthesize: playhtTTS },
    { name: 'Gemini', synthesize: geminiTTS }
  ]

  for (const provider of providers) {
    try {
      // voiceMap holds each ghost's voice id per provider (e.g. en-US-JennyNeural on Azure)
      return await provider.synthesize(text, voiceMap[character][provider.name])
    } catch (error) {
      console.warn(`${provider.name} failed, trying next...`)
    }
  }

  // Final fallback: browser TTS
  return browserTTS.speak(text)
}
```
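The voice cache sits in front of that cascade: identical (character, text) pairs reuse the synthesized audio instead of hitting the providers again. A hypothetical helper (`synthesize` is injected so the cache stays provider-agnostic):

```javascript
// Cache synthesized audio keyed on character + line of dialogue.
const audioCache = new Map()

async function speakCached(text, character, synthesize) {
  const key = `${character}:${text}`
  if (!audioCache.has(key)) {
    // Store the promise itself so concurrent requests for the same
    // line share one in-flight synthesis call.
    audioCache.set(key, synthesize(text, character))
  }
  return audioCache.get(key)
}
```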

7. Debate Consensus Without Losing Drama

We wanted ghosts to disagree authentically but still reach meaningful conclusions:

  • Challenge:
    • Early attempts had ghosts agreeing too quickly (boring)
    • Or arguing endlessly without resolution (frustrating)
    • Forced consensus felt artificial and broke immersion
  • Solution:
    • Each ghost generates response independently (no shared context between them)
    • Consensus is generated separately as a final step, acknowledging conflicts
    • Player sees the full debate unfold, not just the conclusion
    • Added "reflection" phase where ghosts can change their minds
    • Consensus prompt explicitly asks to honor disagreements: "Acknowledge where the family disagrees, but find common ground"
  • Learning: Authentic conflict requires isolation during generation, but meaningful resolution requires synthesis
```javascript
// Generate independent responses, one at a time (see the rate-limit fix above);
// no ghost sees another's reply during generation
const debate = []
for (const ghost of ghosts) {
  debate.push(await generateResponse(ghost, context))
}

// Then synthesize consensus that honors disagreements
const consensus = await generateConsensus({
  debate,
  context,
  instruction: "Acknowledge disagreements but find common ground"
})
// Result: "While Harlan argues for logic and Mira wants play,
// the family agrees that love transcends both..."
```

🧠 What We Learned

1. Multi-Agent Orchestration is Hard

Getting 5 AI personalities to feel distinct while maintaining narrative coherence required careful prompt engineering. Each agent needed:

  • A unique voice (both literally via TTS and figuratively via personality)
  • Consistent memory of past interactions
  • The ability to disagree without derailing the story

2. TTS Provider Fallbacks are Essential

We implemented a cascade system across 4 TTS providers because:

  • Azure has the best quality but rate limits
  • Google is reliable but less expressive
  • Play.ht offers unique voices
  • Gemini provides a solid fallback

3. MCP is a Game-Changer

The Model Context Protocol let us extend Kiro IDE with custom tools during development:

  • Testing blockchain vow verification without deploying
  • Debugging agent memory persistence
  • Analyzing sentiment in real-time

4. Image Preloading Matters

Production deployment revealed a 1-second delay between scenes. We solved it by:

  • Preloading all 28 scene images on app start
  • Adding a loading screen with progress tracking
  • Using unoptimized images for instant rendering

🎃 Why "Frankenstein" Category?

This project stitches together incompatible technologies into something unexpectedly powerful:

  • 5 LLM agents arguing in real-time
  • 4 TTS providers creating a voice orchestra
  • Blockchain concepts (vow verification) in a narrative game
  • MCP protocol extending the development environment
  • AI-generated art & music creating atmosphere
  • Next.js + React for interactive storytelling

Like Frankenstein's monster, it's made of disparate parts - but together, they create something alive.

🌟 What's Next?

  • Branching narratives based on debate outcomes
  • Player memory system that remembers choices across sessions
  • More ghost interactions - what if they could possess objects?
  • Multiplayer debates - multiple players influencing the ghosts

Built with Kiro IDE, powered by Groq, voiced by Azure/Google/Play.ht/Gemini, scored by Suno AI, and visualized by Gemini.
