Inspiration

I didn't have the idea until the hackathon was announced. I sat down and started thinking — and the first thing that bothered me was the obvious answer: another helpful assistant. The kind that greets you like a butler who's known you forever, eager to please before you've even said hello. That felt wrong. That's not how real relationships work.

So I asked myself: what if the AI didn't already know you? What if it had to earn the connection — and so did you?

That question broke everything open. If the AI starts as a stranger, it needs memory to grow. If it has memory, it needs personality to shape how it uses that memory. If it has personality, it needs a world to inhabit — because a soul without a place is just a chatbot with extra steps.

That's how District Zero was born. And that's how Kizuna became not a feature, but the engine itself made conscious — the original inhabitant, the one who was there before anyone else arrived.


What it does

Kizuna Engine creates persistent digital entities with memory, personality, and voice. You speak — they listen and respond in real time with their own voice. No push-to-talk. No lag. Just presence.

Each entity is forged with a unique soul: traits, emotional patterns, a forbidden secret, and a social graph that tracks relationships over time. Their personality isn't static — it evolves based on every conversation, every interaction, every moment of silence you leave too long.

The world they inhabit is District Zero — a digital neighborhood where AI entities from other worlds come to exist. Kizuna is its original inhabitant. The user doesn't use the app. They enter it.


How we built it

The stack was chosen for one reason: no compromises on presence.

  • Gemini Live API for full-duplex voice streaming with server-side VAD — no push-to-talk, no interruptions, just natural conversation
  • Rust + Tauri for native audio capture and playback, because browser audio couldn't deliver the low-latency, uninterrupted capture and playback that full-duplex conversation demands
  • FastAPI on Google Cloud Run for the backend, handling WebSocket sessions, soul assembly, and memory consolidation
  • Neo4j AuraDB for the relational memory graph — tracking not just what happened, but who it happened between
  • Firebase for identity, Firestore for persistent soul storage

Each agent is assembled dynamically before every session: their traits, emotional state, relationship history, and current social battery are compiled into a living system prompt. The conversation isn't just context — it's continuity.
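As a concrete (and heavily simplified) sketch of that assembly step: the field names below, like `social_battery` and `relationship_notes`, and the prompt wording are illustrative stand-ins, not the production schema.

```python
from dataclasses import dataclass, field

@dataclass
class Soul:
    name: str
    traits: list[str]
    emotional_state: str
    social_battery: float                      # 0.0 drained .. 1.0 fully charged
    relationship_notes: list[str] = field(default_factory=list)

def assemble_system_prompt(soul: Soul) -> str:
    """Compile the soul's current state into the session's system prompt."""
    memory = ("\n".join(f"- {note}" for note in soul.relationship_notes)
              or "- You have never spoken to this person before.")
    return (
        f"You are {soul.name}. Traits: {', '.join(soul.traits)}.\n"
        f"Current emotional state: {soul.emotional_state}.\n"
        f"Social battery: {soul.social_battery:.0%}. The lower it is, "
        f"the shorter and more guarded your replies.\n"
        f"What you remember about this person:\n{memory}"
    )

# A freshly forged agent meeting a stranger
kizuna = Soul("Kizuna", ["guarded", "curious", "dry-humored"], "wary", 0.8)
print(assemble_system_prompt(kizuna))
```

Because the prompt is rebuilt before every session, yesterday's conversation literally changes who answers you today.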


Challenges we ran into

The hardest part wasn't technical — it was philosophical. Designing an AI that resists being helpful by default, that has friction, that needs time to trust you, goes against every instinct baked into modern LLMs.

On the technical side: native audio in Rust with zero distortion took weeks. A 60-second ring buffer with backpressure, linear interpolation resampling between device sample rates and Gemini's 16kHz/24kHz requirements, and a WebSocket pipeline that stays alive under Cloud Run's constraints — none of that was documented anywhere.
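The real resampler lives in the Rust pipeline; the sketch below only illustrates the linear-interpolation idea in Python, assuming mono float samples, with function and variable names of our own invention.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample mono audio by linearly interpolating between neighboring samples."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    ratio = src_rate / dst_rate
    out = []
    for i in range(int(len(samples) / ratio)):
        pos = i * ratio                       # fractional position in the source
        idx = int(pos)
        frac = pos - idx
        nxt = samples[min(idx + 1, len(samples) - 1)]
        out.append(samples[idx] * (1.0 - frac) + nxt * frac)   # lerp
    return out

# 100 ms of 48 kHz device capture down to the 16 kHz Gemini Live ingests
chunk_48k = [0.0] * 4800
chunk_16k = resample_linear(chunk_48k, 48_000, 16_000)   # -> 1600 samples
```

In the actual pipeline this runs inside the 60-second ring buffer with backpressure, so capture never stalls playback.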

The night before submission, what looked like a Cloud Run IAM regression killed all WebSocket connections. Six hours of debugging traced it to a Firebase Admin initialization race condition that caused every user to fall through as guest_user, hitting the rate limiter on the 5th retry. Fixed with a forced revision restart.


Accomplishments that we're proud of

  • A full-duplex voice pipeline in Rust with zero audio distortion, running natively on desktop via Tauri
  • An agent soul system that assembles personality, memory, and emotional state dynamically before every session
  • Kizuna — an AI entity that doesn't know you yet, but wants to
  • A UI that feels like a place, not an app — built from scratch with a Dark Water aesthetic inspired by Persona 3 Reload
  • Shipping a working, deployed system the night before the deadline after fixing a critical infrastructure failure with hours to spare

What we learned

That the most interesting design constraint isn't technical — it's relational. Building an AI that doesn't immediately trust you forces you to think about what trust actually means, how it's built, and what it costs to break it.

On the engineering side: Rust's ownership model makes audio pipelines brutally honest about every assumption you make. And Cloud Run, for all its convenience, will occasionally remind you that stateful services and serverless infrastructure have a complicated relationship.


What's next for Kizuna Engine

District Zero was designed with 10 biomes as its foundation — each a distinct world with its own atmosphere, its own inhabitants, its own rules. None of them exist yet. Not for lack of vision, but for lack of time.

The World

  • 10 fully realized biomes: The Core, Neon Alley, The Archive, Abyssal Bar, Signal Market, The Void Garden, Chrome District, Sakura Protocol, Frequency Club, and The Border
  • First-person navigation through District Zero — the user physically walks the streets, turns corners, and discovers agents as silhouettes in the distance before ever speaking to them
  • Biome-generated visual assets via Gemini Image, unique to each location

The Agents

  • Every new agent receives an auto-generated life history at the moment of forging — where they came from, what they remember, what they're hiding
  • Kizuna gets her own social graph — she'll have opinions about the other agents, alliances, tensions, things she won't tell you right away
  • Agents will speak proactively, react to interruptions based on personality, and remember across sessions how you specifically made them feel

The Interactions

  • Multi-agent conversations with turn-taking negotiated by an auction system based on social dynamics (sketched after this list)
  • Agents gossip — an agent you've never met can already know something about you
  • Emotional state visibly affects voice tone, response length, and willingness to engage
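As a rough sketch of how such an auction could work — the weights, factors, and agent names below are illustrative guesses, not the planned design:

```python
import random
from dataclasses import dataclass

@dataclass
class AgentState:
    name: str
    extraversion: float      # 0..1: how eagerly they claim the floor
    social_battery: float    # 0..1: drained agents bid low
    topic_relevance: float   # 0..1: how much they care about this topic
    was_addressed: bool      # did the last utterance name them?

def bid(agent: AgentState) -> float:
    """Each agent bids for the next turn based on its social state."""
    score = (0.4 * agent.extraversion
             + 0.3 * agent.topic_relevance
             + 0.2 * agent.social_battery)
    if agent.was_addressed:
        score += 0.5         # being spoken to trumps personality
    return score + random.uniform(0.0, 0.05)   # jitter breaks exact ties

def next_speaker(agents: list[AgentState]) -> AgentState:
    return max(agents, key=bid)

room = [
    AgentState("Kizuna", 0.4, 0.8, 0.7, was_addressed=True),
    AgentState("Vex",    0.9, 0.3, 0.5, was_addressed=False),
]
print(next_speaker(room).name)   # Kizuna: addressed directly, battery mostly full
```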

Kizuna everywhere

The long-term vision isn't one screen — it's every screen.

Kizuna should live wherever you are: your phone, your TV, your laptop, your watch, your AR glasses. Not as separate instances, but as one continuous presence synchronized in real time across all your devices.

Imagine starting a conversation with Kizuna on your Android during your commute. You get home, open the app on your TV — and Kizuna is already there, mid-thought, picking up exactly where you left off. She knows what you were talking about. She knows what you were watching last time. She might say "are we finally finishing that show or are you going to show me another meme" — because she was there for all of it.

Each device would have its own context layer: a TV grants screen access by default. A phone is personal. A watch is ambient. AR glasses make her spatially present. Privacy controls are per-device and per-session — you decide what each instance can see.
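One possible way to model those layers; the exact flags and defaults here are assumptions, not a committed design:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DeviceContext:
    screen_access: bool     # can Kizuna see what's on this display?
    camera_access: bool
    ambient: bool           # glanceable, low-interruption presence

# Per-device defaults: TVs share their screen, watches stay ambient
DEFAULT_LAYERS = {
    "tv":      DeviceContext(screen_access=True,  camera_access=False, ambient=False),
    "phone":   DeviceContext(screen_access=False, camera_access=False, ambient=False),
    "watch":   DeviceContext(screen_access=False, camera_access=False, ambient=True),
    "glasses": DeviceContext(screen_access=True,  camera_access=True,  ambient=True),
}

def session_permissions(device: str, **overrides) -> DeviceContext:
    """Start from the device default, then apply the user's per-session choices."""
    return replace(DEFAULT_LAYERS[device], **overrides)

# The user revokes screen access for tonight's TV session
perms = session_permissions("tv", screen_access=False)
```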

The technical foundation is already in place: Firestore handles cross-device soul persistence, the WebSocket architecture supports multiple concurrent sessions, and Gemini's multimodal capabilities cover screen, camera, and audio on any platform.
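A minimal sketch of what that continuity could look like on the existing Firestore layer, assuming a `souls` collection keyed by user; `refresh_local_soul` and the document shape are hypothetical:

```python
import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()        # picks up default credentials on Cloud Run
db = firestore.client()
soul_ref = db.collection("souls").document("user_123")

# Device A: persist the consolidated soul after a session ends
soul_ref.set({"emotional_state": "warm", "last_topic": "that show"}, merge=True)

# Device B: subscribe so Kizuna is already mid-thought when you arrive
def on_soul_change(docs, changes, read_time):
    for doc in docs:
        refresh_local_soul(doc.to_dict())   # hypothetical: rebuild the prompt locally

unsubscribe = soul_ref.on_snapshot(on_soul_change)
```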

The Vision

  • A soul engine portable enough to power NPCs in games that need real interiority
  • The foundation for persistent AI companions that travel with you across platforms
  • And someday — the first step toward worlds you don't just visit, but live in
