Cosmic Identity: A Journey Through the Multiverse
The Inspiration
The spark for Cosmic Identity came from a simple "what if": What if those viral personality tests actually had the power of modern multi-modal AI?
In a world filled with generic AI avatars, I felt something was missing: Soul. A static image of yourself as a cyborg is cool, but it doesn't tell a story. I wanted to build something that felt like a complete identity synthesis—giving you a name, a history, a set of skills, and most importantly, a voice. The goal was to move beyond "filters" and create a truly immersive experience where you don't just see a picture; you meet your alternate self.
What it does
Cosmic Identity is a trans-dimensional voyage. Users start by uploading a single selfie and selecting a destination universe—ranging from the neon streets of a Cyberpunk future to the soft, hand-painted hills of a Ghibli-inspired world.
Within seconds, the app performs a three-stage metamorphosis:
- Visual Transformation: Your facial features are woven into the aesthetic of the selected universe.
- Narrative Birth: A unique persona is generated, complete with a backstory and RPG-style stats.
- Vocal Manifestation: Your character speaks to you, welcoming you to their dimension in a voice that fits the theme.
The final result is a beautiful interactive card that can be exported as a high-quality PNG or a synthesized MP4 video with audio.
How we built it
The project is powered by a sophisticated multi-modal pipeline using the Google Gemini API:
- The Visual Metamorphosis: We used `gemini-2.5-flash-image` for its incredible speed and its ability to maintain structural facial similarity while applying extreme stylistic modifiers.
- The Narrative Core: `gemini-3-flash-preview` analyzes the original image for gender and features, then generates a structured JSON payload containing the character's name, backstory, and stats. This ensures the text is contextually grounded in the user's appearance.
- The Vocal Manifestation: To bring the character to life, we used `gemini-2.5-flash-preview-tts`. We dynamically mapped different Gemini voices (like Zephyr, Kore, and Puck) to specific character personas based on the gender detected in the previous step.
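As an illustration, the persona-to-voice mapping in the last step can be as simple as a lookup keyed on the detected gender. Zephyr, Kore, and Puck are real Gemini TTS prebuilt voice names mentioned above; the `Persona` shape and `pickVoice` helper below are hypothetical, not the app's actual code:

```typescript
// Hypothetical sketch: choose a Gemini prebuilt TTS voice for a generated
// persona, based on the gender detected by the reasoning step.
type DetectedGender = "female" | "male" | "neutral";

interface Persona {
  name: string;
  gender: DetectedGender;
}

// Voice names come from the Gemini TTS voice list; the pairing is illustrative.
const VOICE_BY_GENDER: Record<DetectedGender, string> = {
  female: "Kore",
  male: "Puck",
  neutral: "Zephyr",
};

function pickVoice(persona: Persona): string {
  return VOICE_BY_GENDER[persona.gender] ?? "Zephyr";
}
```

The chosen name would then be passed as the prebuilt voice in the TTS request's speech configuration.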
The frontend is built with React and Tailwind CSS, using a "Space-Noir" glassmorphic aesthetic to create a cinematic atmosphere.
Built With
- AI Engine: Google Gemini API
  - `gemini-2.5-flash-image` (Image Generation)
  - `gemini-3-flash-preview` (Reasoning & Structured Data)
  - `gemini-2.5-flash-preview-tts` (Speech Synthesis)
- Frameworks: React 19, Vite, TypeScript
- Styling: Tailwind CSS, FontAwesome 6
- Media APIs: Web Audio API (PCM Decoding), HTML5 Canvas, MediaRecorder API
- Infrastructure: Vercel (Deployment)
Challenges we ran into
Building this wasn't without its "glitches in the matrix":
- The Audio "Silent" Wall: Mobile browsers (especially iOS Safari) block audio unless it's triggered by a direct user interaction. We had to implement a robust "wake up" logic for the `AudioContext` to ensure the AI's voice could be heard.
- Raw PCM Decoding: Gemini's TTS returns raw 16-bit PCM bytes without a header. Unlike standard audio files, this data requires manual conversion. We built a custom decoding utility to transform these bytes into a `Float32Array` for the Web Audio API at exactly 24 kHz.
- Video Synthesis: Creating the "Save Video" feature was a significant technical hurdle. We had to bridge the Canvas API (for the image) and the Web Audio API (for the voice) into a single `MediaStream`, then feed it into a `MediaRecorder` to produce a shareable MP4 entirely on the client side.
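A minimal sketch of the PCM-decoding step, assuming little-endian 16-bit samples (the function name and exact buffer handling are illustrative, not the app's actual utility):

```typescript
// Hypothetical sketch: convert headerless 16-bit little-endian PCM bytes
// (as returned by the TTS model) into Float32Array samples in [-1, 1),
// ready to be copied into a Web Audio AudioBuffer created at 24000 Hz.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < samples.length; i++) {
    // true = little-endian; scale the int16 range [-32768, 32767] into [-1, 1)
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In the browser, the resulting samples would be written into a mono `AudioBuffer` created with a 24000 Hz sample rate before playback.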
Accomplishments that we're proud of
- Seamless Multi-modal Chaining: Achieving a pipeline where Image, Text, and Audio are generated in sequence without the user feeling the complexity of the "handshakes" between different AI models.
- In-Browser Video Rendering: Most apps rely on expensive server-side video rendering. We achieved this 100% on the client, saving costs and protecting user privacy.
- The Aesthetic: Creating a UI that feels premium, cinematic, and responsive across all devices.
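The client-side recording bridge described above can be sketched roughly as follows. This is browser-only code (it relies on `canvas.captureStream`, `MediaStreamAudioDestinationNode`, and `MediaRecorder`), and the function name and callback shape are assumptions for illustration, not the app's actual implementation:

```typescript
// Hypothetical sketch: mux the card canvas (video) and the decoded voice
// (audio) into one MediaStream, then record it with MediaRecorder so the
// final video is produced entirely in the browser. Browser-only APIs.
function recordCard(
  canvas: HTMLCanvasElement,
  audioCtx: AudioContext,
  voiceSource: AudioBufferSourceNode,
  onDone: (blob: Blob) => void
): void {
  const videoStream = canvas.captureStream(30);           // 30 fps video track
  const audioDest = audioCtx.createMediaStreamDestination();
  voiceSource.connect(audioDest);                         // route the voice into the stream

  // Combine both tracks into a single stream for the recorder.
  const mixed = new MediaStream([
    ...videoStream.getVideoTracks(),
    ...audioDest.stream.getAudioTracks(),
  ]);

  const recorder = new MediaRecorder(mixed);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => onDone(new Blob(chunks, { type: recorder.mimeType }));

  recorder.start();
  voiceSource.onended = () => recorder.stop();            // stop when the narration ends
  voiceSource.start();
}
```

Note that `MediaRecorder`'s container format varies by browser (WebM in Chromium, MP4 in Safari), so a real implementation would pick a supported `mimeType` at runtime.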
What we learned
This project was a masterclass in Modern Media Engineering. We learned how to handle raw binary data streams, how to synchronize asynchronous AI calls for a smooth UX, and how to push the limits of what a browser can do with generative media. Most importantly, we learned that AI is most powerful when it’s used to bridge the gap between human imagination and digital reality.
What's next for Cosmic Identity
The multiverse is expanding. Future updates will include:
- Real-time Camera Support: Let users "step into the portal" using their live webcam.
- Collaborative Dimensions: Share your identity to a global gallery where others can see the infinite versions of humanity.
- Extended Sagas: Allow users to "chat" with their alternate selves using Gemini's streaming text capabilities.
Welcome to the Multiverse.
