Cosmic Identity: A Journey Through the Multiverse
The Inspiration
The spark for Cosmic Identity came from a simple "what if": What if those viral personality tests actually had the power of modern multi-modal AI?
In a world filled with generic AI avatars, I felt something was missing: Soul. A static image of yourself as a cyborg is cool, but it doesn't tell a story. I wanted to build something that felt like a complete identity synthesis—giving you a name, a history, a set of skills, and most importantly, a voice. The goal was to move beyond "filters" and create a truly immersive experience where you don't just see a picture; you meet your alternate self.
What it does
Cosmic Identity is a trans-dimensional voyage. Users start by uploading a single selfie and selecting a destination universe—ranging from the neon streets of a Cyberpunk future to the soft, hand-painted hills of a Ghibli-inspired world.
Within seconds, the app performs a three-stage metamorphosis:
- Visual Transformation: Your facial features are woven into the aesthetic of the selected universe.
- Narrative Birth: A unique persona is generated, complete with a backstory and RPG-style stats.
- Vocal Manifestation: Your character speaks to you, welcoming you to their dimension in a voice that fits the theme.
The final result is a beautiful interactive card that can be exported as a high-quality PNG or a synthesized MP4 video with audio.
How we built it
The project is powered by a sophisticated multi-modal pipeline using the Google Gemini API:
- The Visual Metamorphosis: We used `gemini-2.5-flash-image` for its incredible speed and its ability to maintain structural facial similarity while applying extreme stylistic modifiers.
- The Narrative Core: `gemini-3-flash-preview` analyzes the original image for gender and features, then generates a structured JSON payload containing the character's name, backstory, and stats. This ensures the text is contextually grounded in the user's appearance.
- The Vocal Manifestation: To bring the character to life, we used `gemini-2.5-flash-preview-tts`. We dynamically mapped different Gemini voices (like Zephyr, Kore, and Puck) to specific character personas based on the gender detected in the previous step.
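As an illustration, the persona-to-voice mapping in the last step can be as simple as a lookup keyed on the detected gender. Zephyr, Kore, and Puck are real Gemini TTS prebuilt voice names mentioned above; the `Persona` shape and `pickVoice` helper below are hypothetical, not the app's actual code:

```typescript
// Hypothetical sketch: choose a Gemini prebuilt TTS voice for a generated
// persona, based on the gender detected by the reasoning step.
type DetectedGender = "female" | "male" | "neutral";

interface Persona {
  name: string;
  gender: DetectedGender;
}

// Voice names come from the Gemini TTS voice list; the pairing is illustrative.
const VOICE_BY_GENDER: Record<DetectedGender, string> = {
  female: "Kore",
  male: "Puck",
  neutral: "Zephyr",
};

function pickVoice(persona: Persona): string {
  return VOICE_BY_GENDER[persona.gender] ?? "Zephyr";
}
```

The chosen name would then be passed as the prebuilt voice in the TTS request's speech configuration.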
The frontend is built with React and Tailwind CSS, using a "Space-Noir" glassmorphic aesthetic to create a cinematic atmosphere.
Built With
- AI Engine: Google Gemini API
  - `gemini-2.5-flash-image` (Image Generation)
  - `gemini-3-flash-preview` (Reasoning & Structured Data)
  - `gemini-2.5-flash-preview-tts` (Speech Synthesis)
- Frameworks: React 19, Vite, TypeScript
- Styling: Tailwind CSS, FontAwesome 6
- Media APIs: Web Audio API (PCM Decoding), HTML5 Canvas, MediaRecorder API
- Infrastructure: Vercel (Deployment)
Challenges we ran into
Building this wasn't without its "glitches in the matrix":
- The Audio "Silent" Wall: Mobile browsers (especially iOS Safari) block audio unless it's triggered by a direct user interaction. We had to implement a robust "wake up" logic for the `AudioContext` to ensure the AI's voice could be heard.
- Raw PCM Decoding: Gemini's TTS returns raw 16-bit PCM bytes without a header. Unlike standard audio files, this data requires manual conversion. We built a custom decoding utility to transform these bytes into a `Float32Array` for the Web Audio API at exactly 24 kHz.
- Video Synthesis: Creating the "Save Video" feature was a significant technical hurdle. We had to bridge the Canvas API (for the image) and the Web Audio API (for the voice) into a single `MediaStream`, then feed it into a `MediaRecorder` to produce a shareable MP4 entirely on the client side.
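A minimal sketch of the PCM-decoding step, assuming little-endian 16-bit samples (the function name and exact buffer handling are illustrative, not the app's actual utility):

```typescript
// Hypothetical sketch: convert headerless 16-bit little-endian PCM bytes
// (as returned by the TTS model) into Float32Array samples in [-1, 1),
// ready to be copied into a Web Audio AudioBuffer created at 24000 Hz.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < samples.length; i++) {
    // true = little-endian; scale the int16 range [-32768, 32767] into [-1, 1)
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In the browser, the resulting samples would be written into a mono `AudioBuffer` created with a 24000 Hz sample rate before playback.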
Accomplishments that we're proud of
- Seamless Multi-modal Chaining: Achieving a pipeline where Image, Text, and Audio are generated in sequence without the user feeling the complexity of the "handshakes" between different AI models.
- In-Browser Video Rendering: Most apps rely on expensive server-side video rendering. We achieved this 100% on the client, saving costs and protecting user privacy.
- The Aesthetic: Creating a UI that feels premium, cinematic, and responsive across all devices.
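The client-side recording bridge described above can be sketched roughly as follows. This is browser-only code (it relies on `canvas.captureStream`, `MediaStreamAudioDestinationNode`, and `MediaRecorder`), and the function name and callback shape are assumptions for illustration, not the app's actual implementation:

```typescript
// Hypothetical sketch: mux the card canvas (video) and the decoded voice
// (audio) into one MediaStream, then record it with MediaRecorder so the
// final video is produced entirely in the browser. Browser-only APIs.
function recordCard(
  canvas: HTMLCanvasElement,
  audioCtx: AudioContext,
  voiceSource: AudioBufferSourceNode,
  onDone: (blob: Blob) => void
): void {
  const videoStream = canvas.captureStream(30);           // 30 fps video track
  const audioDest = audioCtx.createMediaStreamDestination();
  voiceSource.connect(audioDest);                         // route the voice into the stream

  // Combine both tracks into a single stream for the recorder.
  const mixed = new MediaStream([
    ...videoStream.getVideoTracks(),
    ...audioDest.stream.getAudioTracks(),
  ]);

  const recorder = new MediaRecorder(mixed);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => onDone(new Blob(chunks, { type: recorder.mimeType }));

  recorder.start();
  voiceSource.onended = () => recorder.stop();            // stop when the narration ends
  voiceSource.start();
}
```

Note that `MediaRecorder`'s container format varies by browser (WebM in Chromium, MP4 in Safari), so a real implementation would pick a supported `mimeType` at runtime.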
What we learned
This project was a masterclass in Modern Media Engineering. We learned how to handle raw binary data streams, how to synchronize asynchronous AI calls for a smooth UX, and how to push the limits of what a browser can do with generative media. Most importantly, we learned that AI is most powerful when it’s used to bridge the gap between human imagination and digital reality.
What's next for Cosmic Identity
The multiverse is expanding. Future updates will include:
- Real-time Camera Support: Let users "step into the portal" using their live webcam.
- Collaborative Dimensions: Share your identity to a global gallery where others can see the infinite versions of humanity.
- Extended Sagas: Allow users to "chat" with their alternate selves using Gemini's streaming text capabilities.
Welcome to the Multiverse.
