Inspiration

Why? Because each of us carries biases, known and unknown, that harden over time into fixed opinions, and we grow uncomfortable being questioned or challenged. In an age of easy information generation and distribution, our feeds readily serve up content that aligns with what we already believe, reinforcing those biases further. I have observed this in myself and in the people around me, across many contexts: we become emotionally attached to our thoughts and beliefs, to the point that we no longer want to entertain other perspectives. That limits how we think and how we embrace and accept the world around us. At a time when people feel AI is alienating the human touch, what if we used it to rekindle kindness in how we see each other?

What it does

"The Other Side" acts like an on-demand therapist: it asks you to pause, reflect, rephrase your thoughts, and reconsider the emotions they bring up. The idea is that greater awareness of the other sides of an issue helps keep us from developing extreme views about anything.

How we built it

It started with one simple idea: "What if you could just show someone the other side of what they're looking at?" That's it. That was the whole plan.

But as we started talking it through, the layers kept showing up. A different perspective needs a voice — and that voice can't sound like a robot reading a Wikipedia page. It needs to know when to be warm, when to be sharp, and when to be a little sarcastic. It needs guardrails, because not everything deserves an "other side." It needs to work for everyone, not just people who are comfortable with technology.

One question led to another. Before we knew it, the simple idea had a therapy-buddy mode, a bias-checker, a conflict-resolver, five lenses, an adaptive voice, fully generated video output, and a Chrome extension that lets you point at any video on any webpage and flip it.

Once our credits landed last Thursday/Friday, we moved fast. We built a FastAPI backend integrated with the Google ADK to power the "Root Agent" logic, then layered in Vertex AI Imagen 3 and Veo 2 for high-fidelity visuals and Cloud TTS Studio Voices to give those lenses a human soul.
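For a feel of the ADK side, the "Root Agent" wiring looks roughly like this. This is a minimal sketch, not our exact production config: the tool, instruction text, and names are illustrative placeholders.

```python
# Minimal sketch of a "Root Agent" built with the Google ADK.
# The tool, names, and instruction text are illustrative placeholders.
from google.adk.agents import Agent

def flip_perspective(statement: str) -> dict:
    """Hypothetical tool: returns a reframed 'other side' of a statement."""
    # In the real backend this hands off to the lens/guardrail pipeline.
    return {"status": "ok", "perspective": f"Another way to see it: {statement}"}

root_agent = Agent(
    name="root_agent",
    model="gemini-2.5-flash",
    description="Reframes a user's statement from an opposing perspective.",
    instruction=(
        "You help users see the other side of their own statements. "
        "Be warm by default, sharp when the topic calls for it, and "
        "decline anything unsafe."
    ),
    tools=[flip_perspective],
)
```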

Tech Details: Safety & Responsibility

Because we are dealing with sensitive human perspectives, we didn't just build for speed; we built for safety.

Multi-Layer Guardrails: We implemented a "Layer 1" guardrail that sanitizes and validates user input before it ever touches the AI. If a situation is deemed unsafe or harmful, the system gracefully declines rather than generating a potentially toxic "other side".
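A minimal sketch of what such a first layer can look like with pydantic (the length limits and blocklist below are illustrative placeholders, not our production rules):

```python
# Sketch of a "Layer 1" input guardrail using pydantic validation.
# UNSAFE_TOPICS and the length limits are illustrative placeholders.
from pydantic import BaseModel, field_validator

UNSAFE_TOPICS = ("self-harm", "violence against")  # placeholder blocklist

class FlipRequest(BaseModel):
    text: str

    @field_validator("text")
    @classmethod
    def sanitize_and_validate(cls, v: str) -> str:
        v = v.strip()
        if not v or len(v) > 2000:
            raise ValueError("Input must be between 1 and 2000 characters.")
        lowered = v.lower()
        if any(topic in lowered for topic in UNSAFE_TOPICS):
            # Gracefully decline instead of generating an "other side".
            raise ValueError("This topic isn't something we flip.")
        return v
```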

Media Safety & Copyright: We took a "Privacy and IP First" approach. For multimodal analysis, files are processed entirely in memory as base64 data and discarded immediately—nothing is written to disk. To handle copyright and safety in generated media, we utilize Vertex AI’s built-in safety filters and digital watermarking for Imagen 3 and Veo 2, ensuring that the content we produce is original, safe, and responsible.
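For the in-memory part, the flow is roughly the following, assuming the Vertex AI Python SDK with `vertexai.init()` run at startup; the MIME type and prompt are illustrative:

```python
# Sketch: multimodal analysis on in-memory base64 data; nothing hits disk.
# Assumes vertexai.init(project=..., location=...) has run at startup.
import base64
from vertexai.generative_models import GenerativeModel, Part

def analyze_upload(b64_payload: str, mime_type: str = "image/png") -> str:
    raw = base64.b64decode(b64_payload)  # decoded bytes live only in memory
    model = GenerativeModel("gemini-2.5-flash")
    response = model.generate_content([
        Part.from_data(data=raw, mime_type=mime_type),
        "Describe the perspective this image presents.",  # illustrative prompt
    ])
    return response.text  # raw bytes go out of scope here, never written out
```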

Vision AI Filtering: Every user-uploaded image undergoes a Vision API safety check to ensure the system isn't used to process inappropriate content.
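That check is a standard SafeSearch call against the Cloud Vision API. A minimal sketch, where the blocking threshold is our illustrative choice:

```python
# Sketch: rejecting inappropriate uploads with Cloud Vision SafeSearch.
from google.cloud import vision

def passes_safety_check(image_bytes: bytes) -> bool:
    client = vision.ImageAnnotatorClient()
    response = client.safe_search_detection(image=vision.Image(content=image_bytes))
    ssa = response.safe_search_annotation
    # Block if any sensitive category is likely or worse (illustrative threshold).
    blocked = {vision.Likelihood.LIKELY, vision.Likelihood.VERY_LIKELY}
    return not any(
        category in blocked
        for category in (ssa.adult, ssa.violence, ssa.racy, ssa.medical)
    )
```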

Challenges we ran into

Building this in literally a few days meant fighting the clock and the code at the same time. The biggest hurdle was the asynchronous orchestration. We wanted the user to get the perspective text instantly while the heavy media lifting (video/audio generation) happened in the background. We had some intense "container exit(1)" moments with library versioning and ADK arguments in Cloud Run, but we pushed through to get a multimodal "Flip" working by the deadline.
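The pattern we landed on maps closely onto FastAPI's background tasks: answer with the perspective text right away and queue the heavy media job behind the response. A simplified sketch, where the two helpers are stand-ins for our pipeline and the client polls a separate endpoint for the finished media:

```python
# Sketch: instant text response, media generation deferred to the background.
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

async def generate_perspective(text: str) -> str:
    # Stand-in for the fast Gemini text call.
    return f"Here's another way to look at it: {text}"

def generate_media(job_id: str, perspective: str) -> None:
    # Stand-in for the slow Imagen/Veo/TTS pipeline; in production this
    # writes progress to Firestore so the client can poll for the result.
    pass

@app.post("/flip")
async def flip(payload: dict, background_tasks: BackgroundTasks):
    perspective = await generate_perspective(payload["text"])
    job_id = str(uuid.uuid4())
    background_tasks.add_task(generate_media, job_id, perspective)
    # The client gets the text now and fetches the video by job_id later.
    return {"job_id": job_id, "perspective": perspective}
```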

Accomplishments that we're proud of

We went from a blank terminal to a fully functioning AI orchestration engine in 72 hours. Seeing a raw, biased thought transform into a calm, narrated cinematic video that actually makes you feel something different is a massive win. We’re proud of the "humanity" we managed to squeeze into the code in such a short window.

What we learned

We learned that AI tone is everything. A reframed perspective fails if it sounds like a robot. By mapping specific Cloud TTS Studio profiles to emotional lenses, we learned how to make the technology feel like a supportive partner in reflection rather than just a cold data processor.
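Concretely, that mapping can be as simple as lens name to Studio voice. A sketch, where the lens names and voice pairings are illustrative rather than our exact production pairings:

```python
# Sketch: mapping emotional "lenses" to Cloud TTS Studio voices.
# The lens names and voice pairings here are illustrative.
from google.cloud import texttospeech

LENS_VOICES = {
    "warm": "en-US-Studio-O",
    "sharp": "en-US-Studio-Q",
}

def narrate(text: str, lens: str) -> bytes:
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name=LENS_VOICES.get(lens, "en-US-Studio-O"),
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
        ),
    )
    return response.audio_content
```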

What's next for The Other Side

It’s not perfect—it’s potential. What we have now is a proof of concept built on adrenaline and a few days of intense coding. If given more time, we would:

Refine the orchestration: Make the transition between the instant text and the background video even more seamless.

Expand the Lenses: Build deeper, more nuanced psychological profiles for the "lenses" to handle complex grief or high-stakes mediation.

Native Veo Integration: Move from our current FFmpeg fallback to fully native, Veo-generated AI cinematography for every perspective shift.

Web-Wide "Flip": Fully realize the Chrome extension so you can point at any polarizing content on the web and see the "Other Side" instantly.

Built With

  • claude
  • cloud-run
  • fastapi
  • ffmpeg
  • gemini
  • gemini-2.5-flash
  • google-adk
  • google-cloud
  • google-cloud-build
  • google-cloud-run
  • google-cloud-tts
  • google-cloud-vision-api
  • google-firestore
  • httpx
  • imagen-3
  • pydantic
  • python
  • slowapi
  • veo-2
  • vertex-ai