Inspiration
We live in a world of static software. If you want to change a dashboard, analyze a new video format, or query a database, you usually have to call an engineer. We wanted to build a "Self-Constructing Interface": software that adapts to the user, not the other way around. Inspired by the new multimodal capabilities of Gemini 3.0, we asked: can we build an app that rewrites its own code based on what it sees and hears?
What it does
Gemini Shape-Shifter is a multimodal AI agent that autonomously re-codes its interface to handle any file you drop:
Data Shape-Shifter: Drop a CSV? It writes a Python dashboard with Streamlit and Seaborn to surface hidden insights instantly.
Vision Engine: Drop a hand-drawn UI sketch? It recognizes the wireframe and writes the frontend code to build it.
Video Intelligence: Drop a video file? Using Gemini's long context window, it watches the clip frame by frame and generates detailed summaries and analysis.
Voice-to-SQL Agent: Speak a question? It translates natural language into executable SQL queries, answering complex database questions without typing.
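To make the Voice-to-SQL idea concrete, here is a minimal sketch of the prompt-assembly step, assuming the audio has already been transcribed by Gemini's audio modality. The helper name `build_sql_prompt` and the prompt wording are illustrative, not the app's actual prompt:

```python
def build_sql_prompt(question: str, schema: str) -> str:
    """Assemble a natural-language-to-SQL prompt for the model.

    `question` is the transcribed user utterance; `schema` is a
    CREATE TABLE dump of the target database. (Hypothetical helper;
    the app's real prompt wording may differ.)
    """
    return (
        "You are a SQL assistant. Given this schema:\n"
        f"{schema}\n"
        "Answer the question with a single executable SQL query, "
        "and nothing else.\n"
        f"Question: {question}"
    )
```

The returned string would then be sent to the model, e.g. via `model.generate_content(...)` in the google-generativeai SDK, and the reply executed against the database.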
How we built it
We built the core engine in Python 3.12, with Streamlit for the frontend.
The Brain: We used google-generativeai to connect to Gemini 3.0 Flash Preview.
The Router: The app uses a "Router Pattern" to detect file MIME types (CSV, PNG, MP4, WAV) and route them to the specific AI modality.
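The Router Pattern described above can be sketched in a few lines with the standard-library `mimetypes` module. The handler names are illustrative; the real app wires each route to its own Streamlit view:

```python
import mimetypes

def route(filename: str) -> str:
    """Map a dropped file to the modality that should handle it."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "unsupported"
    if mime == "text/csv":
        return "data"        # CSV -> dashboard generator
    if mime.startswith("image/"):
        return "vision"      # PNG sketch -> frontend code
    if mime.startswith("video/"):
        return "video"       # MP4 -> long-context video analysis
    if mime.startswith("audio/"):
        return "voice"       # WAV -> voice-to-SQL agent
    return "unsupported"
```

Routing on MIME type rather than file extension keeps the dispatch table small and lets new modalities be added with one extra branch.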
The logic: For data visualization, we used an exec() loop that allows the AI to write and execute its own visualization code in real-time.
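A stripped-down sketch of that exec() loop, assuming the generated snippet reads a `data` variable and assigns its output to `result`. The real app hands the model a pandas DataFrame and lets it call Streamlit/Seaborn; this version sticks to the standard library so it stays self-contained:

```python
def run_generated_code(code: str, data):
    """Execute model-generated analysis code in an isolated namespace.

    The snippet sees only `data` and is expected to assign its
    output to `result`. (Simplified: the production loop also
    catches exceptions and feeds tracebacks back to the model.)
    """
    namespace = {"data": data}
    exec(code, namespace)  # code comes from our own controlled prompt
    return namespace.get("result")
```

For example, if the model returns `"result = sum(row['sales'] for row in data)"`, running it against a list of row dicts yields the aggregate the dashboard then renders.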
Deployment: We set up a full CI/CD pipeline connecting GitHub to Streamlit Cloud for instant deployment updates.
Challenges we ran into
The biggest challenge was the "Headless Linux" trap. We initially tried to generate video-analysis previews with OpenCV (cv2) on the server, but the cloud environment lacked the required video codecs (h264, ffmpeg), so the app crashed or produced 0-byte files. The fix: we pivoted to a "Proxy Pattern." Instead of forcing the lightweight server to render heavy video, we built a retrieval system using requests to fetch standardized media for analysis, which kept the app stable.
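The Proxy Pattern boils down to fetching pre-rendered media over HTTP instead of encoding it server-side, plus a guard against the 0-byte failure mode that motivated the pivot. A minimal sketch (the team used the requests library; stdlib urllib is shown here to stay dependency-free, and the URL would be whatever media endpoint the app points at):

```python
import urllib.request

def fetch_media(url: str, timeout: float = 30.0) -> bytes:
    """Fetch media over HTTP rather than rendering it with cv2
    on a codec-less server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return validate_media(resp.read())

def validate_media(data: bytes) -> bytes:
    """Reject the 0-byte payloads the broken codec path used to emit."""
    if not data:
        raise ValueError("media fetch returned 0 bytes")
    return data
```

Validating the payload size at the boundary turns a silent 0-byte artifact into an immediate, debuggable error.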
Accomplishments that we're proud of
True Multimodality: Successfully integrating text, vision, audio, and video into a single, unified interface.
The "Nuclear" Fix: Debugging a complex cloud deployment crash in the final hours and implementing a robust workaround.
CI/CD Pipeline: Setting up a professional DevOps workflow where every GitHub push automatically updates the live production app.
What we learned
The Power of Long Context: We learned that Gemini 3.0 can "watch" video with impressive accuracy, going well beyond traditional frame-sampling approaches.
Deployment resilience: We learned that code that works on a Mac M4 doesn't always work on a Linux Cloud server, and how to handle dependency management (specifically opencv-python-headless) to fix it.
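The dependency fix mentioned above amounts to one swap in requirements.txt, replacing the desktop OpenCV build with the headless one (excerpt shown as a sketch):

```
# requirements.txt (excerpt)
# opencv-python           <- expects GUI/codec libraries missing on Streamlit Cloud
opencv-python-headless    # server-safe build with no GUI dependencies
```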
What's next for Gemini Shape-Shifter
Real-Time Database Connection: connecting the SQL Agent to live PostgreSQL databases.
Two-Way Voice: Adding text-to-speech so the Agent can talk back to us.
Self-Correction: Giving the AI the ability to "test" the code it writes and fix its own bugs before showing the result to the user.