Gemini Horizon: Your Bilingual Cloud Pilot
Inspiration
Modern cloud platforms like Google Cloud Platform (GCP) and Firebase are powerful, but navigating them often means digging through nested menus and extensive documentation. For developers, especially those working in fast-paced environments, that overhead slows work down.
We were inspired to build Gemini Horizon to bridge the gap between human intent and cloud execution.
Instead of navigating dozens of menus, developers should simply say:
"Create a new Firebase project" or "افتح إعدادات Cloud Run" ("Open Cloud Run settings")
Gemini Horizon acts as a live AI co-pilot that can see the cloud console, understand it, and perform actions on behalf of the developer.
Another key inspiration was empowering the Arabic-speaking developer community, which often lacks AI tools capable of understanding technical Arabic while interacting with English-based cloud interfaces.
What it does
Gemini Horizon is a multimodal AI agent designed for the UI Navigator track.
It allows developers to control cloud infrastructure using natural language in Arabic or English.
The system operates using a See → Think → Act architecture:
See
The agent captures the cloud console interface at 1 frame per second, allowing it to visually understand the layout and available actions.
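A minimal sketch of a fixed-rate capture loop for the See step. The function name `capture_frames` and the injectable `grab`, `clock`, and `sleep` parameters are assumptions made for illustration and testability; the write-up only states the 1 frame per second rate.

```python
import time
from typing import Callable, Iterator, Optional


def capture_frames(
    grab: Callable[[], bytes],
    fps: float = 1.0,
    max_frames: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield screenshots from `grab` at roughly `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps
    next_due = clock()
    produced = 0
    while max_frames is None or produced < max_frames:
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        yield grab()
        produced += 1
        # Schedule from the previous deadline rather than from "now",
        # so the time spent capturing does not accumulate as drift.
        next_due += interval
```

With Playwright's synchronous API, `grab` could be something like `lambda: page.screenshot(type="jpeg", quality=70)`, which returns the encoded image as bytes. The quality value here is arbitrary.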
Think
Using Gemini 3.1 Pro, the system analyzes each captured frame and reasons about which sequence of UI actions will carry out the developer's command.
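A rough sketch of how the Think step might ask the model for a single next action and validate the reply. The prompt text, the JSON reply schema, the `Action` dataclass, and the model ID string `gemini-3.1-pro-preview` are all assumptions for illustration; the project's real prompt and schema are not shown in this write-up.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical prompt and reply schema, not the project's actual ones.
SYSTEM_PROMPT = (
    "You control a cloud console in a browser. Reply with one JSON object: "
    '{"action": "click" | "type" | "goto" | "done", '
    '"x": int, "y": int, "text": str, "url": str}. '
    "Coordinates use a 0-1000 grid."
)

ALLOWED_ACTIONS = {"click", "type", "goto", "done"}


@dataclass
class Action:
    kind: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None
    url: Optional[str] = None


def parse_action(reply: str) -> Action:
    """Validate the model's JSON reply before anything touches the browser."""
    data = json.loads(reply)
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {kind!r}")
    if kind == "click" and not all(isinstance(data.get(k), int) for k in ("x", "y")):
        raise ValueError("click needs integer x and y")
    return Action(kind, data.get("x"), data.get("y"), data.get("text"), data.get("url"))


def think(frame_jpeg: bytes, command: str, model: str = "gemini-3.1-pro-preview") -> Action:
    """Send one frame plus the user's command to Gemini and parse the reply."""
    # Imported lazily so parse_action works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model=model,  # assumed model ID
        contents=[
            types.Part.from_bytes(data=frame_jpeg, mime_type="image/jpeg"),
            command,  # Arabic or English, passed through unchanged
        ],
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            response_mime_type="application/json",
        ),
    )
    return parse_action(response.text)
```

Rejecting unknown actions at parse time keeps a malformed model reply from reaching the automation layer.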
Act
The backend executes the plan through Playwright-based automation, performing clicks, typing, and navigation within the browser.
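The Act step can be sketched as a small dispatcher from a planned action to Playwright calls. The action dictionary shape (`action`, `x`, `y`, `text`, `url`) is a hypothetical format chosen for this example; `page.mouse.click`, `page.keyboard.type`, and `page.goto` are the Playwright methods it relies on.

```python
from typing import Any, Mapping


def execute(page: Any, action: Mapping[str, Any]) -> None:
    """Apply one planned action to a Playwright page (sync API).

    `action` uses a hypothetical shape, for example
    {"action": "click", "x": 720, "y": 450}, with x and y in viewport pixels.
    """
    kind = action["action"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "goto":
        page.goto(action["url"])
    else:
        raise ValueError(f"unsupported action: {kind!r}")
```

Because the function only needs an object with `mouse`, `keyboard`, and `goto`, it can be exercised with a stub page in unit tests before it is pointed at a real browser.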
Bilingual Intelligence
Gemini Horizon seamlessly switches between Arabic and English, preserving technical terminology used in cloud platforms.
Example commands:
"Create a new Cloud Run service"
"أنشئ مشروع Firebase جديد" ("Create a new Firebase project")
How we built it
The project uses a modern production-grade cloud architecture:
Frontend
Flutter Web application
Hosted on Firebase Hosting
Global CDN delivery
Backend
Python automation engine
Docker container running on Google Cloud Run
Playwright for browser control
AI Engine
Gemini 3.1 Pro Preview (March 2026)
Accessed through the google-genai SDK v1.67.0
Connectivity
Low-latency Raw WebSockets
Streams live JPEG frames
Enables real-time bidirectional communication
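One way to carry JPEG frames over a raw WebSocket is to send each frame as a single binary message with a small fixed header. The layout below (1-byte message type, 4-byte sequence number, 8-byte millisecond timestamp) is a hypothetical wire format for illustration, not the project's actual protocol.

```python
import struct
import time
from typing import Optional, Tuple

# Hypothetical header: message type (uint8), sequence (uint32), timestamp ms (uint64),
# big-endian with no padding, 13 bytes in total.
_HEADER = struct.Struct(">BIQ")
MSG_FRAME = 1


def pack_frame(seq: int, jpeg: bytes, ts_ms: Optional[int] = None) -> bytes:
    """Prefix a JPEG payload with the header so it fits in one binary message."""
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return _HEADER.pack(MSG_FRAME, seq, ts_ms) + jpeg


def unpack_frame(message: bytes) -> Tuple[int, int, bytes]:
    """Return (sequence, timestamp_ms, jpeg_bytes) from a packed frame message."""
    kind, seq, ts_ms = _HEADER.unpack_from(message)
    if kind != MSG_FRAME:
        raise ValueError(f"not a frame message: type {kind}")
    return seq, ts_ms, message[_HEADER.size:]
```

The sequence number lets the receiver drop stale frames, and the timestamp makes end-to-end latency measurable. With a library such as `websockets`, passing the resulting `bytes` to `send` transmits it as a binary message.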
Security
Firebase Authentication for identity management
Google Secret Manager to securely store API keys
To ensure accurate UI interaction, we implemented a coordinate transformation layer to convert Gemini's normalized output into browser pixel positions:
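In Python, this layer can be sketched as a pair of linear scalings. The function name `to_pixels`, the range check, and the clamping to the last valid pixel are assumptions added for illustration; only the 0 to 1000 model grid and the 1440 × 900 viewport come from the project.

```python
from typing import Tuple

MODEL_SCALE = 1000      # Gemini's normalized coordinate range
VIEWPORT_WIDTH = 1440   # browser viewport width in pixels
VIEWPORT_HEIGHT = 900   # browser viewport height in pixels


def to_pixels(
    x_model: float,
    y_model: float,
    width: int = VIEWPORT_WIDTH,
    height: int = VIEWPORT_HEIGHT,
) -> Tuple[int, int]:
    """Convert normalized model coordinates (0-1000) to viewport pixels."""
    if not (0 <= x_model <= MODEL_SCALE and 0 <= y_model <= MODEL_SCALE):
        raise ValueError("model coordinates must lie in [0, 1000]")
    x = round(x_model * width / MODEL_SCALE)
    y = round(y_model * height / MODEL_SCALE)
    # Assumption: clamp so that a coordinate of exactly 1000 still lands
    # on the last pixel inside the viewport.
    return min(x, width - 1), min(y, height - 1)
```

For example, a model output at the center of the grid, `to_pixels(500, 500)`, maps to `(720, 450)`, the center of the viewport.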
$$x_{\text{pixel}} = \frac{x_{\text{model}}}{1000} \times 1440$$
$$y_{\text{pixel}} = \frac{y_{\text{model}}}{1000} \times 900$$
Challenges we ran into
Building a real-time multimodal agent on serverless infrastructure introduced several technical challenges.
Running Chromium on Cloud Run
To avoid deployment failures and runtime crashes, launching a headless browser inside Cloud Run required:
tuning the Docker image's dependencies
increasing the memory limit to 2 GiB
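As a sketch of the kind of launch settings that usually matter for headless Chromium in a container, the snippet below collects a few commonly used flags. The write-up does not list the flags the project actually used, so the specific arguments and both function names are assumptions.

```python
from typing import Any, Dict, Tuple


def chromium_launch_options() -> Dict[str, Any]:
    """Launch options often used for headless Chromium in containers (assumed, not from the project)."""
    return {
        "headless": True,
        "args": [
            "--no-sandbox",             # Chromium will not start as root with its sandbox enabled
            "--disable-dev-shm-usage",  # avoid the small /dev/shm found in many containers
            "--disable-gpu",            # no GPU is available in this environment
        ],
    }


def open_console_page(url: str, width: int = 1440, height: int = 900) -> Tuple[Any, Any, Any]:
    """Start Playwright, launch Chromium, and open `url` at a fixed viewport size."""
    # Imported lazily so the options helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    browser = pw.chromium.launch(**chromium_launch_options())
    page = browser.new_page(viewport={"width": width, "height": height})
    page.goto(url)
    # The caller is responsible for browser.close() and pw.stop().
    return pw, browser, page
```

Pinning the viewport to 1440 × 900 keeps screenshots and click coordinates in the same pixel space.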
WebSocket Protocol Mismatch
Initially we used Socket.io, which created synchronization issues between the Flutter frontend and the Python backend.
We solved this by migrating both sides to raw WebSockets, which removed the Socket.io protocol layer and significantly reduced latency and complexity.
The “Site Not Found” Problem
Deploying both frontend and backend created routing issues.
The solution was implementing Firebase Hosting Rewrites, allowing the frontend and Cloud Run backend to operate under a single domain, which also improved:
CORS handling
SEO
deployment simplicity
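A Firebase Hosting rewrite to Cloud Run is declared in `firebase.json`. The sketch below builds such a configuration in Python; the service ID `horizon-backend`, the region, the `/api` prefix, and the `build/web` public directory (Flutter's default web build output) are illustrative assumptions rather than the project's real values.

```python
import json
from typing import Any, Dict


def hosting_config(service_id: str, region: str, api_prefix: str = "/api") -> Dict[str, Any]:
    """Build a firebase.json structure that serves the web app and forwards API paths to Cloud Run."""
    return {
        "hosting": {
            "public": "build/web",
            "rewrites": [
                # Rules are matched in order, so the backend rule must come first.
                {
                    "source": f"{api_prefix}/**",
                    "run": {"serviceId": service_id, "region": region},
                },
                # Everything else falls through to the single-page app.
                {"source": "**", "destination": "/index.html"},
            ],
        }
    }


def render_firebase_json(service_id: str, region: str) -> str:
    """Serialize the configuration as the text of firebase.json."""
    return json.dumps(hosting_config(service_id, region), indent=2)
```

Calling `render_firebase_json("horizon-backend", "us-central1")` produces text that can be written to `firebase.json`. Because both the frontend and the backend are then served from one origin, browser requests to the backend no longer count as cross-origin.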
Accomplishments that we're proud of
Successfully implementing a real-time See-Think-Act loop where the AI reacts dynamically to UI changes.
Building a truly bilingual developer assistant capable of understanding technical Arabic.
Designing a unified cloud architecture where Firebase Hosting acts as a gateway to Cloud Run services.
What we learned
During this project we explored the internal mechanics of Gemini 3.1's Thinking Level system and how Thought Signatures help maintain reasoning consistency across complex multi-step tasks.
We also discovered that in live AI agents, latency becomes the most critical metric. Even small improvements in frame streaming, message serialization, and browser automation can significantly improve the user experience.
What's next for Gemini Horizon
Our roadmap includes expanding Gemini Horizon into a full AI cloud operations platform.
Multi-Agent Collaboration
Specialized agents for:
Security analysis
Billing optimization
DevOps automation
Mobile Vision
Extending the UI Navigator to control Android and iOS cloud management apps.
Proactive Maintenance
Allowing the agent to monitor logs and system metrics, automatically alerting developers via voice or notifications when critical issues occur.