Gemini Horizon: Your Bilingual Cloud Pilot

Inspiration

Modern cloud platforms like Google Cloud Platform (GCP) and Firebase are powerful, but navigating them often means digging through nested menus and extensive documentation. For developers, especially those working in fast-paced environments, that overhead slows work down.

We were inspired to build Gemini Horizon to bridge the gap between human intent and cloud execution.

Instead of navigating dozens of menus, a developer should be able to simply say:

"Create a new Firebase project" or "افتح إعدادات Cloud Run" (Arabic for "Open the Cloud Run settings")

Gemini Horizon acts as a live AI co-pilot that can see the cloud console, understand it, and perform actions on behalf of the developer.

Another key inspiration was empowering the Arabic-speaking developer community, which often lacks AI tools capable of understanding technical Arabic while interacting with English-based cloud interfaces.

What it does

Gemini Horizon is a multimodal AI agent designed for the UI Navigator track.

It allows developers to control cloud infrastructure using natural language in Arabic or English.

The system operates using a See → Think → Act architecture:

See

The agent captures the cloud console interface at 1 frame per second, allowing it to visually understand the layout and available actions.

Think

Using Gemini 3.1 Pro, the system analyzes the captured interface and reasons about which sequence of clicks and inputs will reach the requested goal.

Act

The backend executes the plan through Playwright-based automation, performing clicks, typing, and navigation within the browser.
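The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Action` shape and the `see`, `think`, and `act` callables are hypothetical stand-ins for the screenshot capture, the Gemini call, and the Playwright step.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One UI step chosen by the model (hypothetical shape)."""
    kind: str       # e.g. "click", "type", or "done"
    x: int = 0      # normalized 0-1000 coordinate, used by "click"
    y: int = 0
    text: str = ""  # text to enter, used by "type"


def run_agent(
    goal: str,
    see: Callable[[], bytes],
    think: Callable[[str, bytes], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
    frame_interval: float = 1.0,
) -> int:
    """Run See -> Think -> Act until the model reports "done".

    Returns the number of actions that were executed.
    """
    executed = 0
    for _ in range(max_steps):
        frame = see()                # See: grab the current screenshot
        action = think(goal, frame)  # Think: ask the model for the next step
        if action.kind == "done":
            break
        act(action)                  # Act: perform the click or keystroke
        executed += 1
        time.sleep(frame_interval)   # pace the loop at roughly one frame per second
    return executed
```

Passing the three stages in as callables keeps the loop itself testable without a browser or a model; the `max_steps` cap is an added safeguard so a confused plan cannot run indefinitely.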

Bilingual Intelligence

Gemini Horizon seamlessly switches between Arabic and English, preserving technical terminology used in cloud platforms.

Example commands:

"Create a new Cloud Run service"

"أنشئ مشروع Firebase جديد" (Arabic for "Create a new Firebase project")
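To make "preserving technical terminology" concrete, the sketch below shows how the English product names embedded in an Arabic command could be picked out so they are passed through untouched. It is an illustration only: the write-up does not say such a pre-processing step exists, and both function names are hypothetical.

```python
import re

# Basic Arabic Unicode block; enough to tell the two command languages apart.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")
# A Latin-script token such as "Firebase" or "Cloud".
LATIN_TERM = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_arabic(command: str) -> bool:
    """Return True if the command contains any Arabic-script character."""
    return ARABIC_CHAR.search(command) is not None


def embedded_terms(command: str) -> list[str]:
    """Return Latin-script terms inside an Arabic command, in order of appearance."""
    if not is_arabic(command):
        return []
    return LATIN_TERM.findall(command)
```

For the second example command above, `embedded_terms` returns `["Firebase"]`, which is the term that must stay in English when the request is interpreted.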

How we built it

The project uses a modern production-grade cloud architecture:

Frontend

Flutter Web application

Hosted on Firebase Hosting

Global CDN delivery

Backend

Python automation engine

Docker container running on Google Cloud Run

Playwright for browser control

AI Engine

Gemini 3.1 Pro Preview (March 2026)

Accessed through the google-genai SDK v1.67.0

Connectivity

Low-latency Raw WebSockets

Streams live JPEG frames

Enables real-time bidirectional communication
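The write-up does not describe the wire format, so the following is only one plausible way to send a JPEG frame as a single binary WebSocket message: a small fixed header followed by the image bytes. The header layout and the function names are assumptions made for this sketch.

```python
import struct
import time
from typing import Optional, Tuple

# Hypothetical header: frame sequence number (uint32) and capture time
# (float64), both in network byte order, 12 bytes in total.
HEADER = struct.Struct("!Id")


def pack_frame(seq: int, jpeg: bytes, captured_at: Optional[float] = None) -> bytes:
    """Build one binary message: header followed by the raw JPEG bytes."""
    timestamp = time.time() if captured_at is None else captured_at
    return HEADER.pack(seq, timestamp) + jpeg


def unpack_frame(message: bytes) -> Tuple[int, float, bytes]:
    """Split a message back into (sequence number, capture time, JPEG bytes)."""
    seq, timestamp = HEADER.unpack_from(message)
    return seq, timestamp, message[HEADER.size:]
```

Carrying a sequence number and capture time with each frame lets the receiver drop stale frames and measure end-to-end latency without a separate control message.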

Security

Firebase Authentication for identity management

Google Secret Manager to securely store API keys

To ensure accurate UI interaction, we implemented a coordinate transformation layer to convert Gemini's normalized output into browser pixel positions:
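In Python, such a layer could look like the minimal sketch below. It assumes the model reports coordinates on a 0 to 1000 scale and the browser viewport is 1440 × 900, as in the formulas in this section; the function name, the rounding, and the clamping to the viewport are additions for illustration.

```python
VIEWPORT_WIDTH = 1440   # browser viewport width in pixels
VIEWPORT_HEIGHT = 900   # browser viewport height in pixels
MODEL_RANGE = 1000      # the model's normalized coordinate range


def to_pixels(x_model: float, y_model: float) -> tuple[int, int]:
    """Scale normalized model coordinates to viewport pixel positions."""
    x_px = round(x_model / MODEL_RANGE * VIEWPORT_WIDTH)
    y_px = round(y_model / MODEL_RANGE * VIEWPORT_HEIGHT)
    # Clamp so a coordinate of exactly 1000 still lands inside the viewport.
    x_px = min(max(x_px, 0), VIEWPORT_WIDTH - 1)
    y_px = min(max(y_px, 0), VIEWPORT_HEIGHT - 1)
    return x_px, y_px
```

For example, `to_pixels(500, 500)` returns `(720, 450)`, the center of the viewport. The same mapping, written as formulas: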

$$x_{\text{pixel}} = \frac{x_{\text{model}}}{1000} \times 1440$$

$$y_{\text{pixel}} = \frac{y_{\text{model}}}{1000} \times 900$$

Challenges we ran into

Building a real-time multimodal agent on serverless infrastructure introduced several technical challenges.

Running Chromium on Cloud Run

To avoid deployment failures and runtime crashes, launching a headless browser inside Cloud Run required:

tuning Docker dependencies

increasing the memory limit to 2 GiB
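On the Python side, a container-friendly Chromium launch with Playwright might look like the sketch below. The specific flags are common choices for running Chromium in containers and are an assumption here, not the project's confirmed configuration; `launch_options` and `open_console_page` are hypothetical names.

```python
# Flags commonly used when Chromium runs inside a container (assumed, not
# taken from the project).
CHROMIUM_ARGS = [
    "--no-sandbox",             # Chromium's sandbox often cannot start in a container
    "--disable-dev-shm-usage",  # avoid relying on a small /dev/shm
    "--disable-gpu",            # no GPU is available on the instance
]


def launch_options() -> dict:
    """Keyword arguments for Playwright's chromium.launch()."""
    return {"headless": True, "args": list(CHROMIUM_ARGS)}


def open_console_page(url: str) -> str:
    """Open a page in headless Chromium and return its title."""
    # Imported lazily so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options())
        page = browser.new_page(viewport={"width": 1440, "height": 900})
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

The memory limit itself is set at deploy time rather than in code, for example with the `--memory 2Gi` flag of `gcloud run deploy`.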

WebSocket Protocol Mismatch

Initially we used Socket.IO, which layers its own protocol on top of WebSocket and caused synchronization issues between the Flutter frontend and the Python backend.

We solved this by migrating both sides to plain raw WebSockets, which significantly reduced latency and complexity.

The “Site Not Found” Problem

Deploying the frontend and backend separately created routing issues.

We fixed this with Firebase Hosting rewrites, which let the frontend and the Cloud Run backend be served under a single domain. This also improved:

CORS handling

SEO

deployment simplicity
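A rewrite setup of this kind lives in `firebase.json`. The Python sketch below builds such a configuration and prints it as JSON; the service ID, region, `/api` prefix, and `build/web` output directory are placeholders chosen for illustration, not the project's real values.

```python
import json


def hosting_config(service_id: str, region: str, api_prefix: str = "/api") -> dict:
    """Build a firebase.json structure that forwards API paths to Cloud Run."""
    return {
        "hosting": {
            "public": "build/web",  # default output directory of a Flutter web build
            "rewrites": [
                # Requests under the API prefix go to the Cloud Run service.
                {
                    "source": f"{api_prefix}/**",
                    "run": {"serviceId": service_id, "region": region},
                },
                # Everything else falls through to the single-page app.
                {"source": "**", "destination": "/index.html"},
            ],
        }
    }


print(json.dumps(hosting_config("horizon-backend", "us-central1"), indent=2))
```

Rule order matters: Hosting applies the first rewrite that matches, so the Cloud Run rule has to come before the catch-all.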

Accomplishments that we're proud of

Successfully implementing a real-time See-Think-Act loop where the AI reacts dynamically to UI changes.

Building a truly bilingual developer assistant capable of understanding technical Arabic.

Designing a unified cloud architecture where Firebase Hosting acts as a gateway to Cloud Run services.

What we learned

During this project we explored the internal mechanics of Gemini 3.1's Thinking Level system and how Thought Signatures help maintain reasoning consistency across complex multi-step tasks.

We also discovered that in live AI agents, latency becomes the most critical metric. Even small improvements in frame streaming, message serialization, and browser automation can significantly improve the user experience.

What's next for Gemini Horizon

Our roadmap includes expanding Gemini Horizon into a full AI cloud operations platform.

Multi-Agent Collaboration

Specialized agents for:

Security analysis

Billing optimization

DevOps automation

Mobile Vision

Extending the UI Navigator to control Android and iOS cloud management apps.

Proactive Maintenance

Allowing the agent to monitor logs and system metrics, automatically alerting developers via voice or notifications when critical issues occur.
