NEXUS AI Agent is a real-time multimodal creative assistant built with Gemini and deployed on Google Cloud that allows users to brainstorm, refine, and generate complete social media content packs through natural conversation.

NEXUS AI Agent is designed for creators, student leaders, clubs, marketers, and small teams who need a faster and more natural way to create social media content. Instead of switching between multiple tools for brainstorming, prompt writing, visual references, generation, and exports, users can speak, type, upload images, capture camera references, refine ideas in real time, and generate a complete content pack inside one application.

Live Demo

Try the deployed application:
https://nexus-ai-agent-700973101241.us-central1.run.app

Source Code:
https://github.com/antonymwangidev-hub/nexus-ai-agent

The core problem this project solves is workflow fragmentation in content creation. In a typical process, creators brainstorm ideas in chat tools, manually convert them into prompts, search for reference images in separate tools, generate visuals elsewhere, and then manually assemble captions and hashtags. This fragmented workflow slows down creators and makes high-quality content production difficult, especially for non-experts.

NEXUS AI Agent replaces that process with a live creative assistant experience. Users can start a live session, brainstorm campaign ideas with the agent, refine tone and audience in conversation, add visual references through image upload or live camera capture, and then instruct the agent to write the final production-ready prompt directly into the Generate Content Pack section.

The live agent can also trigger generation automatically, making the experience feel less like filling out forms and more like collaborating with an intelligent assistant.

This project was built to demonstrate a new paradigm for AI interaction: live, multimodal agents that collaborate with users while performing tasks in the background.

What makes the project unique is its agentic live-to-action workflow. The live agent is not just a chatbot. It actively helps users think, refine ideas, and execute actions inside the interface.

Users can talk to the system naturally and say commands such as:

“Give me three content ideas for an event promotion.”

“Refine the second idea for a younger audience.”

“Write the final prompt in the Generate Content Pack section using everything we discussed.”

“Now generate the content.”

When these instructions are given, the application executes them while maintaining the natural conversation experience.

NEXUS AI Agent generates a complete social content pack that can include:

• social media captions
• hashtags
• image prompts
• generated visuals
• structured campaign notes
• downloadable outputs

The system also stores generated results in a persistent history so users can reopen, review, and export previous campaigns.

To create a natural interaction experience, the application supports:

• real-time conversational interaction with the live agent
• continuous speech-to-text input
• spoken agent replies
• image upload for visual context
• live camera capture for reference images
• multimodal prompting combining text and images

These features allow the AI agent to understand both conversational and visual context while helping users move from idea to finished content.

Technical Implementation

The project satisfies all challenge technical requirements. It uses Gemini models, is built using the Google GenAI SDK, and is deployed on Google Cloud infrastructure.

The frontend is implemented with HTML, CSS, and JavaScript, providing a responsive and intuitive web interface.

The backend is built using FastAPI and deployed on Google Cloud Run, allowing the application to scale automatically while remaining easy to deploy and maintain.

Gemini models accessed through Google Vertex AI power the agent’s reasoning, conversational understanding, and content generation capabilities.

Cloud Storage is used to store uploaded reference images and generated visual assets.

This architecture creates a scalable real-time pipeline where user interactions are processed instantly while maintaining conversational context.

Architecture Flow

User Input (Voice, Text, Image, or Camera)

Frontend Interface

Live Session Controller

Gemini Model via Vertex AI

Action Execution + Content Generation

Real-Time Feedback to User

This design allows the AI agent to interpret requests immediately while maintaining a natural conversation flow.

Challenges

One of the main challenges was designing an agent that could perform actions without exposing technical instructions to the user. For example, the system needed to trigger generation or write prompts without displaying internal command structures.

This was solved by instructing the agent to always respond naturally to users while executing structured actions silently in the background.

Another challenge involved maintaining stable live sessions while supporting voice input, image references, and real-time generation triggers. Careful session management and testing across browsers were required to ensure reliability.

Accomplishments

• Built a fully functional live multimodal AI agent
• Integrated Gemini models for conversational intelligence
• Implemented real-time conversational workflows
• Designed a cloud-based scalable architecture
• Deployed a production-ready system on Google Cloud

What We Learned

This project reinforced an important insight about the future of AI systems: the most powerful AI experiences will not be simple question-and-answer tools.

Instead, they will be intelligent agents that collaborate with humans to accomplish real tasks.

NEXUS AI Agent demonstrates how Gemini can move beyond static chat interfaces and become a real-time creative partner that helps users move from ideas to finished content.

Tech Stack

AI Model
• Gemini (via Vertex AI)

Backend
• Python
• FastAPI

Frontend
• HTML
• CSS
• JavaScript

Google Cloud Services
• Google Cloud Run (deployment)
• Vertex AI (Gemini models)
• Cloud Storage (reference image storage)

AI Integration
• Google GenAI SDK
• Gemini Live interaction patterns

Share this project:

Updates