ServiceFlow: AI-Powered Hospitality Training

Inspiration

The hospitality industry runs on verbal tradition and paper notes. "Shadowing" is the primary training method, but it often leads to knowledge gaps—one server teaches the menu differently than another. The goal was to digitize this tribal knowledge. The question was: What if a manager could just speak a standard, and an AI would instantly turn that voice note into a visual, audible, and structured training pack?

What it does

ServiceFlow is an agentic content generation platform. It takes unstructured inputs—a manager's voice recording, a messy text note, or a rough napkin sketch—and orchestrates a team of AI agents to convert them into professional training deliverables:

  • Sequence Cards: Step-by-step visual SOPs.
  • Audio Scripts: Perfected voice-overs for staff to listen to during pre-shift.
  • Layout Diagrams: Code-generated floor plans and station maps.
  • Decision Flowcharts: Logic trees for handling guest complaints.

How it was built

ServiceFlow is a React application built with TypeScript and Tailwind CSS. The core intelligence is powered by the Google GenAI SDK.

The architecture uses a "Hub and Spoke" agent model:

  1. The Orchestrator: Gemini 3 Pro is used to analyze the user's raw input, determine the intent, and plan the structure of the training module.
  2. The Sub-Agents: The Orchestrator delegates tasks to specialized models:
    • Visual Coder: Uses Gemini 3 Flash to write HTML/Mermaid.js code for diagrams.
    • Photographer: Uses Gemini 2.5 Flash Image (Nano Banana) to generate scene illustrations.
    • Voice Actor: Uses Gemini 2.5 Native Audio to synthesize human-like speech.
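
The delegation step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the type names, `runSpokes`, and two of the three progress labels are hypothetical, and each sub-agent is modeled as a plain async function so the sketch does not depend on any SDK call.

```typescript
// Hypothetical names throughout; only the three agent roles and the
// "Synthesizing audio assets..." message come from the writeup.
type AgentName = "visualCoder" | "photographer" | "voiceActor";

interface AgentTask {
  agent: AgentName;
  prompt: string;
}

interface AgentResult {
  agent: AgentName;
  output: string;
}

// A sub-agent is any async function from a prompt to an output string.
// In a real app each one would wrap a model call.
type SubAgent = (prompt: string) => Promise<string>;

const PROGRESS_LABELS: Record<AgentName, string> = {
  visualCoder: "Writing diagram code...", // assumed label
  photographer: "Generating scene illustrations...", // assumed label
  voiceActor: "Synthesizing audio assets...",
};

// The hub: start every planned task in parallel, report which agent is
// working, and return the results in the same order as the plan.
async function runSpokes(
  tasks: AgentTask[],
  agents: Record<AgentName, SubAgent>,
  onProgress: (message: string) => void = () => {},
): Promise<AgentResult[]> {
  return Promise.all(
    tasks.map(async (task) => {
      onProgress(PROGRESS_LABELS[task.agent]);
      const output = await agents[task.agent](task.prompt);
      return { agent: task.agent, output };
    }),
  );
}
```

Passing the agents in as a record keeps the hub independent of any particular model, so a sub-agent can be swapped or stubbed out without touching the dispatch logic.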

Powered by Gemini 3

ServiceFlow was designed to leverage the capabilities of the Gemini 3 series.

1. In the App (Runtime)

  • Reasoning & Planning: The Orchestrator relies on Gemini 3 Pro's superior reasoning to break down a vague instruction like "Train them on wine service" into a precise 5-step JSON object with visual cues for every step.
  • Speed & Efficiency: Gemini 3 Flash is used for the "Sophie" chat assistant and for generating SVG/HTML code, ensuring the UI remains snappy.
  • Multimodal Generation: The app seamlessly switches between generating text, code, audio, and images in a single workflow, controlled by the Gemini 3 context window.
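
Because downstream agents depend on the Orchestrator's JSON plan, it helps to validate that plan before using it. The sketch below shows one way this might look. The field names (`topic`, `steps`, `title`, `instruction`, `visualCue`) and the function `parseTrainingPlan` are assumptions made for illustration; the writeup only says the plan is a JSON object with a visual cue for every step.

```typescript
// Assumed plan shape; the real schema is not given in the writeup.
interface TrainingStep {
  title: string;
  instruction: string;
  visualCue: string;
}

interface TrainingPlan {
  topic: string;
  steps: TrainingStep[];
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Parse the model's raw text and reject anything that is not a usable plan.
function parseTrainingPlan(raw: string): TrainingPlan {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Plan is not valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Plan must be a JSON object");
  }
  const { topic, steps } = data as { topic?: unknown; steps?: unknown };
  if (!isNonEmptyString(topic)) {
    throw new Error("Plan is missing a topic");
  }
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("Plan has no steps");
  }
  const parsedSteps: TrainingStep[] = steps.map((step: unknown, index: number) => {
    const { title, instruction, visualCue } = (step ?? {}) as Record<string, unknown>;
    if (
      !isNonEmptyString(title) ||
      !isNonEmptyString(instruction) ||
      !isNonEmptyString(visualCue)
    ) {
      throw new Error(`Step ${index + 1} is missing a required field`);
    }
    return { title, instruction, visualCue };
  });
  return { topic, steps: parsedSteps };
}
```

Failing fast here means a malformed plan surfaces as one clear error instead of as a broken image prompt or an empty audio script later in the run.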

2. In the Studio (Development)

This entire application was developed with the assistance of Gemini 3 Pro via Google AI Studio.

Challenges

  • Orchestration Latency: Generating text, images, and audio in a single run takes time. The wait was made easier to tolerate with a granular progress tracking system that tells the user exactly which agent is working (e.g., "Synthesizing audio assets...").
  • Diagram Consistency: Generative image models struggle with precise text on diagrams. This was solved by prompting Gemini 3 Flash to write code (Mermaid.js) instead of pixels, resulting in crisp, editable flowcharts.
  • Browser Audio Support: Handling raw PCM audio streams from the Native Audio model required specific decoding logic to play back smoothly in the browser.
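
As an illustration of the PCM decoding mentioned above, the sketch below converts raw bytes into the floating-point samples that the Web Audio API works with. It assumes the stream is 16-bit signed, little-endian, single-channel PCM; the writeup does not state the exact format, and `pcm16ToFloat32` is a hypothetical helper name.

```typescript
// Assumption: input is 16-bit signed little-endian mono PCM.
// Converts raw bytes into samples in the range [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Ignore a trailing odd byte, which can appear at a chunk boundary.
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}
```

In a browser, the resulting samples could then be copied into an AudioBuffer created with AudioContext.createBuffer, using the sample rate the model documents, and played through an AudioBufferSourceNode.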

Accomplishments

  • The "Visuals as Code" Feature: Successfully getting an LLM to generate renderable HTML diagrams on the fly.
  • Multi-Modal Pipeline: A single "Generate" button triggers three different AI models working in harmony.
  • The UI/UX: The app feels professional, with glassmorphism effects, smooth transitions, and a "Command Center" aesthetic.
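
One practical detail of the "Visuals as Code" approach is that model responses often wrap code in a Markdown fence or add commentary around it. The sketch below shows one possible way to pull out the Mermaid source and sanity-check it before rendering. `extractMermaid` is a hypothetical helper, and it only accepts flowchart-style diagrams, which is an assumption made to keep the example short.

```typescript
// Build the fence marker programmatically so this snippet can itself
// live inside a fenced block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:mermaid)?\\s*\\n([\\s\\S]*?)" + FENCE);

// Returns the Mermaid source if the response contains a flowchart
// definition, or null if nothing usable was found.
function extractMermaid(response: string): string | null {
  const match = response.match(FENCED_BLOCK);
  const code = (match ? match[1] : response).trim();
  // Accept only "flowchart" or "graph" headers with a direction keyword.
  const looksLikeFlowchart = /^(flowchart|graph)\s+(TB|TD|BT|LR|RL)\b/.test(code);
  return looksLikeFlowchart ? code : null;
}
```

A null result is a natural point to retry the request or fall back to a plain text outline, rather than handing invalid input to the renderer.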

What's next for ServiceFlow

  • Fix Known Bugs: Library and activity items can't be deleted yet. Generated images sometimes capture the concept but not the intent, usually when the intent has many valid interpretations. Generations also sometimes fail with local storage errors (so far only on the Cloud Run hosted app).
  • Training Packs: They're not active at the moment.
  • Update Output Formats: The content asset types generated for each module are currently similar. The update would adapt them to each flow's needs and improve features like the layout diagram modules.
  • Live Training Mode: A live, real-time voice conversation where Gemini acts as a difficult guest and the server practices their response.

Built With

  • aistudio
  • gemini-2.5-flash-image
  • gemini-2.5-native-audio
  • gemini-3-flash
  • gemini-3-pro
  • google-genai-sdk
  • mermaid.js
  • react
  • tailwindcss
  • typescript