UMKM-GO-AI

Inspiration

Indonesia's vibrant economy is powered by millions of Micro, Small, and Medium Enterprises (UMKM), yet many entrepreneurs struggle as solo operators juggling countless roles. Cost and time constraints often keep them from specialized knowledge in critical areas such as legal compliance, digital marketing, branding, and operational analysis. Seeing this gap, and inspired by the AI Accelerate theme of unlocking new frontiers with Google Cloud and its partners, we envisioned an AI-powered solution. We wanted to build more than a tool: a virtual, proactive partner that democratizes expert knowledge and actively helps these businesses thrive in the digital age, combining Google's generative AI with Elastic's Search AI platform. The chance to make a tangible impact on this vital economic sector was our core motivation.

What it does

UMKM-Go AI is a proactive, multi-agent AI business partner delivered through an intuitive mobile and web application (built with Flutter). It acts as a virtual team of consultants, allowing users to ask natural language questions about their business and receive intelligent, context-aware, and actionable advice.

Key functionalities include:

**Legal Agent:** Answers complex questions about Indonesian business laws, regulations, permits (NIB, PIRT), taxes, etc., using Retrieval-Augmented Generation (RAG) powered by Elasticsearch Hybrid Search on a custom-built legal knowledge base.
**Marketing Agent:** Provides creative marketing ideas, social media content strategies, and branding tips, grounded in relevant marketing knowledge retrieved using Elasticsearch KNN Search from a curated article database.
**Instant Brand Agent (🌟):** Generates a complete brand identity kit (names, taglines, logo concepts, Instagram bio) from a single product photo. This sophisticated multimodal pipeline uses Google Cloud's Gemini 2.5 Pro for image analysis and creative generation, Vertex AI Multimodal Embeddings for image vectorization, Elasticsearch KNN Search for visual inspiration retrieval, Imagen on Vertex AI for logo image synthesis, and Google Cloud Storage for hosting results.
**Intelligent Orchestrator:** A central agent using Gemini 2.5 Pro to understand the user's query intent and automatically route it to the appropriate specialist agent (Legal, Marketing), creating a seamless user experience.
**Proactive Agent (🌟):** Works autonomously! Triggered daily by Cloud Scheduler, this agent scans external data sources (configured for RSS feeds) for relevant business opportunities (e.g., grants, export programs, relevant news) and notifies the user via Firebase Cloud Messaging (FCM) push notifications. NOTE: Due to the time constraints of the hackathon, the current deployment uses a hardcoded FCM device token in the backend (agent_proactive.py) for demonstration purposes. As a result, push notifications triggered by Cloud Scheduler will only be sent to the pre-configured device during judging.
**Operational Agent:** Analyzes simple sales data from user-uploaded CSV files using Pandas and leverages Gemini 2.5 Pro to translate raw statistics into actionable business insights.
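
To make the Intelligent Orchestrator's role more concrete, here is a minimal sketch of intent routing. It is not the project's code: the label set, the prompt wording, and the names `build_intent_prompt`, `parse_intent`, `route`, and `fake_classifier` are all assumptions for illustration, and the Gemini call is replaced by a stub so the example runs without credentials.

```python
from typing import Callable, Dict

# Hypothetical intent labels; the project's real label set is not given in this write-up.
INTENTS = ("legal", "marketing")


def build_intent_prompt(query: str) -> str:
    """Build a classification prompt that asks the model for exactly one label."""
    labels = ", ".join(INTENTS)
    return (
        f"Classify the user's question into exactly one category: {labels}.\n"
        "Answer with the category name only.\n\n"
        f"Question: {query}"
    )


def parse_intent(model_output: str, default: str = "marketing") -> str:
    """Normalise the model's reply; fall back to a default on anything unexpected."""
    label = model_output.strip().lower().strip(".\"'")
    return label if label in INTENTS else default


def route(
    query: str,
    classify: Callable[[str], str],
    agents: Dict[str, Callable[[str], str]],
) -> str:
    """Classify the query, then hand it to the matching specialist agent."""
    intent = parse_intent(classify(build_intent_prompt(query)))
    return agents[intent](query)


def fake_classifier(prompt: str) -> str:
    """Stub standing in for the Gemini call (assumption: a real model returns a label)."""
    question = prompt.rsplit("Question:", 1)[-1].lower()
    return "Legal." if ("nib" in question or "permit" in question) else "Marketing"


# Placeholder agents; the real ones would run RAG over Elasticsearch.
AGENTS = {
    "legal": lambda q: f"[legal agent] {q}",
    "marketing": lambda q: f"[marketing agent] {q}",
}

if __name__ == "__main__":
    print(route("How do I get an NIB permit?", fake_classifier, AGENTS))
    print(route("Ideas for Instagram posts", fake_classifier, AGENTS))
```

Parsing the reply defensively matters here because a generative model may add punctuation or casing around the label; an unknown label falls back to a default agent instead of raising an error.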

How we built it

We adopted a modular architecture with a Flutter frontend and a Python (FastAPI) backend deployed on Google Cloud Run, leveraging a wide array of sponsor technologies:

Google Cloud Platform:

Vertex AI: The core AI engine.
- Gemini 2.5 Pro: Used extensively for:
  - RAG answer generation (Legal & Marketing Agents).
  - Intent classification (Orchestrator).
  - Creative text generation (Brand Agent - names, taglines, logo descriptions, bio).
  - Multimodal image analysis (Brand Agent - extracting labels/colors from product photos).
  - Interpreting statistical data into insights (Operational Agent).
- Multimodal Embeddings API (multimodalembedding@001): Crucial for the Brand Agent, generating vector representations for both the user's product image and the visual inspiration images stored in Elasticsearch.
- Imagen API (imagegeneration@005): Employed by the Brand Agent to synthesize visual logo concepts from textual descriptions generated by Gemini.
Cloud Run: Provides a scalable, serverless environment for our containerized FastAPI backend, handling API requests efficiently.
Cloud Scheduler: Orchestrates the Proactive Agent, reliably triggering the daily opportunity scan via authenticated HTTP calls (OIDC tokens).
Firebase Cloud Messaging (FCM): Enables real-time, proactive push notifications to the user's device (web & mobile) via FCM Topics for broad reach during demo/testing.
Secret Manager: Securely stores sensitive credentials like the Elastic API Key, accessed safely by the Cloud Run service account.
Cloud Storage (GCS): Used to store and serve the logo images generated by Imagen, configured for public access.
Artifact Registry: Stores the Docker container image built by Cloud Build.
Cloud Build: Automates the CI/CD pipeline, building the Docker image (including embedding models) and deploying it seamlessly to Cloud Run using Infrastructure-as-Code principles (cloudbuild.yaml).
IAM: Configured permissions meticulously for service accounts (Cloud Build accessing Cloud Run & Secrets, Cloud Run accessing Vertex AI & GCS & Secrets, Cloud Scheduler invoking Cloud Run) following the principle of least privilege where possible.
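
The daily opportunity scan triggered by Cloud Scheduler can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of agent_proactive.py: the keyword list, the function names, and the sample feed are invented for illustration, the feed is parsed from a string rather than fetched over HTTP, and the notification dictionary is a plain hypothetical shape rather than the FCM API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Sequence

# Hypothetical keywords; the real relevance criteria are not described in the write-up.
KEYWORDS = ("hibah", "grant", "ekspor", "export", "umkm")


def find_opportunities(rss_xml: str, keywords: Sequence[str] = KEYWORDS) -> List[Dict[str, str]]:
    """Return RSS items whose title or description mentions any keyword."""
    root = ET.fromstring(rss_xml)
    matches = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        summary = (item.findtext("description") or "").strip()
        link = (item.findtext("link") or "").strip()
        haystack = f"{title} {summary}".lower()
        if any(keyword in haystack for keyword in keywords):
            matches.append({"title": title, "link": link})
    return matches


def to_notification(opportunity: Dict[str, str]) -> Dict[str, object]:
    """Shape a match into a generic notification dict (hypothetical, not an FCM schema)."""
    return {
        "title": "New opportunity for your business",
        "body": opportunity["title"][:120],
        "data": {"url": opportunity["link"]},
    }


# Made-up feed used only to exercise the functions above.
SAMPLE_FEED = (
    '<rss version="2.0"><channel><title>Example feed</title>'
    "<item><title>Program hibah untuk UMKM kuliner</title>"
    "<link>https://example.com/a</link>"
    "<description>Pendaftaran dibuka.</description></item>"
    "<item><title>Weather update</title>"
    "<link>https://example.com/b</link>"
    "<description>Rain expected.</description></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    for hit in find_opportunities(SAMPLE_FEED):
        print(to_notification(hit))
```

In a deployment like the one described, the Cloud Scheduler job would call an authenticated endpoint that fetches the feeds, runs a filter of this kind, and passes each match to the push-notification sender.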

Elastic:

Elasticsearch (on Elastic Cloud): The heart of our knowledge retrieval system.
- Vector Database: Stores and enables similarity search for both:
  - Text Embeddings: Generated using SentenceTransformers for legal documents (PDFs parsed with PyMuPDF) and marketing articles.
  - Multimodal Embeddings: Generated using Vertex AI for the visual inspiration images (logos, packaging, palettes).
- Hybrid Search: Leveraged in the Legal Agent, combining traditional keyword search (BM25) with semantic vector search for superior relevance when retrieving legal clauses.
- KNN Search: Used for efficient vector similarity search in the Marketing Agent (finding relevant articles) and the Brand Agent (finding visually/semantically similar inspirations). We utilized KNN filtering based on tags/labels for more targeted results.
Elastic Cloud: Provided a fully managed, scalable, and reliable Elasticsearch cluster integrated with GCP.
Kibana: Essential during development for creating index mappings, verifying data ingestion (legal text, marketing articles, visual KB), and testing search queries.
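
As an illustration of the two retrieval styles listed above, the sketch below builds Elasticsearch search request bodies as plain Python dictionaries. The field names (`content`, `content_vector`, `image_vector`, `tags`) and the helper names are assumptions, since the project's index mappings are not shown here. The structure follows the Elasticsearch 8.x search API, where a top-level `knn` section can be combined with a `query` section for hybrid retrieval and can carry its own `filter`.

```python
from typing import Any, Dict, List, Optional


def hybrid_search_body(question: str, query_vector: List[float], k: int = 5) -> Dict[str, Any]:
    """BM25 match plus approximate kNN in one request (hybrid retrieval)."""
    return {
        # Lexical side: BM25 scoring on the text field (field name is an assumption).
        "query": {"match": {"content": {"query": question}}},
        # Semantic side: nearest neighbours on a dense_vector field (also an assumption).
        "knn": {
            "field": "content_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(50, k * 10),
        },
        "size": k,
    }


def filtered_knn_body(
    query_vector: List[float],
    tags: Optional[List[str]] = None,
    k: int = 3,
) -> Dict[str, Any]:
    """Pure kNN search, optionally restricted to documents carrying given tags."""
    knn: Dict[str, Any] = {
        "field": "image_vector",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": max(50, k * 10),
    }
    if tags:
        # The filter is applied during the vector search, so k hits can still be returned.
        knn["filter"] = {"terms": {"tags": tags}}
    return {"knn": knn, "size": k}


if __name__ == "__main__":
    print(hybrid_search_body("syarat mengurus NIB", [0.1, 0.2, 0.3]))
    print(filtered_knn_body([0.1, 0.2, 0.3], tags=["logo", "food"]))
```

With the 8.x Python client, the keys of such a body can typically be passed as keyword arguments to `search` together with an index name; the query vector must have the same dimensionality as the `dense_vector` field it targets.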

Other Key Technologies:

Backend: Python 3.11, FastAPI (for robust API development), Pandas (for Operational Agent's data analysis).
Frontend: Flutter (for cross-platform mobile & web UI), BLoC (for scalable state management).
Data Processing: Python, Selenium & BeautifulSoup4 (for robust web scraping of dynamic/unreliable sources), PyMuPDF (PDF text extraction), Requests (RSS feed fetching), SentenceTransformers (local text embedding generation).
CI/CD: Docker, Cloud Build (cloudbuild.yaml).
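
The Operational Agent's "statistics first, language model second" flow can be sketched as below. The project uses Pandas for this step; the example deliberately swaps in Python's standard `csv` module so it runs without extra dependencies. The column names (`product`, `quantity`, `unit_price`), the prompt wording, and the sample rows are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict


def summarize_sales(csv_text: str) -> Dict[str, Any]:
    """Aggregate revenue and units per product from CSV text."""
    revenue = defaultdict(float)
    units = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        quantity = int(row["quantity"])
        revenue[row["product"]] += quantity * float(row["unit_price"])
        units[row["product"]] += quantity
    best = max(revenue, key=revenue.get) if revenue else None
    return {
        "total_revenue": sum(revenue.values()),
        "revenue_by_product": dict(revenue),
        "units_by_product": dict(units),
        "best_seller": best,
    }


def build_insight_prompt(stats: Dict[str, Any]) -> str:
    """Turn the computed statistics into a prompt for a generative model."""
    lines = [
        f"- {name}: revenue {amount:.0f}, units {stats['units_by_product'][name]}"
        for name, amount in stats["revenue_by_product"].items()
    ]
    return (
        "You are a business advisor for a small Indonesian enterprise.\n"
        f"Total revenue: {stats['total_revenue']:.0f}\n"
        f"Best seller: {stats['best_seller']}\n"
        + "\n".join(lines)
        + "\nGive three short, practical recommendations based only on these numbers."
    )


# Made-up rows used only to exercise the functions above.
SAMPLE_CSV = """date,product,quantity,unit_price
2025-10-01,Keripik Pisang,10,15000
2025-10-01,Sambal Botol,4,25000
2025-10-02,Keripik Pisang,6,15000
"""

if __name__ == "__main__":
    print(build_insight_prompt(summarize_sales(SAMPLE_CSV)))
```

Computing the numbers in code and sending only the summary to the model keeps the arithmetic deterministic and the prompt small; the model's job is limited to interpretation.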

Challenges we ran into

Building a sophisticated, multi-agent AI system like UMKM-Go AI within the hackathon timeframe presented several significant, yet rewarding, challenges:

Building Diverse and High-Quality Knowledge Bases: Acquiring and processing varied data sources was a substantial undertaking.
- Legal KB: This required robustly scraping official Indonesian government portals, tackling challenges like dynamically loaded content and extracting clean text from PDF documents using PyMuPDF and Selenium. We then implemented a structured chunking process (by Chapter and Article) to prepare the data for Elasticsearch.
- Visual KB: For the "Instant Brand Agent," we manually curated a focused dataset of 130+ inspiration images (logos, packaging, palettes relevant to the food industry). We then built an automated pipeline using Gemini 2.5 Pro's multimodal capabilities for generating descriptive tags for each image, followed by generating Vertex AI Multimodal Embeddings and indexing everything into a dedicated Elasticsearch index.
Orchestrating a Complex Multi-Agent AI System: Integrating multiple specialized AI services (Gemini for text/vision/routing, Vertex Embeddings, Imagen) and data stores (Elasticsearch for text/vector/image metadata) into one coherent multi-agent architecture required careful design with FastAPI's APIRouter, robust error handling across API calls, and efficient data flow between services. The Intelligent Orchestrator itself, which classifies user intent using Gemini, was a key piece that needed careful prompt engineering.
Mastering Advanced Sponsor Technologies: Implementing cutting-edge features demanded a rapid learning curve:
- Elasticsearch: Moving beyond basic text search to effectively implement Hybrid Search and KNN Vector Search (including multimodal queries with filtering) required understanding index mapping optimizations (dense_vector, keyword) and query DSL nuances.
- Vertex AI: Integrating the spectrum of services – from generative text and image models (Gemini, Imagen) to specialized APIs (Multimodal Embeddings) – necessitated careful handling of different client libraries, input/output formats (like base64 for images), and ensuring proper IAM permissions were configured.
Robust Cloud Deployment & Configuration: Deploying a multi-model AI application with significant resource requirements (especially embedding models) to Cloud Run presented hurdles. We overcame "silent" container startup failures through systematic debugging (using test Dockerfiles, analyzing logs), meticulous Dockerfile optimization (including baking in models), Infrastructure-as-Code practices (cloudbuild.yaml defining build and push steps with env vars/secrets), and precise IAM configuration across Cloud Build, Cloud Run, Secret Manager, GCS, and Vertex AI.
Ensuring Cross-Platform Consistency: Making the application work seamlessly on both Flutter Web and Android required addressing platform-specific challenges like CORS policy configuration on the backend (FastAPI CORSMiddleware) and adapting file upload logic (MultipartFile handling) in the Flutter frontend.
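
The structured chunking of legal text by Chapter and Article, mentioned under the Legal KB challenge, might look roughly like this. It is a sketch, not the project's pipeline: it assumes headings appear on their own lines as "BAB <roman numeral>" and "Pasal <number>", and the function name and sample text are invented placeholders.

```python
import re
from typing import Dict, List, Optional

# Assumed heading formats: each heading sits alone on its line.
CHAPTER_RE = re.compile(r"^BAB\s+([IVXLCDM]+)\s*$", re.IGNORECASE)
ARTICLE_RE = re.compile(r"^Pasal\s+(\d+)\s*$", re.IGNORECASE)


def chunk_by_article(text: str) -> List[Dict[str, str]]:
    """Split extracted legal text into one chunk per article, tagged with its chapter."""
    chunks: List[Dict[str, str]] = []
    chapter = ""
    article: Optional[str] = None
    buffer: List[str] = []

    def flush() -> None:
        # Emit the article collected so far, if there is one with body text.
        if article is not None and buffer:
            chunks.append({"chapter": chapter, "article": article, "text": " ".join(buffer)})

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        chapter_match = CHAPTER_RE.match(line)
        article_match = ARTICLE_RE.match(line)
        if chapter_match:
            flush()
            chapter, article, buffer = f"BAB {chapter_match.group(1).upper()}", None, []
        elif article_match:
            flush()
            # Lines seen before the first article of a chapter (e.g. its title) are dropped.
            article, buffer = f"Pasal {article_match.group(1)}", []
        else:
            buffer.append(line)
    flush()
    return chunks


# Placeholder text, not taken from any real regulation.
SAMPLE_TEXT = """BAB I
CHAPTER TITLE ONE
Pasal 1
Placeholder text for article one,
continued on a second line.
Pasal 2
Placeholder text for article two.
BAB II
CHAPTER TITLE TWO
Pasal 3
Placeholder text for article three.
"""

if __name__ == "__main__":
    for chunk in chunk_by_article(SAMPLE_TEXT):
        print(chunk)
```

Each resulting chunk carries its chapter and article as metadata, which is what lets a retrieved passage be cited back to a specific clause. Real PDFs would likely need extra handling for page headers, footnotes, and headings that share a line with body text.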

Accomplishments that we're proud of

Successfully building a functional end-to-end multi-agent AI system integrating multiple advanced services within the hackathon timeframe.
Implementing a sophisticated multimodal pipeline for the "Instant Brand Agent," showcasing Gemini Vision, Vertex AI Multimodal Embeddings, Elasticsearch KNN, Imagen, and GCS working in concert.
Creating a truly proactive feature using Cloud Scheduler and FCM, demonstrating an architecture beyond simple request-response.
Leveraging Elasticsearch not just as a text search engine but as a powerful Vector Database supporting both Hybrid Search and KNN Search for RAG across text and image data.
Overcoming significant deployment challenges to get the complex backend running reliably on Cloud Run.
Building a clean, maintainable codebase using FastAPI (backend) and Flutter with BLoC (frontend).

What we learned

The Power of Multimodal AI: Integrating image understanding (Gemini/Vision), image generation (Imagen), and multimodal search (Elastic + Vertex Embeddings) opens up incredibly creative and valuable use cases.
Pragmatism in Hackathons: While aiming high, adapting to challenges (like unstable data sources) and choosing reliable alternatives (RSS feeds) is key to delivering a functional demo. Prioritizing core features over optional extras (like Elastic APM) was crucial.
Infrastructure as Code: Defining deployment configurations (environment variables, secrets) within cloudbuild.yaml is vital for reproducible and reliable Cloud Run deployments.
Sponsor Technology Synergy: We gained deep appreciation for how seamlessly Google Cloud's AI services can be combined with Elastic's Search AI capabilities to build powerful, grounded generative AI applications.
The Importance of Iteration: From data scraping to prompt engineering to deployment debugging, iterative refinement was essential at every stage.

What's next for UMKM-GO-AI

UMKM-Go AI has immense potential for further development:

Implement Human-in-the-Loop (HITL) Validation for Legal Data: To ensure the utmost accuracy and maintain user trust, especially when regulations change, we plan to build a workflow where newly scraped legal documents (identified by the Proactive Agent) are placed in a staging area. An administrator would review and approve the processed content (chunks, metadata) via a simple dashboard before it is indexed into the main Elasticsearch knowledge base and used by the Legal Agent.
Expand Knowledge Bases: Ingest more diverse legal documents (local regulations, specific industry permits), broader marketing resources (case studies, platform guides), and a wider range of visual inspirations to enhance all agent capabilities.
Robust Notification System: Implement Firestore to properly manage user profiles and FCM tokens.
New Agents: Develop additional specialist agents (e.g., a Finance Agent using Pandas for bookkeeping advice, an HR Agent leveraging the legal KB for basic employment queries).
Enhanced Proactive Agent: Expand beyond RSS feeds to revisit scraping government tender portals (using more advanced techniques) and add other opportunity sources (e.g., funding announcements, relevant local events).
UI/UX Improvements: Refine the Flutter UI based on user feedback, add user profiles for personalization (e.g., storing business type), and potentially integrate voice commands (Google Cloud Speech-to-Text).
Deeper Elastic Integration: Explore Elastic APM for performance monitoring and potentially Elastic's built-in ML features (like anomaly detection on sales data).
Personalization: Allow users to specify their business type and goals to receive even more tailored advice and opportunities.

Built With

docker
elasticsearch
fastapi
firebase-cloud-messaging
flutter
google-cloud
google-cloud-run
google-cloud-scheduler
google-cloud-vertex-ai
selenium
sentencetransformers

Updates

suparman Donthave started this project — Oct 24, 2025 04:00 PM EDT
