World Movers AI-Agent: Revolutionizing Logistics with Cloud Run
🤔 Inspiration
In the fast-paced world of logistics, we saw a critical need for faster, more accurate, and more versatile customer communication. The inspiration behind the World Movers AI Agent was to create an intelligent assistant that could understand and respond to customer and marketing inquiries across any medium: text, voice, documents, or even live screen sharing. The goal was to streamline support, personalize marketing, and connect customer needs directly with actionable services, such as generating quotes. By building on a serverless, containerized architecture on Google Cloud Run, we aimed for a scalable, cost-effective multimodal solution.
✨ What it does
The World Movers AI Assistant is a comprehensive, multimodal support agent designed to enhance customer service and marketing efforts. Deployed as a web application on Google Cloud Run, it offers the following capabilities:
- Intelligent Conversations: Users can ask questions about a wide range of services, including air, sea, and domestic freight, customs, warehousing, and trucking. The assistant can also provide quotes, general support, and personalized marketing information.
- Diverse Content Analysis: The application can analyze various file types like PDFs, DOCX, text files, and images. This allows users to summarize documents, extract information from labels, compare files, and even get feedback on marketing materials.
- Voice-Activated Commands: Voice input is transcribed and then processed like a typed query, for both customer and marketing interactions.
- Live Feed Monitoring: The AI can analyze a live webcam stream to identify logistics-related objects, text, or scenes, enhancing real-time operational awareness.
- Screen Sharing Interpretation: Both users and agents can share their screens and have the AI analyze the content, which is invaluable for collaborative support and reviewing marketing materials.
- Image Capture and Analysis: Users can take pictures with their device's camera for instant analysis, useful for reporting issues or making visual inquiries about products and services.
- Internal Knowledge Base: The assistant provides accurate information based on World Movers' internal documents, ensuring consistent and reliable customer service.
- Streamlined Quote Generation: When a user requests a quote, the assistant gathers the necessary details and automatically emails the quoting team, accelerating the sales cycle.
- Marketing and Sales Support: The AI can help draft responses to marketing inquiries, provide quick information to sales teams, and summarize service benefits, ensuring consistent messaging.
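The quote flow described above (gather the details, then email the quoting team) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `QuoteRequest` fields, the function names, and all SMTP parameters are hypothetical, and only Python's standard `email` and `smtplib` modules are used.

```python
import smtplib
from dataclasses import dataclass, fields
from email.message import EmailMessage


@dataclass
class QuoteRequest:
    """Hypothetical set of details gathered during the conversation."""
    customer_name: str = ""
    customer_email: str = ""
    service: str = ""            # e.g. "air freight", "sea freight"
    origin: str = ""
    destination: str = ""
    cargo_description: str = ""


def missing_fields(req: QuoteRequest) -> list[str]:
    """Return the names of fields the assistant still needs to ask about."""
    return [f.name for f in fields(req) if not getattr(req, f.name).strip()]


def build_quote_email(req: QuoteRequest, sender: str, quoting_team: str) -> EmailMessage:
    """Format a complete quote request as a plain-text email."""
    gaps = missing_fields(req)
    if gaps:
        raise ValueError(f"Quote request is incomplete: {', '.join(gaps)}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = quoting_team
    msg["Reply-To"] = req.customer_email
    msg["Subject"] = f"Quote request: {req.service} from {req.origin} to {req.destination}"
    # One "field: value" line per gathered detail.
    msg.set_content("\n".join(f"{f.name}: {getattr(req, f.name)}" for f in fields(req)))
    return msg


def send_via_smtp(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS (all parameters are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

In a conversation loop, `missing_fields` would tell the assistant which question to ask next, and `build_quote_email` would only be called once the list is empty.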
🛠️ How we built it
The World Movers AI Assistant is built on a modern, serverless architecture, with Google Cloud Run at its core. This provides a fully managed platform that automatically scales our containerized application, ensuring a responsive experience for users.
Core AI and Application:
- Containerization: The application is packaged into a container image, which includes all the necessary code and dependencies. This allows for consistent deployment and scaling on Cloud Run.
- Serverless Deployment: By deploying on Cloud Run, we eliminate the need to manage servers, allowing us to focus on developing new features. Cloud Run's ability to scale to zero means we only pay for the resources we use, making it a cost-effective solution for a hackathon project.
- Multimodal AI Models: The application integrates with powerful multimodal AI models, like Google's Gemini, through API calls. Cloud Run's environment is ideal for hosting the application logic that orchestrates these calls.
- GPU Acceleration: For more intensive AI tasks, Cloud Run's support for GPUs can be leveraged to run inference on our own fine-tuned models, providing high performance for demanding workloads.
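As an illustration of what the container has to do to run on Cloud Run, the sketch below listens on the port that Cloud Run passes in through the `PORT` environment variable and answers a simple health check. The project lists FastAPI among its technologies; this example deliberately uses only the Python standard library to stay dependency-free, and the `/health` path, handler, and function names are assumptions.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def resolve_port(environ=os.environ) -> int:
    """Cloud Run injects the listening port as PORT; fall back to 8080 locally."""
    return int(environ.get("PORT", "8080"))


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that exposes a single health endpoint."""

    def do_GET(self):
        if self.path == "/health":
            payload = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404)


def make_server(port=None) -> ThreadingHTTPServer:
    """Bind on all interfaces so the container accepts external traffic."""
    chosen = resolve_port() if port is None else port
    return ThreadingHTTPServer(("0.0.0.0", chosen), HealthHandler)
```

The container's entrypoint would call `make_server().serve_forever()`. Keeping the server this small also helps with the cold-start concern, since less code has to load before the first request can be served.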
Multimodal Input Handling:
- Text and Documents: Standard libraries are used for processing text and parsing various document formats.
- Voice and Media: WebRTC and audio processing libraries handle real-time voice and video streams.
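One way to picture the input handling is a small router that decides, from the file name, which kind of processing an upload needs. The sketch below is an assumption about how that could look, not the project's implementation: the extension table and function names are hypothetical, and only the plain-text handler is filled in (PDF, DOCX, image, and audio handling would be delegated to dedicated libraries).

```python
from pathlib import Path

# Hypothetical mapping from file extension to the modality to process.
EXTENSION_TO_KIND = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".wav": "audio",
    ".mp3": "audio",
}


def classify_upload(filename: str) -> str:
    """Return the modality for a file name, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_KIND[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


def handle_text(data: bytes) -> str:
    """Decode plain text, replacing bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


# Only the text handler is implemented in this sketch.
HANDLERS = {"text": handle_text}


def process_upload(filename: str, data: bytes) -> str:
    """Route an upload to the handler registered for its modality."""
    kind = classify_upload(filename)
    handler = HANDLERS.get(kind)
    if handler is None:
        raise NotImplementedError(f"No handler registered for {kind!r} in this sketch")
    return handler(data)
```

Rejecting unsupported types before any parsing keeps malformed uploads away from the document parsers and the model call.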
External Knowledge and Actions:
- Retrieval-Augmented Generation (RAG): The application connects to external data sources and internal documents to provide contextually relevant answers. Cloud Run services can reach cloud databases over private networking, which suits RAG implementations.
- Email Integration: An SMTP service is used to send formatted quote requests, demonstrating how AI-driven insights can trigger real-world business processes.
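The retrieval step of RAG can be sketched in a few lines. The example below is a simplified stand-in: it ranks documents by keyword overlap with the question, whereas a production system would more likely use embeddings and a vector store. The function names, the prompt wording, and the in-memory `documents` dictionary are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Return ids of the documents sharing the most tokens with the question."""
    question_tokens = tokenize(question)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(question_tokens & tokenize(text))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_prompt(question: str, documents: dict[str, str], top_k: int = 2) -> str:
    """Assemble a grounded prompt from the retrieved documents and the question."""
    ids = retrieve(question, documents, top_k)
    context = "\n\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in ids)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The resulting string would be sent to the model in place of the raw question, so the answer stays tied to the internal documents rather than the model's general knowledge.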
🚧 Challenges we ran into
- Integrating Diverse Libraries: A key challenge was harmonizing the various libraries for handling real-time data streams (like audio and video) with the asynchronous nature of AI model API calls within the Cloud Run environment.
- Efficient Containerization: Creating a lightweight and efficient Docker image was crucial for fast deployments and cold starts on Cloud Run. This involved carefully managing dependencies and optimizing the container build process.
- Multimodal Prompt Engineering: Structuring the prompts to effectively combine text, image, and other data for the multimodal AI model required significant experimentation to achieve accurate and relevant responses.
- Real-time Analysis and State Management: Managing the state of live feeds without overwhelming the application or incurring high costs on Cloud Run required careful architectural planning.
- Securely Managing Credentials: Ensuring secure access to external services and APIs from within the Cloud Run container was a priority.
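For the live-feed challenge in particular, one common way to keep cost and load under control is to analyze only a sample of frames. The sketch below shows a simple time-based throttle; it is an assumed approach rather than a description of what the project does, and the class name and `min_interval` parameter are hypothetical.

```python
import time


class FrameThrottle:
    """Let through at most one frame per `min_interval` seconds; drop the rest."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock          # injectable so the logic can be tested
        self._last_sent = None       # time of the last frame that was let through

    def should_send(self) -> bool:
        """Return True if enough time has passed to analyze another frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self.min_interval:
            self._last_sent = now
            return True
        return False
```

A capture loop would call `should_send()` for each incoming webcam or screen-share frame and forward the frame to the model only when it returns `True`, which bounds the number of model calls per minute regardless of the stream's frame rate.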
🎉 Accomplishments that we're proud of
- True Multimodal Integration on a Serverless Platform: We built a multimodal AI assistant that integrates text, voice, documents, and live media in one application, deployed on Google Cloud Run.
- Actionable AI with Real-World Impact: The implementation of the email forwarding feature for quotes demonstrates how AI can be connected to tangible business actions, directly improving the efficiency of sales and customer service.
- A Fully Functional End-to-End Workflow: We created a complete workflow from diverse user inputs to sophisticated AI analysis and a concrete business output, showcasing a tangible improvement in customer engagement and operational efficiency.
- Leveraging the Power of Serverless: By building on Cloud Run, we've created a solution that is not only powerful but also incredibly efficient and scalable, making it ideal for a hackathon setting where rapid development and deployment are key.
🧠 What we learned
- The power of serverless for AI applications: Google Cloud Run proved a strong fit for developing and deploying AI-powered applications, combining automatic scaling, pay-per-use pricing, and ease of use.
- The importance of efficient containerization: A well-optimized Docker container is essential for achieving fast startup times and efficient resource utilization on Cloud Run.
- The nuances of multimodal AI integration: Integrating and orchestrating different data types for a multimodal AI model requires careful prompt engineering and a solid understanding of the model's capabilities.
- The value of a managed platform: Cloud Run's fully managed environment allowed us to focus on the core logic of our application without getting bogged down in infrastructure management.
🚀 What's next for Multimodal All-in-one Global Logistical Solution Cloud Run
- Deeper Integration with Google Cloud Services: We plan to integrate with other Google Cloud services like Vertex AI for more advanced model management and BigQuery for analyzing user interaction data.
- Enhanced AI Agent Capabilities: We will explore using Google's Agent Development Kit (ADK) to build more sophisticated and autonomous AI agents on Cloud Run.
- Proactive Customer Engagement: We aim to develop proactive features, such as notifying customers of potential shipping delays based on real-time data analysis.
- Expanded Language Support: To cater to a global audience, we will add support for multiple languages.
- CI/CD Automation: We plan to set up a full CI/CD pipeline using services like Cloud Build to automate the testing and deployment of our Cloud Run application.
Built With
- agents
- ai
- apis
- applications
- beautiful-soup
- building
- cloud
- common
- customer
- databases
- display
- docker
- fastapi
- firestore
- frameworks
- google-cloud
- google-cloud-run
- google-maps
- google-vertex-ai-(gemini-api)
- grounding
- logistics
- multimodal
- natural-language-toolkit-(nltk)
- pillow
- pydub
- pypdf2
- python
- python-docx
- requests
- required
- run
- search
- service
- services
- smtp
- suggestions
- supported
- technologies
- webrtc



