Inspiration
Endless meetings and long tutorial videos eat up hours of my day. I wanted a way to get the key points instantly as audio I can listen to anytime, saving time and boosting productivity.
What it does
I built a system that automatically summarizes long videos, translates the summaries into multiple languages, and generates natural-sounding voice narration. The result is short, impactful, multilingual video content that is ready to share globally.
How we built it
I integrated multiple Google Cloud Platform APIs to create a fully automated pipeline. Videos are stored in Cloud Storage, transcripts are extracted with the Cloud Video Intelligence API, and the transcripts are summarized with Vertex AI’s Gemini 2.5 Pro model. The summaries are then translated via the Translation API and converted into natural-sounding audio with the Text-to-Speech API. All of these steps are orchestrated through a Camunda workflow, from video upload to final output. The backend, built in Java, ensures scalability and reliable execution.
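The stage order described above can be sketched in plain Java. This is a minimal illustration, not the project's actual code: the class name `SummaryPipeline` and the four stage parameters are hypothetical, and each stage is a plain function that stands in for the corresponding Google Cloud call, which is not shown here. In the real system, Camunda performs the orchestration rather than a single method.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch: all names are illustrative and not taken from the project.
final class SummaryPipeline {
    private final Function<String, String> transcriber;           // video URI -> transcript
    private final Function<String, String> summarizer;            // transcript -> summary
    private final BiFunction<String, String, String> translator;  // (summary, language) -> translated text
    private final BiFunction<String, String, byte[]> narrator;    // (text, language) -> audio bytes

    SummaryPipeline(Function<String, String> transcriber,
                    Function<String, String> summarizer,
                    BiFunction<String, String, String> translator,
                    BiFunction<String, String, byte[]> narrator) {
        this.transcriber = transcriber;
        this.summarizer = summarizer;
        this.translator = translator;
        this.narrator = narrator;
    }

    // Transcribes and summarizes once, then translates and narrates per language.
    // Returns one audio clip per target language, in the order requested.
    Map<String, byte[]> run(String videoUri, List<String> languages) {
        String transcript = transcriber.apply(videoUri);
        String summary = summarizer.apply(transcript);

        Map<String, byte[]> audioByLanguage = new LinkedHashMap<>();
        for (String language : languages) {
            String translated = translator.apply(summary, language);
            audioByLanguage.put(language, narrator.apply(translated, language));
        }
        return audioByLanguage;
    }
}
```

Keeping each stage behind a function makes it possible to test the flow with stubs before any cloud credentials are configured. It also matches the shape of the pipeline: the transcript and summary are computed once, and only translation and narration repeat per language.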
Challenges we ran into
1) Aligning different GCP AI services into a smooth, end-to-end flow.
2) Authenticating to the Google Cloud APIs with a service account and granting it the necessary roles.
3) Ensuring natural-sounding narration in multiple languages.
4) Optimizing performance while keeping costs low.
5) Integrating the Google Cloud APIs with Camunda service tasks in the automated workflow.
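For the service account challenge, a small preflight check can surface a missing key file before any API call fails. This sketch assumes the key file is supplied through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google's Application Default Credentials read. The project may load credentials differently, and on Google Cloud runtimes an attached service account works without this variable. The class name `CredentialsPreflight` is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: checks only that the key file exists and is readable.
// It does not validate the key contents or the roles granted to the account.
final class CredentialsPreflight {

    // Returns a description of the problem, or empty if the key file looks usable.
    static Optional<String> check(String keyPath) {
        if (keyPath == null || keyPath.trim().isEmpty()) {
            return Optional.of("GOOGLE_APPLICATION_CREDENTIALS is not set");
        }
        Path path = Paths.get(keyPath);
        if (!Files.isRegularFile(path)) {
            return Optional.of("Key file not found: " + path);
        }
        if (!Files.isReadable(path)) {
            return Optional.of("Key file is not readable: " + path);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> problem = check(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"));
        System.out.println(problem.orElse("Service account key file found"));
    }
}
```

A missing IAM role would still only show up as a permission error at call time, so this check covers the file setup and nothing more.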
Accomplishments that we're proud of
1) Delivered a fully automated, working prototype in record time.
2) Seamlessly integrated AI video analysis, translation, and narration.
3) Built a scalable workflow that can handle real-world workloads.
4) Made content instantly accessible to a global audience.
What we learned
1) Best practices for orchestrating multiple cloud AI services.
2) Integrating Google Cloud APIs using the Java SDKs.
3) Using service account authentication to call Google Cloud APIs from Java code.
4) Workflow automation using Camunda in real-world scenarios.
5) Optimizing for latency and cost in AI-based video processing.
6) The nuances of multilingual text-to-speech synthesis.
What's next for AI-Powered Video Summarization and Multilingual Narration
1) Adding animated avatars to deliver summaries visually in different languages.
2) Supporting real-time video summarization and translation.
3) Deploying as a SaaS platform for enterprises and creators worldwide.
Built With
- camunda-workflow
- cloud-video-intelligent-api
- google-client-libraries
- google-cloud
- google-text-speech-api
- google-translation-api
- java
- maven
- vertexai(gemini-model)