AI-Powered Video Summarization and Multilingual Narration

Architecture Diagram
Camunda Workflow
Output Flow Part 1
Output Flow Part 2
Output Flow Part 3
Output Flow Part 4
File uploaded to Google Cloud Storage
Final audio output file in Google Cloud Storage

Inspiration

Endless meetings and long tutorial videos eat up hours of my day. I wanted a way to instantly get the key points as audio I can listen to anytime—saving time and boosting productivity.

What it does

I built a system that automatically summarizes long videos, translates the summaries into multiple languages, and generates natural-sounding voice narrations—creating short, impactful, multilingual video content ready to share globally.

How we built it

I integrated multiple Google Cloud Platform APIs to create a fully automated pipeline. Videos are stored in Cloud Storage, transcripts are extracted with the Cloud Video Intelligence API, and the transcripts are summarized with Vertex AI's Gemini 2.5 Pro model. The summaries are then translated with the Translation API and converted into natural-sounding audio with the Text-to-Speech API. A Camunda workflow orchestrates all of these steps, from video upload to final output. The backend is built in Java for scalability and reliable execution.
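The order of the stages described above can be sketched in plain Java. This is a minimal illustration, not the project's actual code: the class name `NarrationPipeline`, the functional-interface shape of each stage, and the example bucket path are all assumptions, and stub lambdas stand in for the real Google Cloud clients.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of the pipeline stages; none of these names come from the project. */
class NarrationPipeline {
    private final Function<String, String> transcriber;           // video URI -> transcript
    private final Function<String, String> summarizer;            // transcript -> summary
    private final BiFunction<String, String, String> translator;  // (summary, language) -> translated text
    private final BiFunction<String, String, byte[]> synthesizer; // (text, language) -> audio bytes

    NarrationPipeline(Function<String, String> transcriber,
                      Function<String, String> summarizer,
                      BiFunction<String, String, String> translator,
                      BiFunction<String, String, byte[]> synthesizer) {
        this.transcriber = transcriber;
        this.summarizer = summarizer;
        this.translator = translator;
        this.synthesizer = synthesizer;
    }

    /** Runs the stages in order and returns one audio clip per target language. */
    Map<String, byte[]> run(String videoUri, List<String> languages) {
        String transcript = transcriber.apply(videoUri);
        String summary = summarizer.apply(transcript);
        Map<String, byte[]> audioByLanguage = new LinkedHashMap<>();
        for (String language : languages) {
            // Translation and synthesis repeat once per language; the summary is computed once.
            String translated = translator.apply(summary, language);
            audioByLanguage.put(language, synthesizer.apply(translated, language));
        }
        return audioByLanguage;
    }

    public static void main(String[] args) {
        // Stub stages stand in for the real API clients (assumption for illustration).
        NarrationPipeline pipeline = new NarrationPipeline(
                uri -> "transcript of " + uri,
                transcript -> "summary of " + transcript,
                (text, language) -> "[" + language + "] " + text,
                (text, language) -> text.getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> result =
                pipeline.run("gs://example-bucket/meeting.mp4", List.of("es", "fr"));
        result.forEach((language, audio) ->
                System.out.println(language + ": " + audio.length + " bytes"));
    }
}
```

Keeping each stage behind a small interface means a real client can replace a stub without changing the flow, and the expensive transcription and summarization steps run only once no matter how many languages are requested.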

Challenges we ran into

1) Aligning different GCP AI services into a smooth, end-to-end flow.
2) Authenticating to the Google Cloud APIs with a service account and granting it the necessary roles.
3) Ensuring natural narration in multiple languages.
4) Optimizing performance while keeping costs low.
5) Integrating the Google Cloud APIs with Camunda service task automation.
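The service task integration in challenge 5 can be pictured as a step that reads its input from the workflow's variables and writes its result back for the next step. The sketch below uses a plain `Map` as a stand-in for those variables so that it runs without any workflow engine. The interface `PipelineTask`, the class `SummarizeTask`, the variable names `transcript` and `summary`, and the truncation placeholder are all hypothetical. The shape loosely mirrors Camunda 7's `JavaDelegate`, where a delegate gets and sets variables on a `DelegateExecution`, but the write-up does not state which Camunda version or task style the project uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a workflow service task: reads variables, writes results back. */
interface PipelineTask {
    void execute(Map<String, Object> variables) throws Exception;
}

/** Example step: expects a 'transcript' variable and stores a 'summary' variable. */
class SummarizeTask implements PipelineTask {
    @Override
    public void execute(Map<String, Object> variables) {
        Object transcript = variables.get("transcript");
        if (!(transcript instanceof String) || ((String) transcript).isBlank()) {
            // Fail loudly so the workflow can surface the problem instead of passing bad data on.
            throw new IllegalStateException("Process variable 'transcript' is missing or empty");
        }
        String text = (String) transcript;
        // Placeholder logic: a real task would call the summarization model here.
        String summary = text.length() <= 40 ? text : text.substring(0, 40) + "...";
        variables.put("summary", summary);
    }
}

class ServiceTaskDemo {
    public static void main(String[] args) throws Exception {
        Map<String, Object> variables = new HashMap<>();
        variables.put("transcript", "Welcome to the weekly sync. Today we review the release plan.");
        new SummarizeTask().execute(variables);
        System.out.println(variables.get("summary"));
    }
}
```

Validating the input variable at the start of each task keeps a failure in one cloud call from silently propagating into the translation and narration steps that follow.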

Accomplishments that we're proud of

1) Delivered a fully automated, working prototype in record time.
2) Seamlessly integrated AI video analysis, translation, and narration.
3) Built a scalable workflow that can handle real-world workloads.
4) Made content instantly accessible to a global audience.

What we learned

1) Best practices for orchestrating multiple cloud AI services.
2) Integrating the Google Cloud APIs using the Java SDKs.
3) Using service account authentication to call the Google Cloud APIs from Java code.
4) Workflow automation using Camunda in real-world scenarios.
5) Optimizing for latency and cost in AI-based video processing.
6) The nuances of multilingual text-to-speech synthesis.

What's next for AI-Powered Video Summarization and Multilingual Narration

1) Adding animated avatars to deliver summaries visually in different languages.
2) Supporting real-time video summarization and translation.
3) Deploying as a SaaS platform for enterprises and creators worldwide.

Built With

camunda-workflow
cloud-video-intelligent-api
google-client-libraries
google-cloud
google-text-speech-api
google-translation-api
java
maven
vertexai(gemini-model)

Updates

ANIL LALAM started this project — Aug 14, 2025 10:44 PM EDT
