Crevo

Real-time transcription and translation
Auto Framing Feature
Real-time mouth movement detection
Option to download video & subtitles (.srt) after live recording
Crevo: cover
Home Page
Upload files page
Video generated page

Inspiration

In today’s fast-paced digital world, content creation is more crucial than ever, yet video editing remains a time-consuming and complex process. Creators spend countless hours manually editing video recordings, aligning subtitles, and ensuring seamless transitions.

Our solution is Crevo, AI-powered software that edits video automatically. With real-time facial and voice recognition, Crevo automates the editing process: it identifies speakers, detects mouth movements, and generates accurate captions. Whether the footage is live or uploaded, Crevo crops and cuts the video around the detected speakers, producing a polished result without hours of manual work.

Imagine recording a podcast discussion where Crevo automatically detects and highlights each speaker, adds synchronized subtitles, and delivers a ready-to-publish video—all in real-time. For content creators, journalists, educators and businesses, Crevo eliminates tedious post-production, allowing them to focus on storytelling instead of technicalities.

With Crevo, video editing becomes effortless, inclusive, and accessible to all. The future of content creation isn’t just automated—it’s intelligent.

What it does

Crevo is AI-powered video editing software that automates the editing process using facial and voice recognition. Whether in real time or from uploaded footage, Crevo detects who is speaking at each moment, crops the video based on the speaker's face position, and generates accurate captions and subtitles.

In live recording mode, Crevo keeps the frame on the person who is speaking and transcribes their speech into on-screen text in real time. It can also translate the speech into another language in real time. The edited video and its subtitle file (.srt) can then be downloaded.
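As an illustration of the subtitle download, here is a minimal sketch of how timed captions could be written out in the .srt format. The `Cue` class and the function names are hypothetical and are not taken from Crevo's code; only the SRT layout itself (index, time range, text, blank line) is standard.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render cues as SRT blocks: index, time range, text, then a blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt` on the cues collected during a live session gives a string that can be saved next to the recorded video as a `.srt` file.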

In upload mode, the video is cropped at the right moments based on voice and face detection of the speakers. The output is a single, finished YouTube-style MP4 file.
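To make the cropping step concrete, the sketch below plans one crop window per speaker segment. It assumes the speaker segments (start, end, speaker label) and one face centre per speaker have already been computed elsewhere; the function names, the data layout, and the choice to skip segments with no matching face are all hypothetical and may differ from what Crevo does.

```python
def crop_box(face_cx, face_cy, frame_w, frame_h, crop_w, crop_h):
    """Return (x1, y1, x2, y2) for a crop centred on the face, kept inside the frame."""
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop must fit inside the frame")
    x1 = int(round(face_cx - crop_w / 2))
    y1 = int(round(face_cy - crop_h / 2))
    # Clamp so the window never runs past the frame edges.
    x1 = max(0, min(x1, frame_w - crop_w))
    y1 = max(0, min(y1, frame_h - crop_h))
    return x1, y1, x1 + crop_w, y1 + crop_h


def plan_crops(segments, face_centres, frame_size, crop_size):
    """Map each (start, end, speaker) segment to a crop box around that speaker's face."""
    frame_w, frame_h = frame_size
    crop_w, crop_h = crop_size
    plan = []
    for start, end, speaker in segments:
        if speaker not in face_centres:
            continue  # no face matched to this voice: skipped here (a sketch-only choice)
        cx, cy = face_centres[speaker]
        plan.append((start, end, crop_box(cx, cy, frame_w, frame_h, crop_w, crop_h)))
    return plan
```

Each entry of the resulting plan could then be handed to a video library such as MoviePy to cut and crop the matching clip before the clips are joined into one file.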

How we built it

The backend server runs on Flask and integrates the machine learning code (MediaPipe, pyannote.audio, and several other ML models) that handles face detection, voice recognition, transcription, and translation. For the frontend, we used Next.js to build a responsive and efficient user interface, and Figma for the design and UX. (Note: we used an AI assistant to help write some of the frontend and backend code. The UX/UI was designed entirely by the team.)

The auto-framing function combines custom face detection algorithms, including mouth movement detection with MediaPipe, with voice recognition powered by pyannote.audio.
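One common way to measure mouth movement from face landmarks is a mouth-opening ratio: the gap between the lips divided by the mouth width. The sketch below is an illustration of that idea, not Crevo's actual algorithm. It assumes the landmarks have already been extracted into a dictionary of `index -> (x, y)`, and the landmark indices and the threshold are assumptions that should be checked against the face mesh in use.

```python
import math

# Assumed MediaPipe Face Mesh landmark indices (verify against your mesh map).
UPPER_LIP, LOWER_LIP = 13, 14        # inner lip midpoints
LEFT_CORNER, RIGHT_CORNER = 61, 291  # mouth corners


def mouth_open_ratio(landmarks):
    """Lip gap divided by mouth width, so the value does not depend on face size."""
    gap = math.dist(landmarks[UPPER_LIP], landmarks[LOWER_LIP])
    width = math.dist(landmarks[LEFT_CORNER], landmarks[RIGHT_CORNER])
    return gap / width if width > 0 else 0.0


def is_mouth_moving(ratios, min_range=0.05):
    """Treat the mouth as moving when the ratio varies enough across a window of frames."""
    if len(ratios) < 2:
        return False
    return max(ratios) - min(ratios) >= min_range
```

Looking at how much the ratio changes over a short window, rather than at a single frame, avoids treating a mouth that is simply held open as speech. The `min_range` value of 0.05 is an illustrative guess and would need tuning on real footage.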

For real-time subtitle generation, we used Vosk speech recognition models through Vosk's KaldiRecognizer class, which transcribes speech as the audio arrives. The recognized text is then translated and displayed in real time. Together, these technologies allow Crevo to deliver an advanced, user-friendly solution for automated video editing.
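Vosk's `KaldiRecognizer` returns its output as JSON strings: `PartialResult()` carries a `"partial"` field while a phrase is still being spoken, and `Result()` carries the final `"text"`, plus per-word `"start"` and `"end"` times when word timing is enabled with `SetWords(True)`. The helpers below are a hypothetical sketch of how those strings could be turned into subtitle data; the helper names are not from Crevo's code, and the audio capture loop is only outlined in comments.

```python
import json


def cue_from_result(result_json):
    """Turn one final Vosk result (JSON string) into (start, end, text), or None if empty."""
    data = json.loads(result_json)
    text = data.get("text", "").strip()
    words = data.get("result", [])  # present only when word timing is enabled
    if not text or not words:
        return None
    return words[0]["start"], words[-1]["end"], text


def partial_text(partial_json):
    """Extract the in-progress text from a Vosk partial result (JSON string)."""
    return json.loads(partial_json).get("partial", "")


# Outline of the surrounding loop (requires the vosk package and a model on disk):
#   rec = KaldiRecognizer(model, 16000)
#   rec.SetWords(True)
#   if rec.AcceptWaveform(chunk):
#       cue = cue_from_result(rec.Result())        # finished phrase, with timings
#   else:
#       live = partial_text(rec.PartialResult())   # text to show while speaking
```

The partial text suits the live on-screen caption, while the finished cues carry the timings needed for a subtitle file.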

Challenges we ran into

One of the biggest challenges was improving the accuracy of speaker detection in the auto-framing feature. We first designed algorithms that detect mouth movement, but they often flagged a person as speaking when they were only moving. We solved this by combining voice recognition with face detection. For post-processing (the upload option), the algorithm labels each speaker by voice features and calculates their face position to crop the video. For real-time recording, it combines mouth movement with the speaker's face position to decide whether a person is speaking or just moving their face, which reduces false detections.
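The real-time idea of separating "speaking" from "just moving" can be sketched as a simple rule: count a face as speaking only when its mouth opening changes noticeably while the face itself stays roughly in place. This is an illustrative guess at the kind of logic involved, not Crevo's implementation; the function names, the input layout, and both thresholds are assumptions.

```python
import math


def face_travel(centres):
    """Total distance the face centre moved across the window of frames."""
    return sum(math.dist(a, b) for a, b in zip(centres, centres[1:]))


def looks_like_speaking(mouth_ratios, face_centres,
                        min_mouth_range=0.05, max_face_travel=0.10):
    """True when the mouth moves enough and the head does not move too much."""
    if len(mouth_ratios) < 2:
        return False
    mouth_range = max(mouth_ratios) - min(mouth_ratios)
    return (mouth_range >= min_mouth_range
            and face_travel(face_centres) <= max_face_travel)


def pick_active_speaker(windows, current=None):
    """windows maps face_id -> (mouth_ratios, face_centres) for the latest frames.

    Returns the qualifying face with the largest mouth movement, or keeps the
    current speaker when nobody qualifies, so the frame does not jump around.
    """
    best_id, best_range = None, 0.0
    for face_id, (ratios, centres) in windows.items():
        if not looks_like_speaking(ratios, centres):
            continue
        mouth_range = max(ratios) - min(ratios)
        if mouth_range > best_range:
            best_id, best_range = face_id, mouth_range
    return best_id if best_id is not None else current
```

Keeping the current speaker when no face qualifies is a design choice made for this sketch: it trades a slightly slower switch for a steadier frame.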

Accomplishments that we're proud of

We’re incredibly proud of how Crevo has come together, particularly the seamless integration of AI and real-time video editing. One of our biggest accomplishments is the Autoframe function, which automatically focuses on the speaker and dynamically adjusts the video frame. This was made possible through our custom face detection algorithms and mouth movement tracking, combined with voice recognition technology. It’s a feature that makes the viewing experience feel natural and professional.

We’re also proud of the real-time subtitle generation, powered by Vosk's KaldiRecognizer. Being able to translate and display accurate captions in multiple languages on the fly is a game-changer for accessibility.

Lastly, despite the technical challenges, we successfully built a user-friendly interface using Next.js and Figma. We’re confident that Crevo will make video editing easier for creators, businesses, and educators, and we’re excited about the impact it will have on the content creation world.

What we learned

Building Crevo has been a huge learning experience for us. We knew AI-powered video editing would be challenging, but we didn’t fully realize just how complex facial and voice recognition could be until we started developing it. Getting the AI to accurately detect speakers and sync captions in real-time wasn’t as simple as we first thought.

We also learned that making something work isn’t enough—it has to be easy to use. No matter how powerful the tech is, if people struggle with it, they won’t use it. That’s why refining the UX has been just as important as building the AI itself.

What's next for Crevo

Our next step is to refine Crevo’s system and user experience before bringing it to market. We want to ensure that creators, educators, and businesses get a seamless, intuitive, and highly efficient tool for automated video editing.

To achieve this, we’ll focus on optimizing Crevo’s AI capabilities, enhancing speaker detection accuracy, improving subtitle precision, and fine-tuning real-time editing performance. At the same time, we’ll polish the UX/UI to make the platform as user-friendly as possible. Once the system is fully optimized, we’ll launch an early access program, allowing creators to test Crevo, provide feedback, and help shape its final version.

From there, our goal is to expand Crevo’s reach by partnering with content creators, educators, and media professionals. We’ll also explore opportunities to scale the product and bring it to a wider audience. With a strong foundation in place, we want to make Crevo a game-changer in video editing.

Built With

figma
flask
mediapipe
moviepy
next.js
pyannotate
python
typescript

Submitted to

UNIHACK 2025
- Winner First Timer's Prize

Created by

I worked on the UX/UI design and the front-end code.

tugsuu YZ
I worked on the frontend, the backend (especially endpoints and file management), and part of the ML algorithms for speaker detection in the upload option.

Rin Ohsugi
I worked on live speech transcription and translation, plus frontend integration with the backend and UI.

Audrey Santoso
solgrace Solibun
liam bradford