Inspiration
Edge AI models provided by Stability and powered by Arm! I'm an evangelist for edge AI: it doesn't consume massive amounts of datacenter power (with the humongous carbon emissions that come with it), yet it brings the power of AI to the remotest places on earth with even a lower-end smartphone. How lovely!
I mostly wanted to tinker with Stability's Stable Audio Open and challenge myself: could I vibe-code the entire thing from scratch while doing it in public (on X.com)? My post link - link
What it does
The app allows the user to upload or capture a video and then generate background music for it via a prompt input box. The app provides two basic video editing features as well: trimming and text overlay. Furthermore, the app automatically syncs the audio with the video and offers effect options such as fade-in and fade-out (a simple linear volume ramp, sketched below). Finally, the user can share the video to social media platforms directly from within the app or download it to a local folder.
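To show the idea behind the fades, here's a toy Kotlin sketch operating on raw PCM float samples. The function name and signature are illustrative, not the app's actual effect pipeline:

```kotlin
// Toy sketch of a linear fade-in/fade-out over PCM float samples.
// `applyFades` is a hypothetical name; the app's real pipeline may differ.
fun applyFades(samples: FloatArray, sampleRate: Int, fadeInSec: Float, fadeOutSec: Float): FloatArray {
    val fadeInLen = (fadeInSec * sampleRate).toInt().coerceIn(0, samples.size)
    val fadeOutLen = (fadeOutSec * sampleRate).toInt().coerceIn(0, samples.size)
    val out = samples.copyOf()
    // Ramp volume up from 0 to 1 over the first fadeInLen samples.
    for (i in 0 until fadeInLen) out[i] *= i / fadeInLen.toFloat()
    // Ramp volume down from 1 to 0 over the last fadeOutLen samples.
    for (i in 0 until fadeOutLen) out[out.size - 1 - i] *= i / fadeOutLen.toFloat()
    return out
}
```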
To get the prompting strategy right, the app gives users tools to simply select options such as genre and instruments, without having to worry about the exact prompt wording (see the sketch below).
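Conceptually it's just a selection-to-prompt mapping. A minimal sketch, where all the names are hypothetical rather than the app's actual code:

```kotlin
// Hypothetical selection-to-prompt mapping; PromptOptions/buildPrompt are illustrative.
data class PromptOptions(
    val genre: String? = null,
    val instruments: List<String> = emptyList(),
    val mood: String? = null,
    val bpm: Int? = null,
)

fun buildPrompt(o: PromptOptions): String = buildList {
    o.genre?.let { add(it) }
    addAll(o.instruments)
    o.mood?.let { add("$it mood") }
    o.bpm?.let { add("$it BPM") }
}.joinToString(", ")

// buildPrompt(PromptOptions("lo-fi hip hop", listOf("piano", "soft drums"), "calm", 80))
// -> "lo-fi hip hop, piano, soft drums, calm mood, 80 BPM"
```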
How I built it
I used Android Studio for development, with the help of its built-in Gemini coding agent. I also had to rely on Claude and my own software engineering skills when Gemini couldn't resolve certain issues.
I forked an open-source project called LibreCuts that provides Android users with video editing tools. The app had some problems, though, so I made the following changes to rectify them:
1) Removed the dependency on the open-source ffmpeg-kit, because the library is no longer maintained by its author and is no longer available on Maven, and building it from scratch proved to be a pain in the a**. Instead, I vibe-coded the features the app actually needed from that library, such as video trimming and text overlay, and integrated them directly into the app source code (see the sketch after this list).
2) Restructured the code base to follow the MVVM pattern, which the original code base lacked, to ensure separation of concerns and code readability (also sketched below).
3) Integrated the audio generation feature into the app and gave it my own flavor in terms of color themes and splash screen.
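I won't paste the vibe-coded replacement here, but to give a flavor of the approach: on modern Android, trimming can be done without FFmpeg at all via androidx.media3's Transformer. A minimal sketch assuming Media3 1.x, illustrating the general technique rather than the app's exact code:

```kotlin
import android.content.Context
import android.net.Uri
import androidx.media3.common.MediaItem
import androidx.media3.transformer.EditedMediaItem
import androidx.media3.transformer.Transformer

// Trim [startMs, endMs] out of a video and export it to outputPath.
// Transformer needs a thread with a Looper, so call this from the main thread.
fun trimVideo(context: Context, input: Uri, startMs: Long, endMs: Long, outputPath: String) {
    val clipped = MediaItem.Builder()
        .setUri(input)
        .setClippingConfiguration(
            MediaItem.ClippingConfiguration.Builder()
                .setStartPositionMs(startMs)
                .setEndPositionMs(endMs)
                .build()
        )
        .build()
    Transformer.Builder(context)
        .build()
        .start(EditedMediaItem.Builder(clipped).build(), outputPath)
}
```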
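And the MVVM shape, roughly. The class and state names here are hypothetical, not from the actual fork:

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

// Illustrative MVVM shape; UiState/AudioRepository are hypothetical names.
sealed interface UiState {
    object Idle : UiState
    object Loading : UiState
    data class Ready(val audioPath: String) : UiState
    data class Error(val message: String?) : UiState
}

interface AudioRepository {
    suspend fun generateAudio(prompt: String): String // returns path to the generated file
}

class AudioGenViewModel(private val repo: AudioRepository) : ViewModel() {
    private val _state = MutableStateFlow<UiState>(UiState.Idle)
    val state: StateFlow<UiState> = _state

    // The UI observes `state`; all generation logic stays out of the Activity/Fragment.
    fun generate(prompt: String) {
        viewModelScope.launch {
            _state.value = UiState.Loading
            _state.value = runCatching { repo.generateAudio(prompt) }
                .fold({ UiState.Ready(it) }, { UiState.Error(it.message) })
        }
    }
}
```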
Challenges I ran into
The biggest challenge I ran into was integrating the SentencePiece text tokenizer. I didn't realize how many steps it involved: the tokenizer's C++ code first had to be built for Android, and then JNI bindings had to be written so it could be called from Kotlin. This took about two days of debugging, but I learned how to create Java bridges that let C++ code work with Java/Kotlin (see the sketch below).
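For anyone curious, the Kotlin side of such a bridge looks roughly like this. The class and library names are illustrative, not the app's actual binding:

```kotlin
// Kotlin side of a JNI bridge. The C++ side must export a matching symbol,
// e.g. Java_com_example_vibeaudio_SentencePieceTokenizer_encode, via extern "C"/JNIEXPORT.
class SentencePieceTokenizer {
    companion object {
        init {
            // Loads libsentencepiece_jni.so built with the NDK (library name is hypothetical).
            System.loadLibrary("sentencepiece_jni")
        }
    }

    // Declared here, implemented in C++; JNI resolves the call at runtime.
    external fun encode(text: String): IntArray
}
```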
Accomplishments that I'm proud of
A fully functioning app that is now available on GitHub.
What I learned
1) Distilling bigger PyTorch models into smaller, efficient TFLite ones using frameworks like ExecuTorch.
2) The concept of quantization (toy sketch below).
3) How to vibe-code the right way (i.e., mindful vibe-coding :))
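To illustrate the quantization idea with a toy example: affine uint8 quantization, the scheme TFLite commonly uses, stores weights as 8-bit ints (~4x smaller than float32). This is just the textbook formula, not any app code:

```kotlin
import kotlin.math.roundToInt

// Affine uint8 quantization: real ≈ scale * (q - zeroPoint). Assumes max > min.
fun quantParams(min: Float, max: Float): Pair<Float, Int> {
    val scale = (max - min) / 255f
    val zeroPoint = (-min / scale).roundToInt().coerceIn(0, 255)
    return scale to zeroPoint
}

fun quantize(x: Float, scale: Float, zeroPoint: Int): Int =
    ((x / scale).roundToInt() + zeroPoint).coerceIn(0, 255)

fun dequantize(q: Int, scale: Float, zeroPoint: Int): Float = scale * (q - zeroPoint)
```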
What's next for VibeAudio
A few features I'm going to test out with my users:
1) Multilingual support via voice-to-English prompts.
2) Offering Stability's paid model, for vocals in the music and for longer durations.
3) Integrating more video and audio editing features.
4) Video generation.
5) Prompting help where users just select options such as genre and instruments and the app crafts the perfect prompt for the model.
Built With
- android-studio
- claude
- gemini
- github
- kotlin
