Onama

UI for SoundVibes app hosted by Railway
Image of 2-second segments of a 6 second clip created by our program
Image of the AI-generated audio files for each segment created by our program

## Inspiration
Small content creators often lack access to the tools available to creators with larger followings. We were inspired to build a tool that makes content creation easier and more accessible. Adding sound to a video by hand is slow, tedious, and requires skill. Our program lets you upload a silent video and get a fully sound-designed result in seconds, based on the video's contents.

## What it does
Onama is an AI-powered “Foley artist” that automatically adds sound to silent videos.

- Upload a short video (e.g., 6 seconds)
- The system splits it into 2-second segments
- It then analyzes each segment and writes a description of what’s happening
- An appropriate sound is assigned to each segment based on the description from the previous step (e.g., footsteps, ambient noise, impacts)
- The audio for each segment is combined into one track and merged with the original video
- The output is a final video with synchronized audio

In simple terms:
It watches your video and gives it an appropriate sound.
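
As a rough illustration of the segmentation step described above, the sketch below computes the start and end time of each 2-second segment for a clip of a given length. The function name `segment_bounds` and its parameters are assumptions made for this example, not names from our codebase, and the real pipeline cuts the actual video rather than just computing timestamps.

```python
import math


def segment_bounds(duration: float, segment_length: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds that cover a clip of `duration` seconds.

    Hypothetical helper for illustration only. The last segment is shortened
    when the duration is not an exact multiple of `segment_length`.
    """
    if duration <= 0 or segment_length <= 0:
        raise ValueError("duration and segment_length must be positive")

    count = math.ceil(duration / segment_length)
    return [
        (i * segment_length, min((i + 1) * segment_length, duration))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A 6-second clip yields three 2-second segments.
    for start, end in segment_bounds(6.0):
        print(f"segment from {start:.1f}s to {end:.1f}s")
```

Each pair could then be handed to whatever tool does the cutting, so the timing logic stays separate from the media library and is easy to test on its own.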

## How we built it
We split the work into the stages the program needed to complete:

upload → segment → classify → map sound → merge → output


- Segmentation: Used MoviePy to split the video into ~2-second clips
- Classification: Used the Gemini API to label each segment with a mood, summary, and sound label
- Sound Mapping: Mapped the sound labels (e.g., “calm”, “tense”) to appropriate sound effects
- Audio Generation: Used ElevenLabs to generate and retrieve sound effects based on the sound labels specified by the sound mapping
- Merging: Recombined video + audio using MoviePy to produce the final output of the original video with sound effects
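
To make the sound mapping step more concrete, here is a minimal sketch of a lookup from a sound label to a text prompt that could be sent to a sound generation service. The labels "calm" and "tense" come from the description above. The prompt wording, the fallback prompt, and the names `SOUND_PROMPTS`, `DEFAULT_PROMPT`, and `prompt_for_label` are all invented for illustration and are not taken from our actual code.

```python
# Hypothetical label-to-prompt table. Only the label names "calm" and "tense"
# appear in the write-up; the prompt text is made up for this example.
SOUND_PROMPTS = {
    "calm": "soft ambient room tone with gentle wind",
    "tense": "low rumbling drone with a slow heartbeat",
}

# Assumed fallback used when a label is not in the table.
DEFAULT_PROMPT = "neutral ambient background noise"


def prompt_for_label(label: str) -> str:
    """Map a sound label to a generation prompt, ignoring case and whitespace."""
    key = label.strip().lower()
    return SOUND_PROMPTS.get(key, DEFAULT_PROMPT)


if __name__ == "__main__":
    for label in ("calm", " Tense ", "mystery"):
        print(repr(label), "->", prompt_for_label(label))
```

Keeping the table small and falling back to a default means an unexpected label still produces some audio instead of stopping the pipeline.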

We also enforced a shared data structure between modules so each part of the system could connect reliably.
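
A shared structure like this could look roughly like the dataclass below. This is a sketch under assumptions: the class name `SegmentRecord` and its field names are hypothetical, chosen to match the mood, summary, and sound label mentioned above rather than copied from our code.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SegmentRecord:
    """Hypothetical per-segment record passed from one stage to the next."""

    index: int                          # position of the segment in the video
    start: float                        # start time in seconds
    end: float                          # end time in seconds
    mood: Optional[str] = None          # filled in by classification
    summary: Optional[str] = None       # filled in by classification
    sound_label: Optional[str] = None   # filled in by classification
    audio_path: Optional[str] = None    # filled in by audio generation

    def is_ready_to_merge(self) -> bool:
        """A segment can be merged once it has a label and a generated audio file."""
        return self.sound_label is not None and self.audio_path is not None

    def to_dict(self) -> dict:
        """Plain-dict form, convenient for JSON logging between stages."""
        return asdict(self)


if __name__ == "__main__":
    record = SegmentRecord(index=0, start=0.0, end=2.0)
    print(record.is_ready_to_merge())  # nothing has been filled in yet
    record.sound_label = "calm"
    record.audio_path = "segment_0.mp3"  # assumed file name
    print(record.to_dict())
```

Because every stage reads and writes the same fields, a missing value shows up as `None` in one place instead of as a format mismatch between two people's modules.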

## Challenges we ran into
- Integration risk: With multiple people building different parts, mismatched formats could break everything. We solved this by defining strict input/output contracts early on.
- API unpredictability: AI responses weren’t always clean or consistent, so we had to constrain outputs (limited labels, structured JSON).
- Time constraints: With only ~24 hours, we had to prioritize what was essential to making our idea work.
- Media processing issues: Handling video/audio reliably (especially encoding and timing) required debugging and simplification (e.g., disabling audio early in the pipeline).
- Inconsistent mood assignment: Moods weren't being assigned to video segments consistently. During testing, we noticed that the program produced clips and generated audio for only the first two segments, even though the input file was much longer. On investigation, we found leftover test cases that were overwriting the new information our program processed and generated. We solved this by tracking down and fixing that test code.
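
One way to constrain AI output as described above is to validate every response before the rest of the pipeline uses it. The sketch below is an assumption about how that could be done, not our exact code: the names `ALLOWED_LABELS`, `REQUIRED_KEYS`, and `parse_classification` are hypothetical, and the label set is limited to the two example labels mentioned in this write-up.

```python
import json

# Assumed label set; the real list of allowed labels may be longer.
ALLOWED_LABELS = {"calm", "tense"}
REQUIRED_KEYS = ("mood", "summary", "sound_label")


def parse_classification(raw: str, fallback_label: str = "calm") -> dict:
    """Extract and validate a JSON object from a model response.

    Raises ValueError when no valid JSON object or a required key is missing.
    Labels outside ALLOWED_LABELS are replaced with `fallback_label`.
    """
    # Models sometimes wrap JSON in extra text, so keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")

    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    data = json.loads(raw[start:end + 1])

    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")

    label = str(data["sound_label"]).strip().lower()
    data["sound_label"] = label if label in ALLOWED_LABELS else fallback_label
    return data


if __name__ == "__main__":
    reply = 'Sure! {"mood": "tense", "summary": "a door slams", "sound_label": "Tense"}'
    print(parse_classification(reply))
```

Rejecting malformed responses at this boundary keeps a single odd reply from propagating into sound mapping and audio generation.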

## Accomplishments that we're proud of
- Built a full end-to-end program under tight time constraints
- Successfully integrated video processing + AI analysis of content + audio generation
- Created a tool that solves a real problem, is easily accessible, and can save hours of work in several areas of the entertainment industry:
  - Content creation: Automatically adds sound to short-form videos for creators.
  - Film and animation prototyping: Quickly simulates full audio for storyboards and scenes.
  - Game development: Generates audio for video games, making prototyping faster.

## What we learned
- Collaborating on GitHub
- Keeping versions of the section each team member was working on, so that we could go back and reference earlier code
- Defining data contracts early is critical for team-based engineering
- Simple systems that work end-to-end beat complex systems that don’t connect
- AI is powerful, but requires constraints and structure to be reliable
- Rapid prototyping forces better prioritization and decision-making

## What's next for Onama
- Smarter scene detection beyond fixed segmentation
- More precise audio synchronization by splitting the input file into even smaller time segments
- Expanded sound library and higher quality generation
- Real-time processing
- A polished web interface with drag-and-drop support

Long term, we see Onama as a tool that can improve content creation workflows, helping anyone from animators to content creators save hours of time.