Inspiration
The idea began while I was scrolling through reels and saw a post about how images can be interpreted as signals. That sparked a thought: what if I could decode visual information and turn it into music? At first, I imagined converting pixel data directly into sound, but the output wasn’t pleasant. That led me to explore ML models that could extract meaningful emotional and structural features from an image and use them to drive a music generator. That was the foundation of VisionScore.
What it does
VisionScore converts both static images and a live camera feed into music using:
- An on-device TFLite model that extracts visual mood, rhythm, and tonal parameters.
- A fully custom DSP-based audio synthesizer built in C++.
- A live mode that generates and evolves music in real time based on camera input.
- A built-in visualizer for immersive UX.
Everything runs offline with low latency and is optimized for ARM devices.
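To make the image-to-music step concrete, here is a minimal Python sketch of how model outputs could be mapped to musical parameters. The feature names (`valence`, `energy`), the value ranges, and the mapping constants are all assumptions made for illustration; the actual model outputs and mapping used in VisionScore are not described here.

```python
from dataclasses import dataclass

# Hypothetical feature set; the real model's outputs may differ.
@dataclass
class ImageFeatures:
    valence: float  # assumed range: 0.0 (dark, sad) to 1.0 (bright, happy)
    energy: float   # assumed range: 0.0 (calm) to 1.0 (busy)

# Semitone offsets of the major and natural minor scales.
MAJOR = [0, 2, 4, 5, 7, 9, 11]
MINOR = [0, 2, 3, 5, 7, 8, 10]

def clamp01(x):
    """Clamp a value into the 0..1 range."""
    return max(0.0, min(1.0, x))

def to_music_params(features):
    """Map image features to tempo, scale, and a set of MIDI notes."""
    valence = clamp01(features.valence)
    energy = clamp01(features.energy)
    is_major = valence >= 0.5
    scale = MAJOR if is_major else MINOR
    tempo_bpm = 60.0 + 80.0 * energy      # 60 to 140 BPM
    root_midi = 48 + round(12 * valence)  # C3 up to C4
    return {
        "tempo_bpm": tempo_bpm,
        "scale": "major" if is_major else "minor",
        "notes": [root_midi + step for step in scale],
    }
```

A synthesizer could then pick melody notes from `notes` and schedule them at `tempo_bpm`.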
How we built it
- Trained ML models and converted them to TensorFlow Lite for lightweight inference on mobile.
- Used Flutter as the app layer for fast UI development and smooth integration with TFLite.
- Built a full DSP engine in C++, generating stereo audio (melody, bass, percussion, ambience).
- Bridged native C++ and Flutter through JNI + platform channels.
- Implemented AAudio low-latency playback for real-time live mode.
- Added visual real-time monitoring with the built-in visualizer.
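The core of a synthesizer voice like the ones listed above is an oscillator shaped by an envelope. The sketch below shows that idea in Python for a single mono sine voice with a linear ADSR envelope. It is not the project's C++ engine; the sample rate, envelope timings, and function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate in Hz

def midi_to_hz(note):
    """Convert a MIDI note number to frequency (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def adsr(n_samples, attack, decay, sustain, release):
    """Linear ADSR envelope. attack/decay/release are lengths in samples,
    sustain is a level between 0 and 1."""
    env = []
    sustain_len = max(0, n_samples - attack - decay - release)
    for i in range(attack):
        env.append(i / attack)                             # ramp 0 -> 1
    for i in range(decay):
        env.append(1.0 - (1.0 - sustain) * i / decay)      # ramp 1 -> sustain
    env.extend([sustain] * sustain_len)                    # hold
    for i in range(release):
        env.append(sustain * (1.0 - i / release))          # ramp sustain -> 0
    return env[:n_samples]

def render_note(note, duration_s, amp=0.5):
    """Render one enveloped sine note as a list of float samples."""
    n = int(duration_s * SAMPLE_RATE)
    freq = midi_to_hz(note)
    env = adsr(n, attack=480, decay=2400, sustain=0.6, release=4800)
    return [
        amp * env[i] * math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE)
        for i in range(n)
    ]
```

Melody, bass, and ambience layers can be built by summing several such voices with different waveforms and envelope settings.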
Challenges we ran into
- DSP Engine Complexity: Designing polyphonic components (melody, bass, drums, shimmer, arpeggios) that run in real time was challenging.
- Memory Crashes: Bridging C++ buffers to Dart initially caused buffer overflows and freezes.
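One common way to avoid this class of overflow is to pass audio between producer and consumer through a fixed-capacity ring buffer that never writes past its bounds. The Python sketch below illustrates the idea only; it is an assumption about a possible approach, not a description of how the crashes were actually fixed, and the class and method names are hypothetical. A real audio path would also need a thread-safe or lock-free version.

```python
class RingBuffer:
    """Fixed-capacity sample buffer that rejects writes beyond its capacity."""

    def __init__(self, capacity):
        self._data = [0.0] * capacity
        self._capacity = capacity
        self._read = 0   # index of the oldest unread sample
        self._size = 0   # number of unread samples

    def write(self, samples):
        """Write as many samples as fit and return how many were written."""
        n = min(len(samples), self._capacity - self._size)
        start = (self._read + self._size) % self._capacity
        for i in range(n):
            self._data[(start + i) % self._capacity] = samples[i]
        self._size += n
        return n

    def read(self, count):
        """Read exactly `count` samples, padding with silence on underrun."""
        n = min(count, self._size)
        out = [self._data[(self._read + i) % self._capacity] for i in range(n)]
        self._read = (self._read + n) % self._capacity
        self._size -= n
        out.extend([0.0] * (count - n))
        return out
```

Because `write` reports how many samples were accepted, the producer can retry the remainder later instead of overrunning the buffer.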
Accomplishments that we're proud of
- Built a fully functional image-to-audio generator from scratch.
- Achieved stable live audio synthesis with real-time camera interaction.
- Created a smooth, minimalistic app UI with benchmark tools, player, and visualizer.
- Turned an idea floating in my mind for months into a working, polished prototype.
What we learned
- Practical on-device ML optimization (TFLite, FP16, inference speed).
- Flutter–C++ integration using JNI.
- A lot about audio DSP: oscillators, envelopes, filters, stereo widening, and real-time synthesis.
- Fundamentals of low-latency audio pipelines (AAudio).
- The power of combining ML + DSP + mobile UX.
- This project also inspired the direction of my final-year project.
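As an example of one of the DSP techniques mentioned above, stereo widening is often done with mid/side processing: the signal is split into a mid (sum) and side (difference) component, and the side component is scaled. The Python sketch below shows that standard technique; the function name and `width` parameter are illustrative, and it is not taken from the project's C++ code.

```python
def widen(left, right, width):
    """Mid/side stereo widening.

    width = 0.0 collapses to mono, 1.0 leaves the signal unchanged,
    and values above 1.0 exaggerate the stereo difference.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)            # content common to both channels
        side = 0.5 * (l - r) * width   # scaled difference between channels
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right
```

Widths above 1.0 can push samples beyond the original peak level, so the output may need gain reduction or limiting afterwards.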
What's next for VisionScore
- Improve the DSP engine for richer and more musically expressive output.
- Enhance live mode with more stable smoothing and better transitions.
- Explore sharing, exporting presets, or creating playlists.
- Possibly create a fully generative “AI Audio Lens” using more advanced models.
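For the live-mode smoothing mentioned above, one simple option is an exponential moving average applied to each parameter coming from the camera, so that tempo or pitch drifts toward new values instead of jumping. The Python sketch below is one possible approach under that assumption; the class name and `alpha` value are hypothetical.

```python
class ParamSmoother:
    """Exponential moving average for one live parameter."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # higher = faster response, less smoothing
        self.value = None    # no estimate until the first update

    def update(self, target):
        """Move the smoothed value a fraction `alpha` toward `target`."""
        if self.value is None:
            self.value = target
        else:
            self.value += self.alpha * (target - self.value)
        return self.value
```

For example, with `alpha = 0.5` a tempo target that jumps from 100 to 140 BPM is approached as 120, 130, 135, and so on over successive frames.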