EchoMood

Inspiration

The idea was sparked by two trends:

Content creators increasingly need unique audio to differentiate their posts.
Artists are suing platforms for unauthorized use of their music, highlighting the need for copyright-safe alternatives.

EchoMood bridges this gap by offering AI-generated audio that is personalized, affordable, and device-native.

What we learned

Running generative models on-device requires careful quantization and optimization.
Lightweight LLMs (like Qwen 2.5B) can effectively reason about mood and context when paired with domain-specific audio models.
Social media creators value speed and independence from cloud services, which shaped our design choices.
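
To make the quantization lesson concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It only illustrates the arithmetic (store small integers plus one scale factor, accept a bounded rounding error). It is not the toolchain EchoMood used, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: each weight w is stored as q, with w ~= q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


# Illustrative weights, not taken from any real model.
weights = [0.42, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(scale, 4), max_error <= scale / 2 + 1e-12)
```

Each weight shrinks from a 32-bit float to an 8-bit integer, which is where the memory savings on mobile hardware come from. The cost is the rounding error measured by `max_error`.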

How we built it

Frontend: React Native for cross-platform UI, with Kotlin modules for Android integration.
Backend Ops: Python scripts handled LLM reasoning and prompt generation.
Audio Generation: Stability Audio consumed structured prompts to produce music.
Deployment: Models were quantized and optimized on AWS EC2 G-series instances, then packaged into an APK for Android devices.
Workflow:
1. User enters a text description of their post.
2. LLM interprets mood and generates 3 concise prompts.
3. Prompts are fed into Stability Audio.
4. Audio is generated locally and returned to the user on the device, with no cloud round trip.
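
The four workflow steps can be sketched as one small Python function. This is an outline under stated assumptions, not the project's actual code: `generate_tracks`, `Track`, and the two stand-in models are hypothetical names. The real LLM and audio model would be passed in place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Track:
    prompt: str
    samples: List[float]  # placeholder for real audio data


def generate_tracks(
    description: str,
    prompt_model: Callable[[str], List[str]],
    audio_model: Callable[[str], List[float]],
    n_prompts: int = 3,
) -> List[Track]:
    """Step 1: take the description. Step 2: get prompts. Steps 3-4: render audio."""
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    prompts = prompt_model(description)[:n_prompts]
    return [Track(prompt=p, samples=audio_model(p)) for p in prompts]


# Stand-ins so the sketch runs without any model (assumptions, not real models).
def fake_prompt_model(description: str) -> List[str]:
    styles = ("upbeat", "mellow", "cinematic")
    return [f"{style} track for: {description}" for style in styles]


def fake_audio_model(prompt: str) -> List[float]:
    return [0.0] * 4  # four samples of silence


tracks = generate_tracks("Sunset hike with friends", fake_prompt_model, fake_audio_model)
for t in tracks:
    print(t.prompt)
```

Passing the models in as callables keeps the pipeline logic independent of whichever runtime hosts the LLM and the audio model on the device.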

Challenges we ran into

Model Quantization: Balancing performance and accuracy while fitting models within mobile hardware constraints.
On-device Execution: Ensuring smooth performance without overheating or excessive battery drain.
Prompt Engineering: Designing prompts that consistently yield high-quality audio across diverse moods.
Integration: Bridging multiple languages (Python, Kotlin, TypeScript, JavaScript) into a seamless pipeline.
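
One way to approach the prompt engineering challenge is to render every prompt from the same structured fields, so outputs differ in content but not in shape. The sketch below is only an illustration: the `AudioPrompt` class, its fields, and the `|` separator are assumptions, not a format the audio model is known to require.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AudioPrompt:
    mood: str
    genre: str
    bpm: int
    instruments: Tuple[str, ...]
    seconds: int = 15  # hypothetical default clip length

    def render(self) -> str:
        """Build a prompt string with a fixed field order."""
        if not 40 <= self.bpm <= 220:
            raise ValueError(f"bpm out of range: {self.bpm}")
        parts = [
            f"{self.mood} {self.genre}",
            f"{self.bpm} BPM",
            ", ".join(self.instruments),
            f"{self.seconds} seconds",
        ]
        # Skip empty fields (for example, no instruments listed).
        return " | ".join(p for p in parts if p)


prompt = AudioPrompt("calm", "lo-fi", 80, ("piano", "soft drums"))
print(prompt.render())  # calm lo-fi | 80 BPM | piano, soft drums | 15 seconds
```

A fixed structure also makes it easier to check the LLM's output before it reaches the audio model, since a missing or out-of-range field can be rejected early.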

What's next for EchoMood

Move beyond text prompts to support images, videos, and speech as inputs.
Enable multimodal generation where EchoMood analyzes visual or spoken cues to craft matching audio.

Introduce genre blending and advanced audio layering for more complex tracks.
Offer adaptive music that can sync with video pacing (e.g., beat drops aligned with transitions).
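
As a rough illustration of syncing music with video pacing, the arithmetic below snaps each video transition time to the nearest beat of a track with a known tempo. This is a sketch of one possible approach, not a planned implementation. `snap_to_beats` and its parameters are hypothetical.

```python
from typing import List


def snap_to_beats(transitions: List[float], bpm: float, offset: float = 0.0) -> List[float]:
    """Move each transition time (in seconds) to the nearest beat.

    `offset` is the time of the first beat in the track.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    beat = 60.0 / bpm  # seconds per beat
    return [offset + round((t - offset) / beat) * beat for t in transitions]


# At 120 BPM a beat falls every 0.5 s, so cuts near 1.9 s, 4.2 s and 7.76 s
# land on the beats at 2.0 s, 4.0 s and 8.0 s.
print(snap_to_beats([1.9, 4.2, 7.76], bpm=120))  # [2.0, 4.0, 8.0]
```

The same calculation could run in the other direction, choosing a tempo so that beats fall on the existing transitions, which would suit generated audio better than re-cutting the video.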