Inspiration

This app was born from a simple belief: art is for everyone. After reflecting on what art truly means, we realized that art isn't just something to look at; it's something to feel, hear, and experience. Our project redefines how people engage with art by transforming images into immersive auditory experiences.

We wanted to extend our empathy to those who experience the world differently than we do. We started by asking ourselves a fundamental question: how can we deliver powerful visual experiences as immersive sonic representations? Art, nature, and architecture are deeply emotional experiences, yet for the 2.2 billion people globally living with vision impairment, these experiences are often described through dry, clinical language. We wanted to change that. Instead of hearing "a painting with blue and yellow swirls," imagine feeling Van Gogh's Starry Night through a haunting, swirling symphony. That's the vision behind @RT.

What it does

@RT transforms visual experiences into emotional soundscapes. Users simply point their camera at artwork, landscapes, or monuments, and @RT generates an original, AI-composed musical piece that captures the mood, colors, movement, and emotional essence of what they're viewing.

Key Features:

- Real-time camera analysis: No need to take photos; the app processes live footage
- Contextual AI descriptions: Understands not just what's in the image, but the artistic and emotional context
- Custom soundscape generation: Creates unique, never-before-heard musical compositions tailored to each visual
- Accessibility-first design: Built with screen reader compatibility and intuitive audio cues

How we built it

We developed @RT using a modern, cross-platform tech stack:

- Frontend: React Native with Expo framework, enabling deployment as a Progressive Web App (PWA) accessible across iOS, Android, and web browsers
- Computer Vision: Integrated Overshoot for real-time image analysis and scene understanding
- Natural Language Processing: Mistral LLM APIs transform visual data into rich, emotionally-aware text descriptions that capture mood, composition, color palette, and artistic style
- Audio Generation: The Suno API converts our curated text prompts into original, atmospheric soundscapes that match the emotional tone of the visual input
- Architecture: Built with TypeScript for type safety and maintainability, with a modular component structure for scalability
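As a rough illustration of how the three services chain together, here is a minimal TypeScript sketch. It is not our production code: the stage names (`analyzeFrame`, `writeMusicPrompt`, `composeAudio`), the `Frame` type, and the result shape are all hypothetical, and each stage is passed in as a function rather than calling the Overshoot, Mistral, or Suno clients directly, since their exact request formats are not covered here.

```typescript
// Hypothetical stage signatures. The real Overshoot, Mistral, and Suno
// calls are not shown in this write-up, so each stage is injected.
type Frame = Uint8Array;

interface PipelineStages {
  analyzeFrame: (frame: Frame) => Promise<string>;      // vision: frame -> scene summary
  writeMusicPrompt: (scene: string) => Promise<string>; // LLM: scene summary -> music prompt
  composeAudio: (prompt: string) => Promise<string>;    // audio: prompt -> audio URL
}

interface SoundscapeResult {
  scene: string;
  prompt: string;
  audioUrl: string;
  elapsedMs: number;
}

// Runs the three stages in order, feeding each output into the next stage,
// and records how long the whole chain took.
async function frameToSoundscape(
  frame: Frame,
  stages: PipelineStages,
): Promise<SoundscapeResult> {
  const started = Date.now();
  const scene = await stages.analyzeFrame(frame);
  const prompt = await stages.writeMusicPrompt(scene);
  const audioUrl = await stages.composeAudio(prompt);
  return { scene, prompt, audioUrl, elapsedMs: Date.now() - started };
}
```

Injecting the stages keeps each service swappable and makes the chain easy to exercise with stubs; the `elapsedMs` field is one simple way to watch end-to-end latency.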

Challenges we ran into

- Latency optimization: Balancing real-time camera processing with API calls to multiple services required careful async handling and strategic caching
- Prompt engineering: Crafting LLM prompts that consistently produce musically meaningful descriptions (rather than technical image labels) took significant iteration
- Audio-visual synchronization: Ensuring the generated soundscape emotionally matches the visual content required developing a "mood mapping" layer between our description and audio APIs
- Accessibility testing: As sighted developers building for visually impaired users, we worked to understand the UX nuances we might overlook
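To make the "mood mapping" idea concrete, here is a small TypeScript sketch of what such a layer could look like. It is an assumption-heavy illustration rather than our actual mapping: the `Mood` fields (`valence`, `energy`), the `MusicParams` shape, the 60 to 140 BPM range, and the thresholds are all invented for this example.

```typescript
// Hypothetical mood descriptor extracted from the text description.
interface Mood {
  valence: number; // -1 (somber) to 1 (joyful)
  energy: number;  // 0 (still) to 1 (turbulent)
}

// Hypothetical musical parameters handed to the audio stage.
interface MusicParams {
  tempoBpm: number;
  mode: "major" | "minor";
  texture: "sparse" | "layered";
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Maps a mood onto coarse musical parameters. Out-of-range inputs are
// clamped so a noisy description cannot produce extreme values.
function mapMoodToMusic(mood: Mood): MusicParams {
  const valence = clamp(mood.valence, -1, 1);
  const energy = clamp(mood.energy, 0, 1);
  return {
    tempoBpm: Math.round(60 + energy * 80), // 60 BPM at rest, 140 BPM at full energy
    mode: valence >= 0 ? "major" : "minor",
    texture: energy > 0.5 ? "layered" : "sparse",
  };
}

// Appends the musical parameters to the description to form the audio prompt.
function toAudioPrompt(description: string, params: MusicParams): string {
  return `${description}. Style: ${params.mode} key, about ${params.tempoBpm} BPM, ${params.texture} instrumentation.`;
}
```

For example, a description tagged with `{ valence: -0.4, energy: 0.75 }` would map to a minor key at about 120 BPM with layered instrumentation, which is the kind of explicit steering that helps the audio match the visual mood.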

Accomplishments that we're proud of

- Created an end-to-end pipeline that goes from camera input to generated soundscape in under 30 seconds
- Developed a novel approach to translating visual art into audio that preserves emotional resonance, not just literal description
- Built a fully accessible application from the ground up, not as an afterthought
- Helped others experience what we see daily in a completely new way, bridging the gap between visual and auditory perception

What we learned

- The power of multimodal AI: Chaining vision, language, and audio models creates experiences none could achieve alone
- Accessibility is innovation: Designing for edge cases often leads to better products for everyone
- Empathy-driven development: Starting from "how would this feel?" rather than "what should this do?" fundamentally changed our approach
- Rapid prototyping with Expo: The framework's hot reload and cross-platform capabilities dramatically accelerated our development cycle

What's next for @RT

- Apple Vision Pro integration: Spatial audio that responds to head position as users explore artwork in AR galleries
- Museum partnerships: Working with institutions to provide @RT experiences for their permanent collections
- Offline mode: Pre-generated soundscapes for famous artworks, downloadable for use without connectivity
- Community features: Letting users share their favorite visual-to-audio experiences
- Haptic feedback layer: Adding vibration patterns to complement audio for deaf-blind users
