Inspiration

We wanted to build something designed for the visually impaired, but usable by everyone. When we look at pictures, much of the emotion is lost because the sound of the moment is missing. We wanted to reintroduce that feeling of being in a scene by adding the missing audio, making the experience of looking back at memories more personal.

What it does

SoundScape works by sending a user-uploaded photo to AWS Rekognition, which generates an AI description of the items present in the photo. We then pass that description to ElevenLabs, which generates an audio file from it; the audio is stored in an S3 bucket and sent back to the frontend, where the user can listen to it alongside their uploaded photo and download it to enjoy later.
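As a rough sketch of the first step, the Rekognition call inside one of our Lambdas looks something like this; the label limits and the wording of the generated description are illustrative, not our exact values:

```python
import boto3

rekognition = boto3.client("rekognition")

def describe_photo(image_bytes: bytes) -> str:
    """Ask Rekognition for labels and join them into a short description."""
    response = rekognition.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=10,
        MinConfidence=80,
    )
    labels = [label["Name"] for label in response["Labels"]]

    if not labels:
        return "A scene with no recognizable objects"
    if len(labels) == 1:
        return f"A scene featuring {labels[0]}"
    # e.g. "A scene featuring Beach, Ocean, People, and Sunset"
    return "A scene featuring " + ", ".join(labels[:-1]) + ", and " + labels[-1]
```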

How we built it

Our frontend is built in TypeScript + React, and our backend uses AWS Lambda functions, written in Python. Each of us focused on one aspect and debugged issues together.
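As a rough illustration, one of the Python Lambdas receives the photo from the React frontend along these lines; the event shape assumes API Gateway with a base64-encoded body, and describe_photo is the Rekognition helper sketched above:

```python
import base64
import json

def lambda_handler(event, context):
    """Entry point: decode the uploaded photo and run the description step."""
    body = json.loads(event["body"])
    image_bytes = base64.b64decode(body["image"])  # frontend sends the photo base64-encoded

    description = describe_photo(image_bytes)  # Rekognition step from the earlier sketch

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"description": description}),
    }
```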

Challenges we ran into

Frontend

Our frontend was hard to debug because of the number of components we created, such as those managing uploaded files, and because TypeScript was new to at least one of us. Implementing accessibility features like light and dark mode was also hard to keep in sync with our SVGs, which required even more changes across components.

Backend

Creating the Lambda functions was very difficult, and chaining the API calls across different AI models, then getting the results back to the frontend, was hard to get right.

Even so, using Lambda instead of a traditional backend API was the better choice: it let us separate each call into its own function and gave us much more control over the workflow.
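For example, one step can hand its result to the next Lambda with an asynchronous invoke; the function name and payload fields below are placeholders, not our exact configuration:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def hand_off_to_tts(description: str, photo_id: str) -> None:
    """Invoke the next Lambda in the chain with the generated description."""
    lambda_client.invoke(
        FunctionName="soundscape-generate-audio",  # placeholder name for the TTS Lambda
        InvocationType="Event",  # fire-and-forget; the TTS step writes its result to S3
        Payload=json.dumps({"description": description, "photoId": photo_id}),
    )
```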

The ElevenLabs API was also tricky to work with, as it is a lesser-known API.
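Roughly, the text-to-speech step calls the ElevenLabs REST endpoint and drops the result into S3; the voice ID, model, environment variable names, and bucket below are placeholders for our actual setup:

```python
import os
import boto3
import requests

s3 = boto3.client("s3")

def generate_audio(description: str, photo_id: str) -> str:
    """Turn the description into speech with ElevenLabs and store the MP3 in S3."""
    voice_id = os.environ["ELEVENLABS_VOICE_ID"]  # placeholder env vars
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": description, "model_id": "eleven_multilingual_v2"},
        timeout=30,
    )
    response.raise_for_status()

    key = f"audio/{photo_id}.mp3"
    s3.put_object(
        Bucket=os.environ["AUDIO_BUCKET"],
        Key=key,
        Body=response.content,
        ContentType="audio/mpeg",
    )
    # A presigned URL lets the frontend stream or download the clip
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": os.environ["AUDIO_BUCKET"], "Key": key},
        ExpiresIn=3600,
    )
```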

Accomplishments that we're proud of

Being able to pipe the output of one AI model into another, from image recognition all the way to speech generation.

What we learned

What's next for SoundScape AI
