Inspiration
We started with a simple thought: "What if AI could be more than smart? What if it could be a companion?" That question stayed with us when we thought about people who experience the world differently, especially those who are visually impaired. For them, something as ordinary as crossing the road or walking into a room can feel uncertain. We wanted to build something that doesn't just describe the world, but does so with warmth: a hand to hold, a voice to guide.
What it does
That’s how Companion AI was born. Using computer vision and text-to-speech, it looks at the world through a camera and narrates what it sees in real time: “A man is crossing the road holding a bag.” It turns vision into voice, so no one has to miss the story happening around them.
How we built it
We began by connecting three worlds: vision, language, and speech. First, we broke the video feed into individual frames so the AI could keep pace with the scene. Then, we used a Vision-Language Model to "see" each frame and generate a caption for it. Finally, we gave those captions a voice through text-to-speech. Piece by piece, we stitched these elements together until the AI could look at a scene and narrate it instantly, almost like giving sight a voice.
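To make that pipeline concrete, here is a minimal sketch of one way the loop could be wired up. The specific libraries are our illustrative assumptions, not necessarily the stack behind Companion AI: OpenCV for frame capture, the BLIP captioning model from Hugging Face Transformers as the Vision-Language Model, and pyttsx3 for offline text-to-speech.

```python
import cv2                       # pip install opencv-python
import pyttsx3                   # pip install pyttsx3
from transformers import BlipProcessor, BlipForConditionalGeneration

# Illustrative model choice; any image-captioning VLM would fit here.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
tts = pyttsx3.init()

cap = cv2.VideoCapture(0)        # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV frames are BGR; the captioning model expects RGB.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    inputs = processor(images=rgb, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    tts.say(caption)             # queue the caption as speech...
    tts.runAndWait()             # ...and block until it has been spoken

cap.release()
```

Run as-is, a loop like this captions every frame it can, which is far slower than the camera's frame rate; that tension is exactly the first challenge below.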
Challenges we ran into
The hardest part was speed. Narration only feels useful if it happens in real time, so we had to balance accuracy against latency. Another challenge was simplicity: an AI can describe a lot, but too much information overwhelms the listener. We had to teach the system to speak clearly, not endlessly. Integrating different tools smoothly also tested our patience, but step by step, the pieces began to flow together.
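One simple way to keep the narration both timely and uncluttered, sketched here as our own illustration rather than the team's exact fix, is to rate-limit speech and skip captions that barely differ from the last one spoken. The two-second interval and word-overlap threshold below are hypothetical tuning choices.

```python
import time

def caption_overlap(a: str, b: str) -> float:
    """Jaccard word overlap between two captions (0.0 = disjoint, 1.0 = identical)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

class Narrator:
    """Rate-limits speech and suppresses near-duplicate captions."""

    def __init__(self, tts, min_interval: float = 2.0, dup_threshold: float = 0.6):
        self.tts = tts
        self.min_interval = min_interval    # seconds between narrations
        self.dup_threshold = dup_threshold  # overlap above this counts as "same scene"
        self.last_caption = ""
        self.last_time = 0.0

    def maybe_speak(self, caption: str) -> None:
        now = time.monotonic()
        if now - self.last_time < self.min_interval:
            return                          # too soon: stay quiet
        if caption_overlap(caption, self.last_caption) >= self.dup_threshold:
            return                          # scene barely changed: skip it
        self.tts.say(caption)
        self.tts.runAndWait()
        self.last_caption, self.last_time = caption, now
```

Dropping `tts.say(caption)` from the capture loop and calling `narrator.maybe_speak(caption)` instead keeps the voice calm even when the model produces a caption for every frame.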
Accomplishments that we're proud of
Even though Companion AI is still in its early stage, we’re proud of the progress we’ve made in bringing the idea to life. We managed to stitch together the core pipeline — from capturing frames, to generating captions, to hearing the first text-to-speech output. It might not be perfect yet, but hearing the AI describe even a simple scene felt like proof that this could really work. Most of all, we’re proud of the vision itself: building something that puts empathy at the heart of technology.
What we learned
We learned how to bridge computer vision, natural language, and speech into one seamless experience. But more importantly, we learned that accessibility is about more than technology — it’s about empathy. Designing for people who truly need this reminded us why we build: not for the code itself, but for the lives it can touch.
What's next for Companion AI
This is only the beginning. We imagine Companion AI becoming multilingual, so it can narrate in the user’s own language. We want it to be context-aware — not just “a person,” but “your friend is waving at you.” And one day, we see it built into glasses or earphones, offering hands-free support everywhere. Our dream is simple: to make the world feel a little more inclusive, one voice at a time.