Inspiration

We started at a whiteboard, brainstorming ideas. Our main goal was to build something more useful than fun, but still with a 'wow' factor.

After going through four or five ideas, we found one that we felt a real connection to and were excited to work on. This was it.

What it does

'!You' is an AI phone assistant that answers calls in your voice when you can't pick up. It sounds exactly like you, understands what callers need through real-time speech recognition, and responds naturally based on your availability status (in a meeting, on vacation, available, or just having '!You' toggled off).

You can set custom messages, and it logs all calls with AI-generated summaries so you never miss what matters.
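As a rough illustration of how an availability status and an optional custom message might map to an opening line, here is a minimal sketch. The `Mode` type, the `DEFAULT_GREETINGS` table, the `pickGreeting` function, and the greeting texts are all hypothetical names and values chosen for this example, not the project's actual code.

```typescript
// Hypothetical mode names, mirroring the availability statuses described above.
type Mode = "available" | "meeting" | "vacation" | "off";

// Hypothetical default lines; the user's custom message takes priority over these.
const DEFAULT_GREETINGS: Record<Exclude<Mode, "off">, string> = {
  available: "Hey, I can't get to the phone right now. What's up?",
  meeting: "Hey, I'm in a meeting at the moment. Can I get back to you?",
  vacation: "Hey, I'm away on vacation. Is it something that can wait?",
};

// Returns the opening line for a call, or null when the assistant is
// toggled off and should not answer at all.
function pickGreeting(mode: Mode, customMessage?: string): string | null {
  if (mode === "off") return null;
  const trimmed = customMessage?.trim();
  // A non-empty custom message wins; otherwise fall back to the default.
  return trimmed ? trimmed : DEFAULT_GREETINGS[mode];
}
```

Returning `null` for the off state keeps the "do not answer" case explicit, so the caller of this function has to handle it rather than accidentally playing an empty greeting.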

How we built it

We started with Next.js for the dashboard and initially deployed to Vercel. Realizing we needed real-time audio, we integrated Twilio Media Streams with WebSockets and migrated to Railway. But the audio was terrible: the 8 kHz encoding made it sound robotic and creepy.

The breakthrough came when we realized we didn't need WebSockets at all. We pivoted to Twilio's TwiML verbs with HTTP endpoints, serving Fish Audio's 128 kbps MP3 files directly.

Suddenly the audio was crystal clear and the architecture was simpler.
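To make the HTTP-endpoint approach concrete, here is a minimal sketch of the kind of TwiML document an endpoint could return so that Twilio fetches and plays a hosted MP3. It uses only TwiML's `<Play>` verb; the helper names `escapeXml` and `buildPlayTwiml` and the example URL are assumptions made for illustration, and the real endpoints likely returned more than this.

```typescript
// Escape characters that are special in XML so the URL is safe inside an element.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a TwiML document that tells Twilio to fetch and play one audio file.
function buildPlayTwiml(audioUrl: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    `  <Play>${escapeXml(audioUrl)}</Play>`,
    "</Response>",
  ].join("\n");
}

// Hypothetical URL of a synthesized reply hosted by the app.
const twiml = buildPlayTwiml("https://example.com/audio/reply.mp3?call=1&turn=2");
```

An HTTP handler would send this string back with a `text/xml` content type in response to Twilio's webhook request. Because each turn is a plain request and response, no long-lived socket has to stay open during the call.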

We integrated OpenAI Whisper for transcription, GPT-4o-mini for contextual responses, and Fish Audio's voice cloning API for speech output, which kept the stack clean and simple.

The final challenge was conversation design. Through extensive prompt iteration and real-world testing (literally calling ourselves), we achieved natural, brief, human-sounding dialogue that doesn't feel like talking to a bot.
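As an illustration of what such prompt iteration might converge on, here is a small sketch that assembles a system prompt from a few style rules (casual tone, contractions, fillers, brevity). The function name `buildSystemPrompt`, its parameters, and the wording of every rule are hypothetical; the prompts actually used in the project are not shown in this writeup.

```typescript
// Hypothetical helper: assembles a system prompt for the chat model.
// The rule wording below is illustrative, not the project's real prompt.
function buildSystemPrompt(ownerName: string, status: string): string {
  const rules = [
    `You are answering a phone call on behalf of ${ownerName}, who is currently ${status}.`,
    "Speak casually, the way a person does on the phone.",
    "Use contractions and the occasional filler word.",
    "Keep every reply to one or two short sentences.",
    "Find out what the caller needs and offer to pass the message along.",
  ];
  // One rule per line keeps the prompt easy to tweak between test calls.
  return rules.join("\n");
}

const prompt = buildSystemPrompt("Sam", "in a meeting");
```

Keeping the rules in a list makes it cheap to add, remove, or reword a single instruction and then place another test call to hear the difference.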

Challenges we ran into

The biggest challenge was audio quality. We initially used Twilio Media Streams with WebSockets, but the 8 kHz encoding made voices sound grainy and distorted (and creepy for some reason).

After extensive debugging and research, we pivoted to using Twilio's TwiML verbs with MP3 files at 128 kbps, which dramatically improved sound quality.
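A quick back-of-the-envelope calculation helps explain why an 8 kHz stream sounds thin. The sketch below assumes the stream carries 8-bit, single-channel samples at 8000 Hz (our understanding of the mu-law format Twilio documents for Media Streams); the function names are made up for this example.

```typescript
// Highest frequency a given sample rate can represent (the Nyquist limit), in Hz.
function nyquistLimitHz(sampleRateHz: number): number {
  return sampleRateHz / 2;
}

// Raw bitrate in kbps for fixed-width samples (no compression applied).
function rawBitrateKbps(sampleRateHz: number, bitsPerSample: number, channels: number): number {
  return (sampleRateHz * bitsPerSample * channels) / 1000;
}

// Assumed stream format: 8000 Hz, 8 bits per sample, mono.
const streamNyquistHz = nyquistLimitHz(8000);          // 4000
const streamBitrateKbps = rawBitrateKbps(8000, 8, 1);  // 64
```

Under these assumptions nothing above 4 kHz survives in the streamed audio, and part of what makes a voice sound crisp and recognizable sits above that range.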

We also struggled with making the AI sound genuinely human rather than robotic, requiring multiple iterations on conversation prompts to achieve natural, casual speech patterns.

Accomplishments that we're proud of

We're incredibly proud of the voice quality on an actual phone call: it sounds like the user, not a generic AI voice.

The conversation flow feels natural and human. We tried to eliminate as much of the 'uncanny valley' as we could in the given time frame.

We successfully built an almost production-ready system that handles real phone calls end-to-end, from call forwarding setup to responses to conversation logging.

The mode-switching system (normal/meeting/vacation/off) works flawlessly, giving users granular, GUI-based control over how their AI twin behaves.

What we learned

We learned that audio quality is paramount for voice AI. Users will tolerate a 'less-intelligent' AI if it sounds clear, but won't use a 'smart AI' that sounds robotic or grainy.

We discovered that simpler architectures often work better: ditching WebSockets for REST APIs made deployment easier and the audio better.

We also learned the importance of conversation design: making AI sound human isn't about perfect grammar, it's about fillers, contractions, and brevity.

Finally, we learned that Twilio's documentation can be a bit misleading about quality limitations.

What's next for !You

Next, we want to add intelligent call screening that asks "What's this about?" before deciding whether to interrupt you or handle it autonomously.

We'll implement SMS fallback so the AI can text you urgent messages if you're truly unreachable. We plan to add multi-language support for international calls, calendar integration to auto-set meeting mode based on your schedule, and voicemail transcription with sentiment analysis.

Long-term, we envision !You becoming a full personal communication manager that handles not just calls, but texts, emails, and meeting scheduling - all in your voice and style, behind a user-friendly, high-quality GUI frontend.
