Set Voice

Inspiration

Event organizers and fitness influencers know that the magic of a live event is the personal connection, but you cannot manually message 5,000 attendees. Usually, you are stuck choosing between generic blast texts and robotic AI that feels fake. We wanted to bridge that gap by giving organizers the power to send voice notes that feel authentic. Every attendee should feel like the host just pulled them aside in the middle of the crowd.

What it does

Set Voice creates personalized event voice notes at scale. We take an organizer's master audio and use AI to weave in personal details, like an attendee's specific breakout session or a charity milestone they just hit, spoken in a clone of the host's voice. To make it unmistakably human, we layer in live event ambience such as the roar of a festival crowd, the chatter of a medical conference lobby, or the sound of a starting gun at a local 5K. It sounds like the organizer is on the floor, pausing their busy day to check in on you.

How we built it

We built a complex audio orchestration pipeline using Node.js and ffmpeg.

Voice Synthesis: We used ElevenLabs for high-fidelity voice cloning so the AI inserts match the organizer's live, high-energy tone perfectly.
Precision Splicing: We used Whisper for speech-to-text with word-level timestamps to find the word boundary where the AI voice gets stitched into the master recording.
Contextual Intelligence: An LLM analyzes attendee registration data to write personalized shout-outs that match the cadence and length of the original audio.
Dynamic Audio Engineering: We used ffmpeg filter graphs to create seamless crossfades and sidechained the event background noise so it swells when the host pauses to wave at someone.
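
To make the splicing and ducking steps above more concrete, here is a minimal sketch, not our production code. It assumes the transcript arrives as an array of { word, start, end } objects with times in seconds, and that ffmpeg receives three audio inputs in this order: master recording, AI insert, event ambience, all at the same sample rate. The helper names findSpliceWindow and buildFilterGraph, the 50 ms fade, and the compressor settings are illustrative assumptions. The ffmpeg filters used (asplit, atrim, asetpts, acrossfade, sidechaincompress, amix) are standard ones.

```javascript
// Hypothetical helper: find where a placeholder phrase sits in a word-level
// transcript. Assumes words look like { word, start, end }, times in seconds.
function findSpliceWindow(words, placeholder) {
  const norm = (w) => w.toLowerCase().replace(/[^a-z0-9']/g, "");
  const target = placeholder.split(/\s+/).map(norm).filter(Boolean);
  if (target.length === 0) return null;
  for (let i = 0; i + target.length <= words.length; i++) {
    const hit = target.every((t, j) => norm(words[i + j].word) === t);
    if (hit) {
      return { start: words[i].start, end: words[i + target.length - 1].end };
    }
  }
  return null; // placeholder phrase not found in the transcript
}

// Hypothetical helper: build a filter_complex string that
//  1. cuts the placeholder out of the master (input 0),
//  2. crossfades the AI insert (input 1) into the gap,
//  3. ducks the ambience (input 2) under the voice via a sidechain, so the
//     crowd swells back whenever the voice goes quiet.
// Note: each crossfade overlaps its two inputs, so the result is shorter
// than the sum of the parts by the fade duration.
function buildFilterGraph(window, fadeSec = 0.05) {
  const s = window.start.toFixed(3);
  const e = window.end.toFixed(3);
  const d = fadeSec.toFixed(3);
  return [
    `[0:a]asplit=2[m1][m2]`,
    `[m1]atrim=end=${s},asetpts=PTS-STARTPTS[head]`,
    `[m2]atrim=start=${e},asetpts=PTS-STARTPTS[tail]`,
    `[head][1:a]acrossfade=d=${d}[headins]`,
    `[headins][tail]acrossfade=d=${d}[voice]`,
    `[voice]asplit=2[vmix][vkey]`,
    // First input is the signal being compressed, second is the sidechain key.
    `[2:a][vkey]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=400[bed]`,
    `[vmix][bed]amix=inputs=2:duration=first[out]`,
  ].join(";");
}

// Example usage with made-up timestamps.
const demoWords = [
  { word: "Hey", start: 0.0, end: 0.3 },
  { word: "there,", start: 0.35, end: 0.7 },
  { word: "friend!", start: 0.75, end: 1.2 },
  { word: "Welcome", start: 1.5, end: 1.9 },
];
const demoWindow = findSpliceWindow(demoWords, "there friend");
console.log(buildFilterGraph(demoWindow));
```

The resulting string would be passed to ffmpeg as the -filter_complex argument with -map "[out]". Building it as plain text keeps the graph easy to log and tweak per attendee.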

Challenges we ran into

The biggest boss-level challenge was the Acoustic Seam. Transitioning from a studio-recorded human voice to an AI-generated personalized insert while keeping the background festival noise consistent is hard. We had to obsess over volume normalization and over matching the room reverb of the event space. Another hurdle was the Wildcard Logic: making sure that spontaneous interruptions, like a loud concert speaker or a passing runner, felt like genuine accidents rather than scripted events.
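
As a rough illustration of the volume-matching part of that seam, the sketch below compares the RMS level of the master audio around the splice with the RMS level of the AI insert and returns the gain needed to line them up. This is a simplified stand-in, not the exact normalization we shipped. It assumes both clips are already decoded to mono floating-point samples in the range -1 to 1, and the function names are hypothetical. Reverb matching is not covered here.

```javascript
// RMS level of a clip in dBFS. Assumes samples are floats in [-1, 1].
function rmsDb(samples) {
  if (samples.length === 0) return -Infinity;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// Gain (in dB) to apply to the insert so its RMS matches the reference.
// Falls back to 0 dB if either clip is empty or silent.
function gainToMatchDb(referenceSamples, insertSamples) {
  const ref = rmsDb(referenceSamples);
  const ins = rmsDb(insertSamples);
  if (!Number.isFinite(ref) || !Number.isFinite(ins)) return 0;
  return ref - ins;
}

// Format the gain as an ffmpeg volume filter, e.g. for the insert's chain.
function volumeFilter(gainDb) {
  return `volume=${gainDb.toFixed(2)}dB`;
}

// Example usage with synthetic clips: the insert is half as loud as the master.
const masterNearSeam = new Float32Array(1000).fill(0.5);
const aiInsert = new Float32Array(1000).fill(0.25);
console.log(volumeFilter(gainToMatchDb(masterNearSeam, aiInsert)));
```

RMS is a blunt measure compared with perceptual loudness, so in practice this kind of gain is a starting point that still needs a listen.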

Accomplishments that we are proud of

We are incredibly proud of the Real-Time Chaos feature. It is the ultimate psychological unlock for event engagement. Most event tech tries to be perfect, but we leaned into the messy reality of live events. Seeing the system programmatically inject a "Wait, the keynote is starting, got to go!" moment with the background crowd noise swelling in sync is a total aha moment for anyone who receives the message.

What we learned

We learned that live trust is built in the details. In a world of polished marketing, attendees crave the authentic, messy energy of being there. We also leveled up our skills in programmatic audio manipulation, realizing that the vibe of a busy lobby is just as important as the actual words being spoken.

What is next for Set Voice

We are heading straight for live production. Our next step is integrating this into RFID check-in systems at conferences to trigger a personalized welcome voice note the second a person walks through the door. We also plan to build a "Venue Library" where organizers can toggle their environment from main stage echo to a quiet VIP lounge to keep their attendee outreach feeling fresh and real throughout a multi-day event.