Inspiration
Being able to shop in 3D in VR will revolutionize many industries. I have never seen a WebXR showroom with an AI sales agent, but I believed it was possible, so I set out to build one myself.
What it does
VR will not be the future of all e-commerce, but for some use cases it will be invaluable. This demo showcases playground equipment, a product you really need to see up close and at true scale to evaluate properly. You can enter the experience from a web browser in one click, without downloading anything or logging in.
How we built it
I created a website for a fictional company, then designed and modeled some simple playground structures. The avatar is from Ready Player Me (readyplayer.me) and the animations are from Mixamo (mixamo.com). Speech is handled by ElevenLabs (elevenlabs.io). Lip sync viseme definitions are borrowed from Rhubarb (https://github.com/DanielSWolf/rhubarb-lip-sync). My WebXR experience uses the three.js library and runs entirely in the browser as a plain HTML page.
Challenges we ran into
For conversational AI speech to work well, the delay before each response needs to be minimal. Ironically, agents are too slow to do all the talking, so the best practical implementation turns out to be a hybrid bot that uses AI to deliver canned phrases, then falls back to an LLM for more complicated queries. I tried to build part of this system myself, but I am not a chatbot engineer, so I will leave it to Salesforce.
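The hybrid idea described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the code used in the demo: the phrase table, the keyword matching (a simple stand-in for the AI step that picks a canned phrase), and the names `CANNED_RESPONSES`, `findCannedReply`, `respond`, and `askLlm` are all hypothetical.

```javascript
// Hypothetical table of canned phrases, keyed by single-word triggers.
// None of these entries come from the actual project.
const CANNED_RESPONSES = [
  {
    keywords: ["hello", "hi", "hey"],
    reply: "Welcome to the showroom! Feel free to walk around the equipment.",
  },
  {
    keywords: ["bye", "goodbye", "thanks"],
    reply: "Thanks for visiting!",
  },
];

// Returns a canned reply if any whole word in the utterance is a trigger,
// otherwise null. Whole-word matching avoids "this" triggering on "hi".
function findCannedReply(utterance) {
  const words = new Set(utterance.toLowerCase().match(/[a-z']+/g) ?? []);
  const match = CANNED_RESPONSES.find((entry) =>
    entry.keywords.some((keyword) => words.has(keyword))
  );
  return match ? match.reply : null;
}

// Tries the fast canned path first, then falls back to the slower LLM.
// `askLlm` is an assumed async function (utterance) => reply string,
// supplied by the caller; it stands in for whatever LLM service is used.
async function respond(utterance, askLlm) {
  const canned = findCannedReply(utterance);
  if (canned !== null) {
    return { source: "canned", reply: canned };
  }
  const reply = await askLlm(utterance);
  return { source: "llm", reply };
}
```

Because the canned path never leaves the browser, it can answer almost immediately, and only the unusual questions pay the cost of a round trip to the LLM.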
Accomplishments that we're proud of
This is only a very thin vertical slice, but I feel that it successfully proves the concept. I am especially happy to have it all running cleanly in the web browser.
What we learned
Gemini is quite fast, which probably makes it the preferred AI model. Adding blinking eyes to the NPC was a surprisingly effective way to bring it to life. The positions of shoppers in 3D space can be used to predict context, because they will obviously be more interested in items they are moving towards. I discovered there is a native Web Speech API, but sadly it does not work in the Meta browser, so I could not even test it. Speak.js is a JavaScript speech system, but it sounds terrible and is unusable. I came very close to getting PiperTTS to work locally, but it was just a little too heavy for WebXR.
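The position-based context idea can be illustrated with a small sketch. It assumes the shopper's position is sampled on two consecutive updates and that each item has a fixed position; the function names `distance` and `predictInterest` and the item list are hypothetical, not taken from the demo. Positions are plain `{ x, y, z }` objects, so a three.js `Vector3` such as the camera position could be passed in directly.

```javascript
// Straight-line distance between two points given as { x, y, z } objects.
function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Returns the name of the item the shopper got closer to by the largest
// amount between two position samples, or null if the shopper did not
// get closer to any item (for example, when standing still).
function predictInterest(previousPosition, currentPosition, items) {
  let bestName = null;
  let bestApproach = 0;
  for (const item of items) {
    // Positive when the shopper is now nearer to the item than before.
    const approach =
      distance(previousPosition, item.position) -
      distance(currentPosition, item.position);
    if (approach > bestApproach) {
      bestApproach = approach;
      bestName = item.name;
    }
  }
  return bestName;
}

// Example with two made-up items on opposite sides of the shopper.
const exampleItems = [
  { name: "swing", position: { x: 5, y: 0, z: 0 } },
  { name: "slide", position: { x: -5, y: 0, z: 0 } },
];
const guess = predictInterest(
  { x: 0, y: 0, z: 0 },
  { x: 1, y: 0, z: 0 },
  exampleItems
);
console.log(guess); // "swing"
```

The predicted item name could then be passed to the sales agent as context, so that a vague question like "how tall is it?" can be resolved to the item the shopper is approaching.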
What's next for VR NPC sales agent
Further exploration of independent, open-source, fast, lightweight, realistic speech systems, so that this project does not have to rely on ElevenLabs as a third-party service.
Built With
- elevenlabs
- html
- javascript
- salesforce
- three.js
