Current videoconferencing tools are wasteful: they make you send gigabytes of HD video that contain very little relevant information. What if, instead of directly sending the video from your webcam, you only sent your facial expressions and lip movements? We could then render them on 3D avatars placed in a virtual meeting room.
What it does
- Create and join a room
- Speak with your peers (audio channel)
- Share your screen with your peers on the TV
- View people through a 3D avatar
- Real-time face detection and mouth opening coefficient computation (using webcam stream)
Not implemented yet:
- Share face detection and mouth opening coefficient to peers (using data channel)
- Render detected coefficients on 3D avatars
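Sharing the coefficients over the data channel could be as cheap as a few bytes per frame. Here is a minimal sketch of what that message could look like; the field list (mouthOpen, yaw, pitch, roll) is an assumed schema for illustration, not the project's actual one:

```javascript
// Sketch: packing per-frame face coefficients into a compact binary
// message for a WebRTC data channel. The field list is an assumption.
function packCoefficients({ mouthOpen, yaw, pitch, roll }) {
  // Four little-endian float32s: 16 bytes per frame instead of video.
  return new Float32Array([mouthOpen, yaw, pitch, roll]).buffer;
}

// Unpack on the receiving peer.
function unpackCoefficients(buffer) {
  const [mouthOpen, yaw, pitch, roll] = new Float32Array(buffer);
  return { mouthOpen, yaw, pitch, roll };
}
```

On the sender this would feed `dataChannel.send(packCoefficients(c))`, and the receiver would decode in the channel's `onmessage` handler.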
How we built it
Everything is done client-side using WebRTC (data pipes are created directly between clients), except for a signaling server (which is actually serverless) that connects people who join the same room. The 3D environment is based on A-Frame and the furniture on 3dio. Facial expressions and motion are computed in-browser using the Jeeliz FaceFilter SDK. Finally, the web app is a single-page application built with React.
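The signaling server's only job is to relay offers, answers, and ICE candidates between peers that share a room id, until WebRTC pipes take over. A minimal in-memory stand-in for that role could look like this (the class and method names are illustrative, not the actual serverless API):

```javascript
// Sketch of the signaling role: relay messages between peers in the
// same room. In Avatarz this runs serverless; an in-memory map
// stands in here for illustration.
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> handler)
  }

  // A peer joins a room and registers a callback for incoming signals.
  join(roomId, peerId, onSignal) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId).set(peerId, onSignal);
  }

  // Forward a signal (offer, answer, or ICE candidate) to one peer.
  send(roomId, toPeerId, signal) {
    const peers = this.rooms.get(roomId);
    const handler = peers && peers.get(toPeerId);
    if (handler) handler(signal);
  }

  // List the other peers already in the room, so a newcomer knows
  // whom to send offers to.
  peersIn(roomId, exceptPeerId) {
    const peers = this.rooms.get(roomId) || new Map();
    return [...peers.keys()].filter((id) => id !== exceptPeerId);
  }
}
```

Once each pair of peers has exchanged an offer and answer this way, audio, screen sharing, and data all flow peer-to-peer without the relay.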
Challenges we ran into
Rendering expressions, motion, and lip-sync on 3D models requires good 3D skills, which we don't have. That's why the coefficients are currently only logged to the console rather than rendered on the 3D avatars.
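Even without modelling skills, the wiring itself is simple: map the mouth opening coefficient onto a jaw rotation. A sketch, assuming the coefficient lies in [0, 1] and picking an arbitrary 30-degree maximum (both are illustrative choices, not values from the project):

```javascript
// Sketch: mapping a mouth opening coefficient (assumed in [0, 1])
// onto a jaw rotation in degrees for a rigged avatar. The 30-degree
// maximum is an illustrative choice.
const MAX_JAW_ANGLE_DEG = 30;

function mouthCoefficientToJawAngle(coefficient) {
  // Clamp noisy detector output into [0, 1] before mapping.
  const clamped = Math.min(1, Math.max(0, coefficient));
  return clamped * MAX_JAW_ANGLE_DEG;
}
```

In an A-Frame component's `tick` handler, this angle could then drive the jaw bone's `object3D.rotation` on each frame; it's the rigged model with a movable jaw that we are missing, not the math.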
Accomplishments that we're proud of
Gathering all these technologies together and proving that a videoconferencing tool of this kind is possible.
What we learned
WebRTC is powerful but complex. It was great to discover and understand it.
What's next for Avatarz
The code is open-source and available at https://github.com/ChristopheBougere/avatarz, and a proof of concept is live at https://avatarz.chat. While we have a bunch of good ideas to make Avatarz a really cool tool, we lack the 3D skills required to render expressions on the avatars. We would love contributions from anyone willing to help!