3Draw [HackIllinois '22]

Goose, ready for VR.
It's a display board that tells players how much time is left in the current player's turn.
The artist can see the full word for the duration of their turn.
Rendered Goose.

3draw

*Finally, something to do with your VR headset! (Friends sold separately.)* (repo)

Inspiration

Half of our team recently won VR headsets at a previous hackathon. Eventually, the novelty of Beat Saber, Gorilla Tag, and other somewhat dubious activities wore off, so we decided to make something ourselves, drawing inspiration from various party games.

3Draw is the latest in a long series of thing-drawing games like skribbl.io or Gartic Phone. We stand out via our chaotic, AI-powered, voice-chat guessing system and largely dysfunctional artistic tools. Try it yourself at 3draw; I'm not going to tell you how it works! (That is, if you have a headset and friends to play with. Our demo video will suffice otherwise.)

Implementation

Usually we'd flex the complicated docker-compose architecture here. We ended up having a pleasantly vanilla project:

This is a joke

Thanks to our resident Nix enthusiast, yarn.nix dominated the line count, but the human-written code was (nearly) all HTML, CSS, and JS.

We used Cloudflare Tunnels for both testing and deployment. We had them set up in the first 14 seconds of hacking, and they were extremely convenient. We used WebXR, interfacing with it through the batteries-included (though dated) A-Frame framework. We used RTC to communicate between A-Frame instances via Networked A-Frame (NAF) and an EasyRTC instance. 3D modeling work was done in Blender; we borrowed a castle from here (attribution!). To manage state, we created a chimera of various techniques that could be (generously) described as an amalgamation of MVC (fattest controllers you've ever seen) and React.
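For the curious, here is a minimal sketch of what an MVC-meets-React chimera can look like. It is not our actual code: `createStore`, `advanceTurn`, and the `players`/`artistIndex` fields are hypothetical names chosen for illustration. The store is the React-ish half (subscribers re-render on every change) and the controller function is the MVC-ish half.

```javascript
// Hypothetical sketch, not the project's real state code.
// A tiny observable store: subscribers are notified on every state change.
function createStore(initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    getState: () => state,
    // Merge a partial update into the state and notify every subscriber.
    setState(patch) {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener(state));
    },
    // Register a listener; the returned function unsubscribes it.
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
  };
}

// Controller: hand the pen to the next player, wrapping around at the end.
function advanceTurn(store) {
  const { players, artistIndex } = store.getState();
  store.setState({ artistIndex: (artistIndex + 1) % players.length });
}

const store = createStore({ players: ["ana", "bo", "cy"], artistIndex: 0 });
const seen = [];
store.subscribe((s) => seen.push(s.players[s.artistIndex]));
advanceTurn(store);
advanceTurn(store);
advanceTurn(store);
console.log(seen); // [ 'bo', 'cy', 'ana' ]
```

In a networked setting, only one client (for example, the room host) would presumably call the controller, with the resulting state synced to everyone else.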

Voice chat was conveyed over RTC (as were most things), but speech-to-text was a nightmare. The original plan was to use the Web Speech API. We thought the WebKit-prefixed version would be available, since Oculus headsets are glorified Android devices and their browser is based on Chromium under the hood. Apparently, somewhere along the chain our assumptions broke down.

Part of our team spent about 8 hours trying to implement a client-side TFLite model on TensorFlow.js for limited-vocabulary recognition, effectively reverse engineering the Web Speech API. We needed a model that would be easy to train and extremely low latency, so leveraging word embeddings was an obvious plan, until it wasn't. Object guesses did not appear in ordinary contexts, and it seems that SOTA techniques for limited-vocabulary detection are now much more sophisticated, using pipelines of several models.

As such, we turned to the omniscient GCP Cloud Speech API and had incredible difficulty streaming audio to it in a low-latency manner. In fact, the latency for the GCP Cloud Speech API was on the order of minutes. Whether that was due to misconfiguration on our end or issues on Google's end, the world may never know. Additionally, the speech recognition pipeline just wasn't very good.
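The Web Speech API check that our original plan hinged on can be sketched as plain feature detection. The snippet below is illustrative only: `getSpeechRecognition` and `isCorrectGuess` are hypothetical helpers, and the whole-word matching rule is an assumption about how a transcript could be compared against the secret word, not a description of what we shipped.

```javascript
// Return the SpeechRecognition constructor if the environment exposes one,
// checking the standard name first and the WebKit-prefixed name second.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Assumption: a guess counts only when the secret word appears as a whole
// word in the transcript, ignoring case and punctuation.
function isCorrectGuess(transcript, secretWord) {
  const words = transcript.toLowerCase().match(/[a-z0-9']+/g) || [];
  return words.includes(secretWord.toLowerCase());
}

const Recognition = getSpeechRecognition(globalThis);
console.log(
  Recognition ? "Web Speech API available" : "Web Speech API unavailable"
);
console.log(isCorrectGuess("Is it a goose?", "goose")); // true
console.log(isCorrectGuess("mongoose", "goose")); // false
```

Passing the global object in as a parameter keeps the detection testable outside a browser. Where neither constructor exists, a fallback such as a server-side transcription service is needed.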

Very good transcription, wow

What's Next

Probably actually playing it in a setting other than (hackathon-induced-)stress testing. If it's fun, perhaps we'll flesh it out a bit more. rm -rf will suffice otherwise.

Citations

[1] I. McGraw et al., “Personalized Speech Recognition on Mobile Devices,” arXiv:1603.03185 [cs], Mar. 2016, Accessed: Feb. 26, 2022. [Online]. Available: http://arxiv.org/abs/1603.03185

[2] P. Warden, “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition,” arXiv:1804.03209 [cs], Apr. 2018, Accessed: Feb. 26, 2022. [Online]. Available: http://arxiv.org/abs/1804.03209

[3] J. Shor et al., “Towards Learning a Universal Non-Semantic Representation of Speech,” Interspeech 2020, pp. 140–144, Oct. 2020, doi: 10.21437/Interspeech.2020-1242.

(repo)

Built With

aframe
cloudflare-tunnel
ecs
gcp
google-cloud-speech
html5
javascript
oculus
webxr

Submitted to

HackIllinois 2022

Created by

realtime goose (state + position sync/multiplayer work)

Samyok Nepal
thinker of goose
I worked on game state management, including the process by which a user is determined to be the host of the room, as well as the mechanism through which turns/rounds are executed.

Ritik Mishra
Honk! I worked on (failing to) reverse engineer the Chrome Closed Captioning model (SODA) and worked on the Google Cloud Speech API proxy service.

Private user
Ben Weiner
Minnerva Zou
goose enthusiast
Sasha Hydrie