Smart speakers offer a wealth of convenience, combining intelligent knowledge engines with high-quality sound. Too often, however, the audio hardware receives far less attention than the speakers' smarts, and these devices are never used as richly as they could be. We present Cul de Sac, a system that joins multiple Google Home devices in unison to play rich, 3D audio.
Google Home has supported a concept of "device groups" for a few months, which allows multiple devices to play back the same audio simultaneously. However, there is no way to play a different audio stream on each device, as a stereo or surround sound system requires. Cul de Sac is a hack that synchronizes streaming across multiple devices while sending each a unique stream, enabling playback of synthesized surround sound.
The core of our design is a Node.js server that streams audio data to the various Chromecast devices. By carefully managing the buffering that each Google Home performs as it streams, we automatically synchronize all of the available devices on the network while streaming unique audio data to each. (The similar Google Home feature that works out-of-the-box has no exposed API and can only stream a single audio source to all devices, which is not sufficient for what we are doing here.) We have also created a native Android app for fine-tuned manual adjustment and auto-synchronization.
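To illustrate the buffer-management idea, here is a minimal sketch of how per-device streams can be aligned: if we know how long each device buffers before audible playback begins, we can pad the faster devices' streams with silence so that every device drains its buffer at the same instant. The device names and latency figures are purely illustrative, not measurements from the real system.

```javascript
const SAMPLE_RATE = 44100; // samples per second

// Hypothetical measured buffering delay for each Cast device, in ms.
const deviceLatencyMs = {
  livingRoom: 412,
  kitchen: 378,
  hallway: 430,
};

// Pad every stream with silence so all devices start audible playback
// when the slowest device's buffer drains. The slowest device gets
// zero padding; faster devices wait out the difference.
function silencePaddingSamples(latencies, rate) {
  const maxMs = Math.max(...Object.values(latencies));
  const padding = {};
  for (const [device, ms] of Object.entries(latencies)) {
    padding[device] = Math.round(((maxMs - ms) / 1000) * rate);
  }
  return padding;
}

const pad = silencePaddingSamples(deviceLatencyMs, SAMPLE_RATE);
// hallway buffers the longest, so it receives no padding;
// kitchen, the fastest, receives the most.
```

A real implementation would splice this silence into the PCM stream the server sends each device, and re-measure latencies whenever a device rebuffers.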
For the purposes of the demo, we have also created a simple Daydream VR game, Homeward. In this game, invisible aliens attack the player from all sides. The player must shoot the aliens using only the audio cues that Cul de Sac provides. We also demonstrate richer music playback via a roving audio source that appears to move with the music.
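One way directional cues like Homeward's could be synthesized is by mapping the sound source's angle to per-speaker gains: speakers nearer the source play louder, with an equal-power normalization so overall loudness stays constant as the source moves. This is a hedged sketch of that idea; the layout, cosine falloff, and function names are our own illustration, not the game's actual mapping.

```javascript
// Smallest absolute difference between two angles, in radians.
function angularDistance(a, b) {
  const d = Math.abs(a - b) % (2 * Math.PI);
  return d > Math.PI ? 2 * Math.PI - d : d;
}

// Given speaker angles around the listener and a source angle,
// compute per-speaker gains: cosine falloff within 90 degrees of the
// source, silence beyond, normalized so the squared gains sum to 1.
function speakerGains(speakerAngles, sourceAngle) {
  const raw = speakerAngles.map((a) => {
    const d = angularDistance(a, sourceAngle);
    return d < Math.PI / 2 ? Math.cos(d) : 0;
  });
  const power = Math.sqrt(raw.reduce((s, g) => s + g * g, 0)) || 1;
  return raw.map((g) => g / power);
}

// Four Homes at the corners of the room: 45, 135, 225, 315 degrees.
const layout = [Math.PI / 4, (3 * Math.PI) / 4, (5 * Math.PI) / 4, (7 * Math.PI) / 4];
const gains = speakerGains(layout, 0); // an alien dead ahead
// The two front speakers share the signal equally; the rear pair is silent.
```

The server would apply these gains to the source's samples before mixing them into each device's individual stream.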
Synchronization turned out to be a much more difficult problem than we had originally anticipated. Our first approach used a feedback loop from the server to the Android app, playing high-frequency audio and listening on the phone's microphone, but we were unable to do better than about 0.1 s of accuracy. That proved insufficient for audio playback, as the offset between devices was readily audible. The current version, which models the Google Home's internal audio buffer, synchronizes the Cast devices down to the millisecond. We can then manipulate an individual stream for each Home, allowing us to vary pitch, frequency, or even the chosen audio track. For comparison, high-quality surround sound systems often rely on latencies of less than a millisecond, and beamforming (our original goal) would require accuracy on the scale of individual samples.
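The microphone feedback approach boils down to estimating the delay between a known reference signal and what the phone actually recorded, which can be done by finding the lag that maximizes their cross-correlation. The sketch below shows that estimation step in isolation; it is illustrative of the technique we tried, not the code we shipped (the shipped system models the Home's buffer instead).

```javascript
// Find the offset (in samples) at which the reference signal best
// lines up with the recording, by brute-force cross-correlation.
function bestLag(reference, recorded, maxLag) {
  let best = 0;
  let bestScore = -Infinity;
  for (let lag = 0; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < recorded.length && i < reference.length; i++) {
      score += reference[i] * recorded[i + lag];
    }
    if (score > bestScore) {
      bestScore = score;
      best = lag;
    }
  }
  return best; // divide by the sample rate to get seconds
}

// Example: a chirp recorded 50 samples late is recovered exactly.
// (Chirps work well here because their autocorrelation has one sharp peak.)
const chirp = Array.from({ length: 64 }, (_, i) => Math.sin(0.02 * i * i));
const recorded = new Array(50).fill(0).concat(chirp);
const lag = bestLag(chirp, recorded, 200);
```

In practice this is limited by microphone quality, room echoes, and the phone's own audio pipeline latency, which is consistent with the roughly 0.1 s floor we hit.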
If we're able to construct a more accurate synchronization routine, we'll be able to do more and better processing. It would be cool to meet our initial goal of beamforming, though it's likely not possible without a wired connection to the hardware; reducing the audio latency would also be nice. The most interesting opportunities for future work, however, lie in applications of this system. As we have demonstrated, it can bring richer audio experiences to VR, entertainment, and music.