Frictionless real-time communication in a 5G world

Inspiration

The proposed application aims to improve audio quality during virtual conferencing and to bring a new level of comprehensibility to 5G mobile users. Participants located in multiple regions commonly experience poor audio quality, overlapping dialogue and choppy conversations. To address these problems, the idea is to deploy regional application shards and move media server capacity to the edge of the local telecom carrier network, which drastically reduces latency between local participants and takes advantage of the higher bandwidth of 5G mobile networks. Because the architecture is geo-distributed, these local improvements extend across regions, so the quality of service experienced by local participants becomes available at a global level. The result is better communication and less frustration between peers located in different regions.

What it does

Built on a WebRTC solution with an SFU media server architecture, the platform provides a fully geo-distributed conference service. By taking advantage of the 5G edge servers in Wavelength zones, the media servers have been detached from the main platform and moved to the edge, which dramatically improves audio quality thanks to the very low latency between the media server and the 5G devices. The geo-distribution of the media servers (video bridges) within an enhanced architecture allows any video conference to take place between peers based in different regions, with each peer relying on a "local" or closest optimal connection to ensure the lowest audio and video latency and the best quality. Media is relayed between video bridges over the AWS network to avoid the usual quality and latency degradation between geographically distributed peers. The second important feature is a real-time speech-to-text option that uses NLP models brought closer to the originating device, in the Wavelength or availability zones. Participants get live subtitling of their conversation thanks to the extremely fast processing of speech by the AI server deployed on the edge. To summarize, the application deploys latency-sensitive services on the edge to improve the quality of video conferencing for local participants, and uses the geo-distributed architecture to boost the quality of conferencing between peers in different countries or parts of the world. To tackle the problem of comprehensibility inherent to any video conference, the real-time subtitling, processed on the edge, benefits the interaction by providing a clear live transcription to each participant.
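Live subtitling from a streaming recognizer usually means handling two kinds of output: partial hypotheses that keep changing while someone is speaking, and final results once an utterance ends. The following is a minimal sketch of how a client could keep the two apart, assuming JSON messages shaped like those Vosk emits ({"partial": ...} while decoding, {"text": ...} when an utterance is finalized). The SubtitleBuffer class, its max_lines parameter and the " ..." marker are hypothetical names chosen for illustration, not part of the project.

```python
import json


class SubtitleBuffer:
    """Keeps committed subtitle lines separate from the revisable partial line."""

    def __init__(self, max_lines=2):
        self.max_lines = max_lines  # how many finalized lines stay on screen
        self.committed = []         # finalized utterances, oldest first
        self.partial = ""           # current hypothesis, may still change

    def feed(self, message):
        """Consume one recognizer message (a JSON string) and return the display text."""
        data = json.loads(message)
        if "text" in data:
            # Final result: commit it (if non-empty) and clear the hypothesis.
            if data["text"]:
                self.committed.append(data["text"])
                self.committed = self.committed[-self.max_lines:]
            self.partial = ""
        elif "partial" in data:
            # Partial result: replace, never append to, the in-progress line.
            self.partial = data["partial"]
        return self.render()

    def render(self):
        """Return committed lines followed by the in-progress line, if any."""
        lines = list(self.committed)
        if self.partial:
            lines.append(self.partial + " ...")
        return "\n".join(lines)
```

Feeding '{"partial": "hello every"}' and then '{"text": "hello everyone"}' first shows the in-progress line with its trailing marker and then replaces it with the committed text, so viewers can tell stable subtitles from ones that may still change.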

How we built it

The application is based on the open source project Jitsi. The conference business logic is deployed in a single AWS availability zone, and scalability is achieved by deploying several media servers (video bridges) across different regions, availability zones and Verizon Wavelength zones. The deployment is in the western US: Wavelength zones provide the low-latency service to Verizon 5G devices in the Denver and Seattle areas, and US-West-2 serves other devices. Installing NLP models and local AI services (VOSK) in the Wavelength zones gives great results for real-time transcription, because the audio stream is sent to those Wavelength zone AI services and processed immediately. Finally, the Verizon 5G Edge Discovery Service is crucial for routing participants to the optimal Wavelength endpoints.
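The routing decision described above can be sketched roughly as follows. This is an illustration only: it does not call the Edge Discovery Service, and the Endpoint type, the pick_endpoint function, the zone labels and the latency figures used with it are all hypothetical. It assumes the client already has a list of candidate endpoints with a latency estimate for each.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    name: str        # hypothetical label, e.g. "wlz-denver"
    zone_type: str   # "wavelength" or "region"
    rtt_ms: float    # round-trip time estimate in milliseconds


def pick_endpoint(candidates: List[Endpoint], on_verizon_5g: bool) -> Optional[Endpoint]:
    """Prefer the lowest-latency Wavelength endpoint for Verizon 5G clients.

    Other clients, or 5G clients with no Wavelength candidate, fall back to
    the lowest-latency regional endpoint. Returns None if nothing matches.
    """
    if on_verizon_5g:
        edge = [e for e in candidates if e.zone_type == "wavelength"]
        if edge:
            return min(edge, key=lambda e: e.rtt_ms)
    regional = [e for e in candidates if e.zone_type == "region"]
    return min(regional, key=lambda e: e.rtt_ms) if regional else None
```

With made-up candidates such as Endpoint("wlz-denver", "wavelength", 12.0), Endpoint("wlz-seattle", "wavelength", 35.0) and Endpoint("us-west-2", "region", 48.0), a Verizon 5G client would be sent to "wlz-denver" and any other client to "us-west-2".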

Challenges we ran into

The first challenge was that UDP traffic from the Verizon 5G network to the video bridges deployed in the Wavelength zones was blocked. The solution was to deploy a TURN server in the zone to accept and relay pure TCP traffic. The second challenge was selecting the right service endpoint for the 5G devices (routing). It was resolved by using the Verizon 5G EDS service to provide information about the best connectivity for the client device. The last challenge concerns the display of the real-time speech transcription: it is currently returned as the audio streams in, and because the transcription is done "on the fly" the result can be confusing to read.
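As an illustration of the TURN workaround, a WebRTC client can be pointed at a TURN server over TCP through its ICE configuration. The sketch below builds a dictionary in the shape of the standard RTCConfiguration object. The function name, host name, credentials and port 443 are placeholders, and forcing relay-only candidates is an assumption made for this example, not necessarily how the project configured Jitsi.

```python
def tcp_relay_ice_config(turn_host, username, credential, port=443):
    """Build an RTCConfiguration-shaped dict that uses a TURN relay over TCP."""
    return {
        "iceServers": [
            {
                # "?transport=tcp" asks the client to reach the TURN server over TCP.
                "urls": [f"turn:{turn_host}:{port}?transport=tcp"],
                "username": username,
                "credential": credential,
            }
        ],
        # "relay" restricts ICE to relayed candidates, so no direct UDP path is tried.
        "iceTransportPolicy": "relay",
    }
```

Serialized with json.dumps, the result of tcp_relay_ice_config("turn.example.invalid", "demo-user", "demo-secret") could be handed to a browser client and passed to the RTCPeerConnection constructor.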

Accomplishments that we're proud of

We used the Verizon 5G Edge developer portal to optimize the connectivity of 5G devices through the Edge Discovery Service API. We also integrated speech-to-text into the conference by bringing the NLP model close to the video bridges in the Wavelength or availability zones.

What we learned

As much as it is possible to bring services, resources and performance to the edge, the monitoring, management and scalability of the solution are essential, as is the analysis of the metrics gathered locally.

What's next for Frictionless real-time communication in a 5G world

Depending on the results, on how the judges receive this concept and on their feedback, the idea is to move to a second phase. Fully exploiting WebRTC low-latency data channels over the Ultra-Reliable Low-Latency Communication (URLLC) of 5G networks could provide the building blocks for real-time collaboration use cases, from file sharing to video interaction directly between peers. The combination of audio, video and data over URLLC would then pave the way for integrating augmented reality experiences into this conferencing platform, leading to a new type of communication between peers: xR video calling. Special attention needs to be paid to improving the performance of AI on the edge and to the high costs of providing a usable real-time transcription service.

Built With

5g
amazon-web-services
eds
jitsi
natural-language-processing
verizon
vosk
wavelength
xmpp

Updates

Christian Thomas started this project — Jun 21, 2021 02:47 PM EDT
