As of January 2020, approximately 4.54 billion people are active internet users, nearly 60 percent of the world's population. As incredible as that is, with the internet being one of the most transformative and fastest-growing technologies in history, there is still plenty of room to grow. While many people enjoy reliable internet access, those in rural areas and developing countries often have slow, unreliable connections that significantly limit their ability to use the more bandwidth-intensive, modern applications of the internet.

Strategy+Business magazine tells us, "To make the Internet affordable to everybody, global prices would need to fall by an average of 90 percent — which is implausible with current infrastructure costs." The only way to eliminate this cost barrier, then, is to approach the problem from the other direction: make access more efficient by modernizing the way we transmit data.

When considering the power of the internet, perhaps the most profound applications are those that simply cannot run without a high-speed connection: in particular, transmitting live video feeds for video calls across networks.

Within the realm of psychiatry, a pressing issue is the growing demand for mental health assistance colliding with a limited supply of therapists and psychiatrists, even more so in rural and developing areas. Sonja Lynm, DO, tells us in her article that "Psychiatrists who choose to practice in rural areas are faced with many challenges, including impoverished populations, stigmatization of mental health conditions, reduced access to treatment, and provider shortages."

Ideally, if we could ignore the connectivity woes in these areas, we could apply a telehealth solution so that patients receive the same kind of face-to-face assistance they would get in person. Keeping the facts above in mind, however, we know this is difficult. To solve this, we set out to build a solution that makes it easier for people in affected areas to find face-to-face mental health assistance.

What it does

Our product begins with the client's phone. We developed an application that takes live video as input, hosts a PyTorch model locally, and transmits the video from one phone to another over an existing, potentially slow, internet connection. On the application side, we built an intuitive user interface to ensure a positive experience for people of any technical background. For this aspect we created pages, wireframes, and a flow diagram for the user experience, as well as individual page designs and logos to be implemented in the app.

On the model side, we have a Sparse Residual Autoencoder. This model learns the best way to compress the video into a smaller representation and, in parallel, the best way to decompress that representation back into a full image. Normally, these types of models are slow and poor at reconstructing edges. By passing sparse information about the edges alongside the compressed representation, we let the model focus only on texture and color information rather than complex edges. In addition, we use techniques such as weight and bias pruning, where we search the network for low-impact calculations and remove them. In modern networks, up to 90% of calculations can be pruned without any statistically significant loss in accuracy.
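To make the pruning idea concrete, here is a minimal sketch of magnitude-based weight pruning in NumPy. This illustrates the general technique, not our actual PyTorch implementation; the 90% sparsity level and the 64x64 weight matrix are just example settings.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only the large weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))              # stand-in for one layer's weights
pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(pruned) / pruned.size:.3f}")
```

After pruning, roughly 90% of the entries are exactly zero, so their multiply-adds can be skipped entirely with a sparse kernel.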

How we built it

UI: We used Adobe XD, Illustrator, and Photoshop to design wireframes for the application as well as plans for the user experience on our platform. After creating an initial high-level design, we built out individual pages, laying out what tools would be available to the end user.

Dev: We converted the prototypes into a functional user interface. In addition, we created a tool to benchmark the model's speed as it ran on phones (low-resource) as well as over poor internet connections (low-bandwidth).

Data: We used PyTorch to design the architecture for the Sparse Residual Autoencoder. After verifying the architecture choices in local benchmarks, we created a Google Cloud Compute instance with 4 V100s to train efficiently, consistently pulling model checkpoints to verify and visualize current performance. We then sparsified the final model through pruning, removing 87% of the original weights and biases to simplify calculations. We also began implementing weight quantization so that the user can run expensive calculations in INT32 or FLOAT32 instead of FLOAT64, which regularly yields 3-4x faster compression and decompression.
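To illustrate the precision trade-off behind that quantization step, here is a small NumPy sketch of down-casting FLOAT64 weights to FLOAT32. It is a simplified stand-in for our actual quantization pass, and the array size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
weights64 = rng.normal(size=100_000)       # default dtype is float64

# Down-cast: half the memory per weight, and much cheaper on mobile hardware.
weights32 = weights64.astype(np.float32)

mem_ratio = weights64.nbytes / weights32.nbytes
max_err = np.max(np.abs(weights64 - weights32.astype(np.float64)))

print(f"memory ratio: {mem_ratio:.0f}x, max rounding error: {max_err:.2e}")
```

The rounding error introduced by the cast is on the order of 1e-7 for unit-scale weights, far below the reconstruction error of the autoencoder itself, which is why the speedup comes essentially for free.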

Challenges we ran into

Since we were working with hundreds of gigabytes of data from online live-streaming sources, training locally was out of the picture, and development raised a multitude of issues. When we moved to cloud compute, we ran into the problem of storing the huge amount of video data for training (our 20 hours of uncompressed 1080p 60 fps video came to approximately 24.44 TB). This forced us to learn how to efficiently download, decode, split into batches, feed through our model for training, and delete the data before we ran out of memory or storage. Additionally, we had trouble implementing phone-to-phone communication and were not able to get a stable video stream in non-emulated environments.
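That download-decode-batch-delete loop can be sketched as a generator that keeps at most one chunk on disk at a time. This is a toy version: random arrays stand in for downloaded, decoded video segments, and the chunk counts and frame sizes are made up for the demo.

```python
import os
import tempfile
import numpy as np

def stream_batches(num_chunks=5, frames_per_chunk=8, frame_shape=(32, 32, 3)):
    """Yield training batches while keeping only one chunk on disk at a time."""
    rng = np.random.default_rng(0)
    workdir = tempfile.mkdtemp()
    for i in range(num_chunks):
        # Stand-in for downloading and decoding one segment of source video.
        path = os.path.join(workdir, f"chunk_{i}.npy")
        frames = rng.integers(0, 256, size=(frames_per_chunk, *frame_shape),
                              dtype=np.uint8)
        np.save(path, frames)

        batch = np.load(path).astype(np.float32) / 255.0  # normalize for training
        yield batch

        os.remove(path)  # free disk space before fetching the next chunk
    os.rmdir(workdir)

total_frames = sum(len(batch) for batch in stream_batches())
print(total_frames)  # 5 chunks x 8 frames
```

Because each chunk is deleted as soon as it has been fed through the model, peak disk usage stays at one chunk regardless of how many hours of video the full run consumes.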

Accomplishments that we're proud of

In UI design: We focused on minimizing selection fatigue by matching available doctors to users automatically. To reduce visual overload, we placed only two main-function buttons, Upcoming Call and Call a Doctor Now, on the home page. Traditional icons and short instructions were used to flatten the learning curve for first-time users.

In machine learning: Our team developed a novel compression algorithm based on sparsity and deep learning via Sparse Convolutional Autoencoders. Not only does our algorithm achieve a high compression ratio (93.75%, i.e. 1/16 of the original video size), it does so efficiently after many iterations of pruning and quantization (processing at 30 fps). The algorithm uses a novel "Residual Bridge" which relies on Discrete Wavelet Transforms and thresholding to capture edge information. The full mathematics and technical details are compiled into a concise paper. Additionally, our efficient pruning methods reduce the compute needed by a mobile device by more than 90%.
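As a rough, hypothetical illustration of the building blocks behind the Residual Bridge, the sketch below applies one level of a 2D Haar wavelet transform and thresholds the detail bands so only the strongest (edge) coefficients survive. Our actual method is described in the paper; the Haar filter, the 5% keep rate, and the synthetic test image here are assumptions chosen for the demo.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2D Haar transform: approximation + 3 detail bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # smooth content
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # vertical edges
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # horizontal edges
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, (lh, hl, hh)

def sparse_edges(img, keep=0.05):
    """Keep only the largest `keep` fraction of detail coefficients."""
    _, details = haar_dwt2(img)
    stacked = np.abs(np.concatenate([band.ravel() for band in details]))
    threshold = np.quantile(stacked, 1.0 - keep)
    return [np.where(np.abs(band) >= threshold, band, 0.0) for band in details]

# Synthetic frame: one sharp vertical edge plus mild sensor noise.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 33:] = 1.0
img += rng.normal(scale=0.01, size=img.shape)

edges = sparse_edges(img, keep=0.05)
nonzero = sum(np.count_nonzero(band) for band in edges)
print(f"kept {nonzero} of {3 * 32 * 32} detail coefficients")
```

Thresholding discards the noise-level coefficients while the large coefficients along the edge survive, which is the sparse side channel that frees the autoencoder to spend its capacity on texture and color.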

What we learned

The extremely large amount of data posed real difficulties, and we learned how to build an efficient pipeline that feeds data into a deep learning model without worrying about running out of memory or storage space. In addition, we explored and implemented state-of-the-art papers on algorithms that score weight importance for pruning and that balance error when quantizing values.

What's next for TelePath

The back end for our application is quite robust and useful well beyond telehealth, so we would love to apply our video compression algorithm in essentially any area where it helps. This includes live streaming, general video calling, and more. Specifically, we can see it applied on client-side video streaming platforms like Netflix and Twitch: by reducing the data transmitted per video, we can cut the costs of running these businesses and significantly reduce the environmental impact of their server farms. Finally, we could apply this in businesses such as Caterpillar. An efficient method of data compression allows live streaming from machines in the field even when the device's bandwidth is very small. The same approach works for data storage, where the compression gains are even larger; with some retraining, this technology could store days of video in only a few gigabytes.


4.54 Billion Internet Users -
