
Introduction

Rendering high-definition video or complex 3D scenes is computationally expensive and time-consuming. Rendering would be cheaper and smoother if its quality could be improved with deep learning frameworks. Temporally, it might be possible to render at a lower FPS and insert intermediate frames: for example, render shaders at 10 FPS and generate 5 extra frames between each pair of rendered frames to reach 60 FPS. Spatially, it is possible to keep the same frame rate but render shaders and materials at lower quality, then enhance the definition of the output. However, this project does not target the rendering pipeline itself. Instead, it assumes a pre-rendered stream and aims to improve the definition of that output.

Frameworks

Candidate frameworks include VAEs, Autoencoders, GANs, etc. One existing industrial solution (DLSS by Nvidia) works at the rendering level, rendering 3D objects at low definition and improving the output with deep learning super sampling. In this project the goals are to 1) maintain and, if possible, improve the accuracy of the output stream, defined as the rasterized output. In the ideal case f^{-1}(f(x)) = x, where f is the undersampling operation and f^{-1} the super-sampler: an image that is undersampled and then super-sampled should look like the original. 2) Acquire training data by manually undersampling higher-definition video: for example, a 1080p video can be converted to 720p to serve as the training input, with the original 1080p video as the ground truth.

Related work

Our project draws inspiration from Nvidia's Deep Learning Super Sampling (DLSS) technology, which operates at the rendering level to enhance the definition of 3D objects. While DLSS focuses on real-time rendering enhancement, our project shifts focus towards post-rendered video streams, using deep learning frameworks such as Variational Autoencoders (VAEs), Autoencoders, and Generative Adversarial Networks (GANs) to upscale and enhance video quality. This shift represents a novel application of deep learning in video processing, aiming to fill a gap in current video upscaling solutions.
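The manual undersampling described in goal 2 could be sketched as below. This is a minimal illustration assuming an integer downscale factor; a real 1080p-to-720p conversion is a 1.5x resize and would use a proper resampler (e.g. FFmpeg or OpenCV) instead of average pooling.

```python
import numpy as np

def downsample(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool an H x W x C frame by an integer factor,
    simulating a manually undersampled training input."""
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    frame = frame[:h2 * factor, :w2 * factor]
    return frame.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

# Build a (low-res input, high-res target) training pair from one HD frame.
hd_frame = np.random.rand(1080, 1920, 3)   # stand-in for a decoded 1080p frame
lr_frame = downsample(hd_frame, factor=2)  # half-size "undersampled" input
print(lr_frame.shape)  # (540, 960, 3)
```

The model is then trained to map `lr_frame` back to `hd_frame`, approximating f^{-1}.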

Our project extends the insights from "Frame Rate Upscaling with Deep Neural Networks", which explores frame interpolation through deep learning techniques, notably CNNs and GANs. This foundational work critiques linear interpolation for its blurriness in 2D animation and tests various models for enhancing video frame rates.
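The linear-interpolation baseline that this work critiques (and that we later use as our baseline model) is just a per-pixel blend between consecutive frames. A minimal sketch, with illustrative names:

```python
import numpy as np

def linear_interpolate(frame_a: np.ndarray, frame_b: np.ndarray,
                       n_inserted: int = 5) -> list:
    """Generate n_inserted intermediate frames by per-pixel linear blending,
    e.g. 5 inserted frames turn 10 FPS footage into 60 FPS."""
    frames = []
    for i in range(1, n_inserted + 1):
        t = i / (n_inserted + 1)                  # blend weight in (0, 1)
        frames.append((1.0 - t) * frame_a + t * frame_b)
    return frames

a = np.zeros((4, 4, 3))
b = np.ones((4, 4, 3))
mids = linear_interpolate(a, b, n_inserted=5)     # 5 in-between frames
```

Blending like this smears any object that moves between the two frames, which is the blurriness the learned models are meant to avoid.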

References:
https://paulbridger.com/posts/video-analytics-pipeline-tuning/
https://tedxiao.me/pdf/CS294_Report.pdf

Data

The dataset comprises YouTube videos, specifically selected to include both animated and real-life footage in order to challenge and validate our models across diverse scenarios. We will utilize 240 FPS footage from 3 distinct videos, applying preprocessing steps such as normalization to generate a suitable training set.
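A sketch of the normalization step, assuming frames are decoded into a uint8 NumPy array; the exact preprocessing pipeline is still to be finalized:

```python
import numpy as np

def normalize_frames(frames: np.ndarray):
    """Scale uint8 frames to [0, 1] and standardize per channel.
    frames: (N, H, W, 3) array of decoded video frames."""
    x = frames.astype(np.float32) / 255.0
    mean = x.mean(axis=(0, 1, 2))            # per-channel mean
    std = x.std(axis=(0, 1, 2)) + 1e-8       # per-channel std (avoid /0)
    return (x - mean) / std, mean, std

batch = np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
normed, mean, std = normalize_frames(batch)  # zero-mean, unit-variance input
```

The saved `mean` and `std` are reused at inference time to denormalize the model's output back to pixel values.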

Methodology

Before settling on a target architecture, we will systematically explore several deep learning architectures to identify the optimal model for super-sampling video content. This exploration will focus on Variational Autoencoders (VAEs), Autoencoders, and Generative Adversarial Networks (GANs), given their demonstrated capabilities in various image and video enhancement contexts. The core challenge we aim to address is preserving temporal consistency across video frames while improving their spatial resolution.
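As an illustration of the autoencoder branch of this exploration, a minimal 2x super-sampling autoencoder might look like the following PyTorch sketch. Layer sizes and depths here are placeholders, not a final design:

```python
import torch
import torch.nn as nn

class UpscaleAutoencoder(nn.Module):
    """Sketch of a convolutional autoencoder mapping a low-res frame
    to a 2x super-sampled output."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            # stride-2 transposed conv doubles the spatial resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # pixels in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = UpscaleAutoencoder()
low_res = torch.rand(1, 3, 270, 480)  # stand-in for a quarter-size frame
out = model(low_res)
print(out.shape)  # torch.Size([1, 3, 540, 960])
```

A VAE variant would add a stochastic latent, and a GAN variant would train this decoder against a discriminator; temporal consistency would additionally require conditioning on neighboring frames.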

We will minimize the discrepancy between the upscaled video output and its corresponding original high-definition counterpart, which helps ensure that our model can generalize across different content types. Through iterative experimentation, we anticipate identifying a model that remains robust under different input conditions.
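The discrepancy minimized during training (per-frame MSE against the HD ground truth) and the PSNR/SSIM scores described in the Metrics section below could be computed as follows. The single-window SSIM here is a simplification of the usual sliding-Gaussian-window formulation:

```python
import numpy as np

def mse(upscaled: np.ndarray, original_hd: np.ndarray) -> float:
    """Mean squared error: the discrepancy minimized during training."""
    return float(np.mean((upscaled - original_hd) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB for frames scaled to [0, max_val]."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

def ssim_global(a: np.ndarray, b: np.ndarray, max_val: float = 1.0) -> float:
    """SSIM over one global window (real evaluations slide a local window)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float((2 * mu_a * mu_b + c1) * (2 * cov + c2)
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)))

hd = np.random.rand(64, 64, 3)
print(psnr(hd, hd), ssim_global(hd, hd))  # inf and 1.0 for a perfect match
```

Higher PSNR and SSIM closer to 1.0 both indicate the upscaled frame is nearer to the original HD frame.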

Metrics

Success will be measured by comparing the quality of the upscaled videos against their original high-definition counterparts. Metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) will be used to quantitatively assess video quality. For spatial upscaling, our base goal is to achieve perceptually noticeable improvements in video quality on standard datasets; our target goal is to match or exceed the quality improvements offered by existing upscaling solutions like DLSS, without requiring integration into the rendering pipeline; and our stretch goal is to develop a model that can be applied in real time to various video streams, including live broadcasts. For frame interpolation, our base goal is to outperform linear interpolation methods (the baseline model), our target goal is to match the performance of the state of the art, and our stretch goal is to surpass current methods in terms of both quality and efficiency across a wide range of video types.

Ethics

Broader Societal Issues

The enhancement of video frame rates touches on several societal issues, including accessibility and misinformation. Higher frame rates can significantly improve the viewing experience for all audiences, including viewers with visual impairments or those prone to seizures, who may benefit from smoother transitions in video content. However, the technology's ability to generate realistic video frames can also be misused to create deepfakes, potentially exacerbating problems related to misinformation and privacy violations.

Why is Deep Learning a good approach to this problem?

Deep learning excels at deciphering complex patterns within extensive datasets. This capacity is particularly beneficial for enhancing the quality of videos after they have been rendered. Applying deep learning would effectively obviate the need for more sophisticated rendering hardware or advanced rendering techniques, which are usually costly.

Division of labor

Shukai Ni: Data Preprocessing, Model architecture, Parameter tuning
Xianyang Xie: Data generating, Data Preprocessing, Parameter tuning
Haibo Li: Data Collecting, Data generating, Model architecture
