Introduction

Rendering high-definition video or complex 3D scenes is computationally expensive and time-consuming. It would be cheaper and smoother if rendering quality could instead be improved with deep learning. Temporally, it might be possible to render at a lower FPS and insert intermediate frames: for example, render shaders at 10 FPS and generate 5 extra frames between each pair of rendered frames to reach 60 FPS. Spatially, it is possible to keep the same frame rate but render shaders and materials at lower quality, then upscale the rendered output. However, this project does not target the rendering pipeline itself. Instead, it assumes a pre-rendered stream and aims to improve the definition of that output.

Frameworks under consideration include VAEs, autoencoders, and GANs. One existing industrial solution, Nvidia's DLSS, works at the rendering level: it renders 3D objects at low definition and improves the output with deep learning super sampling. This project has two goals: 1) maintain, and if possible improve, the accuracy of the output stream, defined as the rasterized output. Ideally f^{-1}(f(x)) = x, where f is the super-sampling model and f^{-1} is downsampling; that is, downsampling the super-sampled image should recover the original image. 2) Acquire training data by taking high-definition video and manually downsampling it: for example, a 1080p video can be converted to 720p to serve as the training input, with the original 1080p video as the ground truth.
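The training-pair construction described above can be sketched in a few lines. This is a hypothetical stand-in (the `downsample` helper and average-pooling choice are assumptions, not the project's actual preprocessing); note that a true 1080p-to-720p conversion is a 1.5x factor, while this sketch uses an integer factor for simplicity:

```python
import numpy as np

def downsample(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool an (H, W, C) frame by an integer factor,
    discarding any remainder at the right/bottom edges."""
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    frame = frame[: h2 * factor, : w2 * factor]
    return frame.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

# Build one (low-res input, high-res target) training pair.
hi = np.random.rand(1080, 1920, 3).astype(np.float32)  # ground-truth frame
lo = downsample(hi, factor=2)                          # 540x960 model input
```

In practice a video tool such as ffmpeg would do the resizing; the point is only that the low-resolution inputs are derived from the originals, so the high-resolution baseline is available for free.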

Challenges: What has been the hardest part of the project you’ve encountered so far? The first challenge was finding a general way to unify the image data format. We decided to use fixed-size patches (50×50 for low-resolution images and 100×100 for high-resolution images), so that images of different sizes can be preprocessed into patches of identical shape. Deeper models may capture more complex patterns but are also harder to train and require more data, so we need to balance model depth and complexity to avoid overfitting. Finally, VAEs are known to produce somewhat blurry results compared with deterministic models, which is less desirable for an application where clarity is crucial to our goal.
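The fixed-size patching scheme above might look like the following sketch (the `to_patches` helper and the example frame sizes are assumptions for illustration; the actual preprocessing code may handle edges differently):

```python
import numpy as np

def to_patches(img: np.ndarray, size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (size, size, C)
    patches, dropping any remainder at the right/bottom edges."""
    h, w, c = img.shape
    rows, cols = h // size, w // size
    img = img[: rows * size, : cols * size]
    patches = img.reshape(rows, size, cols, size, c).swapaxes(1, 2)
    return patches.reshape(-1, size, size, c)

low = np.random.rand(270, 480, 3)     # low-res frame
high = np.random.rand(540, 960, 3)    # matching high-res frame (2x)
low_patches = to_patches(low, 50)     # each patch is 50x50x3
high_patches = to_patches(high, 100)  # each patch is 100x100x3
```

Because the high-resolution patch size is exactly twice the low-resolution one, both frames yield the same number of spatially aligned patches, which is what lets the model train on (50×50 input, 100×100 target) pairs regardless of the original frame size.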

Insights: Are there any concrete results you can show at this point? We have finished implementing the preprocessing pipeline and the experimental models. We have also implemented a universal visualization technique that can be applied to every model we plan to experiment with. How is your model performing compared with expectations? At this exploration stage our focus is on testing well-established architectures, and their performance is mediocre: they generally work for the expected scenario, but they are not yet very effective at improving definition.
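One simple way to quantify "improving definition" beyond visual inspection is PSNR against the original high-resolution frames. A minimal sketch, assuming frames are stored as floats in [0, 1] (the function name and `peak` parameter are our own, not part of any model code):

```python
import numpy as np

def psnr(reference: np.ndarray, output: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between a model output and the
    ground-truth high-resolution frame. Higher is better."""
    mse = np.mean((reference.astype(np.float64) - output.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Tracking PSNR of the super-sampled output against the 1080p baseline, versus PSNR of a plain bicubic upscale, would make "mediocre but working" concrete.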

Plan

Are you on track with your project? What do you need to dedicate more time to? What are you thinking of changing, if anything?

  1. The project has progressed consistently, though only partially in line with our initial objectives. We clearly need to allocate more time and resources to improving the efficiency of our models.
  2. We also need to explore and implement new data augmentation techniques. By incorporating a wider variety of video types and conditions into training, we aim to improve the models' robustness and ensure consistent performance across diverse visual scenarios.
