Inspiration

Exploring underwater environments is extremely challenging: visibility is low, water is often murky, and light fades quickly with depth. Humans can dive only to limited depths, and autonomous systems such as drones and ROVs struggle with degraded video feeds. We were inspired to create a real-time enhancement system that lets both humans and machines “see deeper,” improving object detection, navigation, and research in underwater environments.

What it does

  1. Dive Deeper with Real-Time Underwater Enhancement is a deep learning-based system that processes video frames in real time to restore visibility, correct color distortion, and enhance detail. It supports two modes:
    • Model 1 / Color Enhancement: restores natural colors while maintaining a medium frame rate (~30 FPS).
    • Model 2 / Black Vision High FPS: produces grayscale-enhanced frames at a high frame rate (~60 FPS) for fast object-detection scenarios where color is less critical.
  2. The system outputs a side-by-side view of the original feed and the enhanced frames, making it easy to compare and analyze underwater scenes, as sketched below.
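As a rough illustration, the comparison view amounts to stacking the two frames horizontally. This is a minimal sketch, assuming a hypothetical `enhance()` stand-in for the trained model and an example input file:

```python
import cv2
import numpy as np

def enhance(frame):
    """Hypothetical stand-in for the trained enhancement model."""
    return frame

cap = cv2.VideoCapture("dive.mp4")  # example input; a camera index also works
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Stack original and enhanced frames horizontally for comparison.
    view = np.hstack([frame, enhance(frame)])
    cv2.imshow("Original | Enhanced", view)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```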

How I built it

This system is designed for real-time vision enhancement, where frame rate (FPS) is critical for downstream tasks such as object detection, tracking, and navigation. Integrating a full-fledged dedicated UI significantly impacts performance, so the core pipeline avoids one (the Tkinter GUI described below exists for demonstration only). A dedicated UI introduces multiple sources of FPS degradation, as the headless-loop sketch after this list illustrates:

  • Model output frames must be transferred to the UI pipeline, causing additional memory copy overhead and FPS drop.
  • UI rendering consumes CPU/GPU resources, leading to slower inference and increased latency.
  • GPU context switching between enhancement and UI rendering results in unstable frame rates.
  • Since the output is fed directly into object detection, UI rendering provides no functional benefit but degrades performance.
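In a headless deployment the loop therefore reduces to capture → enhance → detect, with no rendering step in between. A minimal sketch, assuming hypothetical `enhance` and `detect` callables:

```python
import cv2

def run_headless(source, enhance, detect):
    """Feed enhanced frames straight into a detector; no UI rendering."""
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # No imshow/UI call here: skipping rendering avoids extra memory
        # copies and GPU context switches, keeping the frame rate stable.
        detect(enhance(frame))
    cap.release()
```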
  1. Languages & Frameworks: Python, PyTorch, OpenCV, NumPy, PIL, Tkinter
  2. Neural Networks: Multi-Patch Hierarchical CNN (encoder-decoder architecture). Each video frame is divided into patches at multiple scales (see the sketch after this list):
    • Level 1: 4 small patches – captures fine-grained local details like edges and textures.
    • Level 2: 2 merged patches + full frame – captures medium-scale structures like object boundaries.
    • Level 3: full frame – captures global lighting, haze, and color distortions.
    • These multi-scale features are fused to produce a single enhanced frame.
  3. Techniques:
    • Multi-scale patch-based enhancement (local and global features)
    • LAB color preservation to maintain natural colors
    • CLAHE for local contrast enhancement
    • Optional dehazing for murky and foggy water
  4. Application: Desktop GUI built with Tkinter showing the side-by-side comparison in real time (for demonstration and testing only)
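The multi-patch idea from item 2 can be sketched in PyTorch as follows. This is a simplified illustration of the coarse-to-fine patch hierarchy, not the exact trained architecture; the layer sizes and residual connections are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Tiny encoder-decoder applied at each patch level (illustrative)."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class MultiPatchNet(nn.Module):
    """Three-level multi-patch hierarchy with coarse-to-fine fusion."""
    def __init__(self):
        super().__init__()
        self.levels = nn.ModuleList(EncoderDecoder() for _ in range(3))

    @staticmethod
    def split(x, rows, cols):
        # Cut a batch of frames into an equal grid of patches.
        return [p for r in x.chunk(rows, dim=2) for p in r.chunk(cols, dim=3)]

    @staticmethod
    def merge(patches, rows, cols):
        # Reassemble a grid of patches back into full frames.
        return torch.cat(
            [torch.cat(patches[r * cols:(r + 1) * cols], dim=3)
             for r in range(rows)], dim=2)

    def forward(self, x):
        # Level 1: four patches capture fine local detail (edges, textures).
        fine = self.merge([self.levels[0](p) for p in self.split(x, 2, 2)], 2, 2)
        # Level 2: two merged patches capture medium-scale structure.
        mid = self.merge([self.levels[1](p) for p in self.split(fine + x, 2, 1)], 2, 1)
        # Level 3: the full frame captures global lighting, haze, and color cast.
        return self.levels[2](mid + x)

frame = torch.rand(1, 3, 128, 128)      # dummy frame, NCHW
print(MultiPatchNet()(frame).shape)     # torch.Size([1, 3, 128, 128])
```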

Challenges I ran into

  1. Balancing real-time FPS with enhancement quality: the color model produced rich, natural colors but ran slower (~30 FPS), while the faster model (~60 FPS) worked only in grayscale, which is still sufficient for detection tasks.
  2. Preserving original scene information while enhancing contrast and brightness without over-saturating colors.
  3. Ensuring GUI performance and smooth video playback alongside model inference on GPU.

Accomplishments that I am proud of

  1. Achieved real-time enhancement at up to 60 FPS in Black Vision mode and ~30 FPS in color-enhanced mode.
  2. Built a user-friendly GUI that shows the original and enhanced frames side by side in fixed-size panels, similar to a web dashboard.
  3. Successfully enhanced murky and dark underwater scenes, making objects and structures clearly visible for both human interpretation and downstream detection tasks.

What I learnt

  1. Multi-patch hierarchical networks can effectively enhance fine-grained details and global context simultaneously.
  2. There’s a trade-off between FPS and color-restoration quality, so the model can be chosen based on the task at hand (e.g., detection vs. presentation).
  3. LAB color space and CLAHE are highly effective for preserving natural colors and improving local contrast in real-world images; the sketch below shows the core recipe.
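For reference, the LAB + CLAHE recipe boils down to equalizing only the lightness channel so the a/b color channels stay untouched. A minimal sketch; the clip limit and tile size are illustrative defaults, not the project's tuned values:

```python
import cv2

def clahe_lab(frame_bgr):
    """Local contrast boost on L only; a/b channels preserve natural color."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
```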

What's next for Dive Deeper with Real-Time Underwater Enhancement

  1. Integrate object detection on enhanced frames for autonomous underwater systems.
  2. Optimize models further for higher FPS with color preservation, enabling smooth real-time monitoring for research, drones, and ROVs.
  3. Implement navigation without GPS using visual odometry and SLAM from enhanced frames.
  4. Create an exploration database logging frames, position, depth, and FPS for analysis.

Key Information

  1. This system assists object detection; integrating it into existing pipelines can improve detection performance in challenging environments.
  2. It is a lightweight, plug-and-play model that can be easily used without heavy dependencies.
  3. No dedicated UI is required for real-time operation, as adding one may affect model performance and FPS.
  4. A Tkinter UI has been created for demonstration and testing purposes only. In real-world deployments, the model can directly feed enhanced frames to autonomous navigation or object detection systems.
