PhISH-Net: Physics Inspired System for High Resolution Underwater Image Enhancement

Indian Institute of Science, Bengaluru, India
WACV, 2024

Abstract

Underwater imaging presents numerous challenges due to refraction, light absorption, and scattering, resulting in color degradation, low contrast, and blurriness. Enhancing underwater images is crucial for high-level computer vision tasks, but existing methods either neglect the physics-based image formation process or require expensive computations. In this paper, we propose an effective framework that combines a physics-based Underwater Image Formation Model (UIFM) with a deep image enhancement approach based on the retinex model. Firstly, we remove backscatter by estimating attenuation coefficients using depth information. Then, we employ a retinex model-based deep image enhancement module to enhance the images. To ensure adherence to the UIFM, we introduce a novel Wideband Attenuation prior. The proposed PhISH-Net framework achieves real-time processing of high-resolution underwater images using a lightweight neural network and a bilateral-grid-based upsampler. Extensive experiments on two underwater image datasets demonstrate the superior performance of our method compared to state-of-the-art techniques.

Problem Setting

Underwater images are degraded by two physical effects: (i) direct signal attenuation, where colors are absorbed differently by water as a function of depth and wavelength (red absorbed more than blue/green); and (ii) backscatter, from light scattered by suspended particles toward the camera. We follow the UIFM from Sea-Thru, which models any captured image \(I\) per channel \(c \in \{r,g,b\}\) as:

\[ I_c = D_c + B_c \tag{1} \]
\[ I = J \cdot e^{-\beta_d z} + B^\infty (1 - e^{-\beta_b z}) \tag{2} \]

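Here \(J\) is the unattenuated scene, \(z\) is depth, \(\beta_d\) is the wideband attenuation coefficient, \(\beta_b\) is the backscatter coefficient, and \(B^\infty\) is the background backscatter color. The channel subscript \(c\) is dropped in Eq. (2) for brevity; its first term is the direct signal \(D\) and its second term is the backscatter \(B\) of Eq. (1). The goal is to recover \(J\) from \(I\) and a depth map \(z\).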
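As a minimal sketch of Eqs. (1) and (2), the NumPy snippet below synthesizes a degraded image from a clean scene and a depth map, then inverts the model when all coefficients are known. The function names, array shapes, and coefficient values are illustrative assumptions, and the coefficients are held constant per channel for simplicity.

```python
import numpy as np

def degrade(J, z, beta_d, beta_b, B_inf):
    """Apply Eq. (2): I = J * exp(-beta_d * z) + B_inf * (1 - exp(-beta_b * z)).

    J: (H, W, 3) clean scene in [0, 1]; z: (H, W) depth map;
    beta_d, beta_b, B_inf: length-3 per-channel arrays (hypothetical values).
    """
    z3 = z[..., None]                          # broadcast depth over channels
    D = J * np.exp(-beta_d * z3)               # direct (attenuated) signal
    B = B_inf * (1.0 - np.exp(-beta_b * z3))   # backscatter
    return D + B

def restore(I, z, beta_d, beta_b, B_inf):
    """Invert Eq. (2) when every coefficient is known."""
    z3 = z[..., None]
    B = B_inf * (1.0 - np.exp(-beta_b * z3))
    return (I - B) * np.exp(beta_d * z3)

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, size=(4, 5, 3))
z = rng.uniform(1.0, 5.0, size=(4, 5))
beta_d = np.array([0.40, 0.10, 0.08])   # assumed: red attenuates fastest
beta_b = np.array([0.30, 0.15, 0.12])   # assumed
B_inf = np.array([0.05, 0.30, 0.35])    # assumed bluish-green veiling light

I = degrade(J, z, beta_d, beta_b, B_inf)
J_rec = restore(I, z, beta_d, beta_b, B_inf)
```

In practice none of the coefficients are known in advance, which is why the method estimates the backscatter first and learns the remaining correction.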

Fig. 1: Sample results on the UIEB dataset. Top: input underwater images; Bottom: corresponding PhISH-Net enhanced outputs.

Proposed Method: PhISH-Net

Backscatter Estimation

We first estimate and remove the backscatter component \(B\). The depth map \(z\) is obtained from an off-the-shelf monocular depth estimator (boosted MiDaS). The depth map is partitioned into 10 evenly spaced clusters; within each cluster, the darkest 1% of RGB triplets (where \(I \approx B\)) are collected into set \(\Omega\). An overestimate of backscatter is \(\hat{B}(\Omega) \approx I(\Omega)\), which follows:

\[ \hat{B} = \underbrace{J' \cdot e^{-\beta_{d'} z}}_{\text{Residual}} + B^\infty(1 - e^{-\beta_b z}) \tag{3} \]
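The selection of the candidate set \(\Omega\) described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, depth bins are equal-width between the minimum and maximum depth, and "darkest" is measured by the sum of the RGB values.

```python
import numpy as np

def darkest_pixels_per_depth_bin(image, depth, n_bins=10, fraction=0.01):
    """Return (depths, colors) of the darkest pixels in each depth bin.

    image: (H, W, 3) float array; depth: (H, W) float array.
    """
    z = depth.ravel()
    rgb = image.reshape(-1, 3)
    brightness = rgb.sum(axis=1)            # darkness criterion (an assumption)
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    # Using only the inner edges yields bin indices 0 .. n_bins - 1.
    bin_idx = np.digitize(z, edges[1:-1])
    picked = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_idx == b)
        if members.size == 0:
            continue                        # skip empty depth ranges
        k = max(1, int(np.ceil(fraction * members.size)))
        order = np.argsort(brightness[members])[:k]
        picked.append(members[order])
    picked = np.concatenate(picked)
    return z[picked], rgb[picked]

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(20, 50, 3))
depth = np.linspace(1.0, 10.0, 20 * 50).reshape(20, 50)
z_sel, rgb_sel = darkest_pixels_per_depth_bin(image, depth)
```

The returned depth and color pairs are the samples against which the backscatter model is fitted.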

The coefficients \(B^\infty, \beta_b, \beta_{d'}, J'\) are estimated by non-linear least squares fitting. The direct signal is then \(D_c = I_c - \hat{B}_c\).
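A hedged sketch of the fitting step for a single color channel, using SciPy's `curve_fit` as one possible non-linear least squares solver. The initial guess, the parameter bounds, and the synthetic samples are assumptions made for illustration; the source does not specify them.

```python
import numpy as np
from scipy.optimize import curve_fit

def backscatter_model(z, B_inf, beta_b, J_res, beta_res):
    """Eq. (3): residual direct term plus saturating backscatter."""
    return J_res * np.exp(-beta_res * z) + B_inf * (1.0 - np.exp(-beta_b * z))

def fit_backscatter(z_samples, values, p0=(0.5, 1.0, 0.1, 1.0)):
    """Fit one color channel; the bounds are assumed plausible ranges."""
    lower = [0.0, 0.0, 0.0, 0.0]
    upper = [1.0, 5.0, 1.0, 5.0]
    params, _ = curve_fit(backscatter_model, z_samples, values,
                          p0=p0, bounds=(lower, upper))
    return params

# Synthetic, noise-free samples standing in for the darkest pixels in Omega.
z_s = np.linspace(0.5, 10.0, 60)
truth = (0.30, 0.50, 0.05, 1.50)
v = backscatter_model(z_s, *truth)

params = fit_backscatter(z_s, v)
B_hat = backscatter_model(z_s, *params)
rms = float(np.sqrt(np.mean((B_hat - v) ** 2)))
```

Sums of exponentials are poorly conditioned, so the recovered parameters need not match the generating ones even when the fitted curve does. Evaluating the fitted model at every pixel's depth gives \(\hat{B}_c\), which is then subtracted from \(I_c\) to obtain the direct signal.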

Fig. 2: Impact of Depth Boosting. (a) Sample image from the UIEB dataset. (b) Depth estimate from the base MiDaS model. (c) Depth estimate after boosting, showing finer detail for backscatter estimation.

PhISH-Net: Retinex-Based Enhancement

The direct signal \(D\) resembles a low-light, underexposed image. Following the retinex model, we decompose \(D = S \ast \tilde{I}\), where \(\ast\) denotes element-wise multiplication, \(S\) is the illumination map, and \(\tilde{I}\) is the reflectance (the enhanced image). PhISH-Net predicts a 3-channel high-resolution illumination map \(S_{hr}\), from which the enhanced image is obtained as \(I_{out} = D_{hr} / (S_{hr} + \epsilon)\), where \(\epsilon\) is a small constant that avoids division by zero.
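The retinex inversion above reduces to an element-wise division. A minimal sketch follows, in which the value of `eps` and the final clipping to [0, 1] are assumptions:

```python
import numpy as np

def retinex_enhance(D_hr, S_hr, eps=1e-4):
    """I_out = D_hr / (S_hr + eps), computed per pixel and per channel."""
    return np.clip(D_hr / (S_hr + eps), 0.0, 1.0)

rng = np.random.default_rng(0)
reflectance = rng.uniform(0.0, 1.0, size=(8, 8, 3))   # stand-in for the clean image
S_demo = rng.uniform(0.2, 0.9, size=(8, 8, 3))        # stand-in illumination map
D_demo = S_demo * reflectance                         # retinex forward model
I_out = retinex_enhance(D_demo, S_demo)
```

Dividing by an illumination value below one brightens the pixel, so channels with smaller predicted illumination are boosted more.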

The network uses a lightweight convolutional encoder to extract features at low resolution. Local and global features are then used to predict bilateral grid coefficients, which are applied to a full-resolution guide map to produce \(S_{hr}\). This keeps most computation at low resolution while generating high-resolution outputs.
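To illustrate how a low-resolution bilateral grid can yield full-resolution output, the sketch below looks up per-pixel coefficients by nearest-neighbor slicing and applies them as a 3x4 affine transform. The grid shape, the affine layout, the use of the channel mean as guide map, and the nearest-neighbor lookup (in place of the smoother interpolation usually used for slicing) are all assumptions for illustration, not details confirmed by the source.

```python
import numpy as np

def slice_grid_nearest(grid, guide):
    """Look up per-pixel coefficients from a bilateral grid.

    grid:  (Gh, Gw, Gd, K) low-resolution grid with K coefficients per cell
    guide: (H, W) full-resolution guide map with values in [0, 1]
    returns (H, W, K) per-pixel coefficients
    """
    gh, gw, gd, _ = grid.shape
    h, w = guide.shape
    ys = np.minimum((np.arange(h) * gh) // h, gh - 1)   # pixel row -> grid row
    xs = np.minimum((np.arange(w) * gw) // w, gw - 1)   # pixel col -> grid col
    ds = np.clip((guide * gd).astype(int), 0, gd - 1)   # guide value -> range bin
    return grid[ys[:, None], xs[None, :], ds]

def apply_affine(coeffs, image):
    """Apply a per-pixel 3x4 affine transform (assumed coefficient layout)."""
    h, w, _ = image.shape
    a = coeffs.reshape(h, w, 3, 4)
    return np.einsum("hwij,hwj->hwi", a[..., :3], image) + a[..., 3]

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(16, 24, 3))
guide = image.mean(axis=2)                            # assumed guide map
# An identity affine in every cell should reproduce the input exactly.
grid = np.tile(np.eye(3, 4).ravel(), (4, 4, 8, 1))    # shape (4, 4, 8, 12)
coeffs = slice_grid_nearest(grid, guide)
S_demo_hr = apply_affine(coeffs, image)
```

Only the lookup and the affine application run at full resolution; the grid itself has far fewer cells than the image has pixels, which is where the computational saving comes from.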

Wideband Attenuation Prior

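To couple the deep enhancement with the physics of the UIFM, we introduce a novel loss that constrains the predicted illumination map to respect the known behavior of the attenuation coefficient \(\beta_d\). Comparing the direct signal of Eq. (2), \(D = J \cdot e^{-\beta_d z}\), with the retinex decomposition \(D = S \ast \tilde{I}\) identifies the illumination map with the attenuation term, \(S \approx e^{-\beta_d z}\). A coarse estimate of \(\beta_d\) can therefore be derived from the predicted \(S_{hr}\) and depth \(z\):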

\[ \hat{\beta}_d(z) = \frac{-\log S_{hr}}{z} \tag{4} \]

Sea-Thru established that \(\beta_d\) follows a two-term exponential decay with depth:

\[ \beta_d(z) = a \cdot e^{-b \cdot z} + c \cdot e^{-d \cdot z} \tag{5} \]

The wideband attenuation prior loss enforces this relationship:

\[ \mathcal{L}_a = \left\|\frac{-\log S_{hr}}{z} - (a e^{-bz} + c e^{-dz})\right\|^2 \tag{6} \]
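Eqs. (4) to (6) can be sketched in NumPy as below. The clamping constants, the use of a single coefficient set shared across the three channels, and the example values are assumptions; a training implementation would use a differentiable framework rather than NumPy.

```python
import numpy as np

def beta_prior(z, a, b, c, d):
    """Eq. (5): two-term exponential model of beta_d as a function of depth."""
    return a * np.exp(-b * z) + c * np.exp(-d * z)

def attenuation_prior_loss(S_hr, z, coeffs, eps=1e-6):
    """Eq. (6): squared error between the coarse estimate and the prior.

    S_hr: (H, W, 3) illumination map in (0, 1]; z: (H, W) positive depth.
    """
    z3 = np.maximum(z, eps)[..., None]                   # avoid division by zero
    beta_hat = -np.log(np.clip(S_hr, eps, 1.0)) / z3     # Eq. (4)
    a, b, c, d = coeffs
    return float(np.sum((beta_hat - beta_prior(z3, a, b, c, d)) ** 2))

rng = np.random.default_rng(0)
z_map = rng.uniform(1.0, 5.0, size=(4, 5))
coeffs_true = (0.30, 0.20, 0.10, 0.05)                   # hypothetical [a, b, c, d]
# Build an illumination map that satisfies the prior exactly.
S_consistent = np.exp(-beta_prior(z_map, *coeffs_true) * z_map)[..., None].repeat(3, axis=-1)

loss_match = attenuation_prior_loss(S_consistent, z_map, coeffs_true)
loss_mismatch = attenuation_prior_loss(S_consistent, z_map, (0.60, 0.20, 0.10, 0.05))
```

The loss vanishes when the illumination map decays with depth exactly as the prior predicts and grows as the two disagree.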

The coefficients \(V = [a, b, c, d]\) are predicted by a learnable network from encoder features. The total training loss combines reconstruction, color, smoothness, and attenuation prior:

\[ \mathcal{L} = w_r \mathcal{L}_r + w_c \mathcal{L}_c + w_s \mathcal{L}_s + w_a \mathcal{L}_a \tag{7} \]

with \(w_r = 10,\ w_c = 1,\ w_s = 2,\ w_a = 0.5\).
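The weighted combination in Eq. (7) amounts to a few lines of code. In this sketch the weights are the values stated above, while the individual loss values are placeholders chosen only to show the arithmetic.

```python
# Weights from Eq. (7): reconstruction, color, smoothness, attenuation prior.
WEIGHTS = {"r": 10.0, "c": 1.0, "s": 2.0, "a": 0.5}

def total_loss(terms, weights=WEIGHTS):
    """Weighted sum of the individual loss terms, keyed as in WEIGHTS."""
    return sum(weights[name] * terms[name] for name in weights)

# Placeholder loss values, not measurements from the paper.
example_terms = {"r": 0.02, "c": 0.10, "s": 0.05, "a": 0.40}
loss_value = total_loss(example_terms)
```

With these weights the reconstruction term is scaled 20 times more heavily than the attenuation prior, so the prior acts as a regularizer rather than the main training signal.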

Fig. 3: PhISH-Net pipeline. (1) Backscatter is estimated and removed using depth. (2) PhISH-Net predicts bilateral grid coefficients from the low-resolution direct signal, yielding a high-resolution illumination map and enhanced image via the retinex model.

Results

Comparison with State-of-the-Art

Method      PSNR↑  PSNR-L↑  SSIM↑  PCQI↑  UCIQE↑  UIQM↑  UICM↑  UIConM↑  CCF↑
UIEB Dataset
FUnIE-GAN   17.38  20.06    0.729  0.639  0.546   1.399  5.778  1.161    21.06
UW-GAN      16.23  19.15    0.764  0.719  0.566   1.347  5.555  1.122    22.84
UWCNN       12.02  13.78    0.647  0.392  0.506   1.065  1.922  0.940    11.00
HLRP        13.03  13.79    0.287  0.233  0.636   1.648  9.509  1.192    36.93
MLLE        18.11  19.47    0.799  0.911  0.604   1.629  4.827  1.012    36.27
IBLA        15.56  17.78    0.739  0.695  0.602   1.440  7.331  1.014    28.98
TOPAL       20.59  22.67    0.867  0.715  0.584   1.221  5.009  1.032    21.22
UDCP        11.93  12.73    0.644  0.612  0.596   1.573  7.199  1.175    27.14
Water-Net   18.70  19.72    0.862  0.693  0.571   1.264  5.064  1.063    16.50
ICSP        11.79  13.10    0.634  0.732  0.564   1.476  6.301  1.048    26.69
PhISH-Net   21.14  23.43    0.869  0.929  0.641   1.597  8.817  1.151    37.24
EUVP Dataset
FUnIE-GAN   20.56  27.47    0.887  0.893  0.509   1.555  4.098  1.250    29.23
UW-GAN      15.76  22.84    0.916  0.965  0.526   1.474  3.391  1.216    29.31
UWCNN       15.52  18.45    0.844  0.645  0.543   1.421  1.638  1.261    19.57
MLLE        14.25  16.19    0.613  1.030  0.588   1.730  2.991  0.776    36.22
IBLA        16.92  23.09    0.864  0.989  0.590   1.562  4.618  1.110    39.54
TOPAL       18.30  24.48    0.934  0.994  0.583   1.491  3.252  1.199    34.90
Water-Net   18.26  24.35    0.936  0.883  0.579   1.498  3.147  1.231    25.62
ICSP        12.13  14.51    0.671  0.980  0.575   1.590  4.092  0.973    41.29
PhISH-Net   20.92  27.47    0.856  1.038  0.592   1.593  4.357  1.151    38.86

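Table 1: Image quality metrics on the UIEB and EUVP datasets (higher is better for all metrics). On UIEB, PhISH-Net achieves the best PSNR, PSNR-L, SSIM, PCQI, UCIQE, and CCF. On EUVP, it achieves the best PSNR, PCQI, and UCIQE and ties FUnIE-GAN for the best PSNR-L, while its SSIM trails several competing methods.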

Ablation Study

\(\mathcal{L}_r\) \(\mathcal{L}_c\) \(\mathcal{L}_s\) \(\mathcal{L}_a\) PSNR↑ SSIM↑
20.80 0.830
20.86 0.832
20.87 0.832
21.14 0.869

Table 2: Ablation of loss components on UIEB. The wideband attenuation prior \(\mathcal{L}_a\) provides the largest single gain, confirming the value of physics-based supervision.

Impact of Photofinishing

Fig. 4: Impact of Photofinishing (PF). Left: raw underwater image; Center: PhISH-Net output; Right: PhISH-Net + PF, showing improved naturalness and color balance.

BibTeX

@InProceedings{chandrasekar2024phishnet,
  author    = {Chandrasekar, Aditya and Sreenivas, Manogna and Biswas, Soma},
  title     = {PhISH-Net: Physics Inspired System for High Resolution Underwater Image Enhancement},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024}
}