Abstract: For low-light image enhancement, RAW images are preferable to RGB images because of their higher information content; however, their noise and single-channel nature make feature extraction difficult. Existing methods built on multi-stage convolutional neural network (CNN) frameworks struggle to extract global features, while single-stage CNN-transformer fusions often leave residual noise. To overcome these limitations, this paper introduces a multi-stage RAW image enhancement network that combines CNNs and transformers. Tailored to the characteristics of the task, we design a CNN-based denoising block for the denoising stage and incorporate wavelet information to enhance frequency features. For the color and white-balance recovery stage, we design a transformer-based correction block in which white balance is adjusted dynamically using a signal-to-noise ratio (SNR) map. With this design, our method outperforms other state-of-the-art models on all metrics on the Sony and Fuji subsets of the See-in-the-Dark (SID) dataset, and achieves the best structural similarity index measure (SSIM) on the Mono-Colored Raw (MCR) dataset.
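The abstract does not state how the SNR map is computed. One common formulation in the low-light enhancement literature estimates the signal with a local mean filter and the noise as the absolute residual between the image and that smoothed estimate, then takes their ratio per pixel. The sketch below assumes that formulation; the function names `box_blur` and `snr_map`, the kernel size `k`, the constant `eps`, and the max-normalization are illustrative choices, not details taken from this paper.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean filter over a k x k window with edge padding (k assumed odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate every shifted copy of the image inside the window.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def snr_map(img: np.ndarray, k: int = 5, eps: float = 1e-4) -> np.ndarray:
    """Hypothetical per-pixel SNR map, normalized so its maximum is 1.

    Assumption: signal = local mean, noise = |image - local mean|.
    """
    img = img.astype(np.float64)
    if img.ndim == 3:
        img = img.mean(axis=-1)  # collapse channels into one intensity map
    signal = box_blur(img, k)
    noise = np.abs(img - signal)
    snr = signal / (noise + eps)
    peak = snr.max()
    return snr / peak if peak > 0 else snr
```

High values mark regions where the smoothed signal dominates the residual noise; a network could use such a map to weight how strongly it trusts local pixel statistics, for example when estimating white-balance gains.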