Abstract: Panoramic video delivers a 360° field of view, allowing users to freely explore and perceive their visual environment. Crucially, its spatial audio provides directional sound cues that guide visual attention and significantly enhance immersive exploration. However, saliency prediction that jointly exploits the visual and auditory modalities remains largely unexplored. To this end, we propose a spatiotemporal attention-based audiovisual saliency prediction (STAV) model that effectively leverages cross-modal spatiotemporal features from both the visual and auditory streams. Specifically, we use the Video Swin Transformer to extract spatiotemporal visual features from videos and design a multi-dimensional feature enhancement module (MFEM) to balance multi-scale spatiotemporal feature representations. Furthermore, we employ SoundNet to extract multi-attribute audio features and compute an audio energy map (AEM) to localize sound sources and capture the spatial distribution of the audio. Finally, we fuse the audio and visual features and combine them with spatially encoded cues from the AEM to generate the final audiovisual saliency map. Comprehensive experiments on three panoramic audiovisual video datasets demonstrate the effectiveness of the proposed model.
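To make the data flow concrete, the following is a minimal PyTorch sketch of the pipeline the abstract outlines. It is an illustration under stated assumptions, not the paper's implementation: `visual_backbone` and `audio_encoder` are lightweight stand-ins for Video Swin Transformer and SoundNet, and the `MFEM` internals, the fusion layer, and the precomputed `aem` input are hypothetical, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class MFEM(nn.Module):
    """Hypothetical multi-dimensional feature enhancement module:
    rebalances spatiotemporal features via channel-wise attention."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                    # global spatiotemporal context
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):          # x: (B, C, T, H, W)
        return x * self.attn(x)    # reweight channels by learned importance

class STAVSketch(nn.Module):
    """Sketch of the STAV pipeline: visual backbone -> MFEM,
    audio encoder + AEM -> fusion -> audiovisual saliency map."""
    def __init__(self, vis_dim=96, aud_dim=128):
        super().__init__()
        # Stand-in backbones (the paper uses Video Swin Transformer / SoundNet).
        self.visual_backbone = nn.Conv3d(3, vis_dim, kernel_size=3, padding=1)
        self.audio_encoder = nn.Conv1d(1, aud_dim, kernel_size=9, stride=4)
        self.mfem = MFEM(vis_dim)
        # Fuse visual features, broadcast audio features, and the AEM channel.
        self.fusion = nn.Conv3d(vis_dim + aud_dim + 1, 1, kernel_size=1)

    def forward(self, video, waveform, aem):
        # video: (B, 3, T, H, W); waveform: (B, 1, L); aem: (B, 1, T, H, W)
        v = self.mfem(self.visual_backbone(video))
        a = self.audio_encoder(waveform).mean(dim=-1)          # (B, aud_dim)
        a = a[:, :, None, None, None].expand(-1, -1, *v.shape[2:])
        fused = torch.cat([v, a, aem], dim=1)  # AEM injects spatial audio cues
        return torch.sigmoid(self.fusion(fused))               # saliency in [0, 1]

# Example: dummy inputs (equirectangular frames, mono waveform, AEM grid).
model = STAVSketch()
sal = model(torch.randn(1, 3, 8, 64, 128),
            torch.randn(1, 1, 16000),
            torch.rand(1, 1, 8, 64, 128))
print(sal.shape)  # torch.Size([1, 1, 8, 64, 128])
```

Concatenating the AEM as an extra channel is one plausible reading of "combine them with spatially encoded cues from the AEM"; the paper may instead realize this step with attention or gating.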