Gloss-guided visual-gloss alignment network for continuous sign language recognition

Affiliation: Ministry of Education and the Tianjin University of Technology, School of Computer Science and Engineering

Abstract:

Continuous sign language recognition (CSLR) helps deaf people communicate with hearing people by recognizing their sign language as glosses. Improving the generalization ability of CSLR visual feature extractors remains a worthwhile research direction. In this work, we model gloss as prior knowledge to facilitate the learning of more generalizable visual features, and we present a gloss-guided visual-gloss alignment network (GVAN). Specifically, we extract gloss representations using a pretrained graph-based model. We design a cross-modality graph alignment (CMGA) mechanism that maps video and gloss text features into a heterogeneous graph composed of visual and semantic nodes, enabling effective cross-modality feature alignment. Additionally, we introduce a cross-modality alignment constraint to optimize video-text matching and ensure global semantic consistency. Experimental results on both German and Chinese sign language benchmark datasets demonstrate that the proposed GVAN achieves competitive performance. Ablation studies further validate the effectiveness of the key components of GVAN.
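
The abstract does not specify the form of the cross-modality alignment constraint. As a rough illustration only, the sketch below uses a generic symmetric contrastive objective over pooled video and gloss embeddings, which is one common way to encourage video-text matching. The function names (`mean_pool`, `cosine`, `alignment_loss`), the temperature value, and the toy two-dimensional features are all assumptions made for this example and are not taken from GVAN.

```python
import math


def mean_pool(frames):
    """Average a list of equal-length frame feature vectors into one vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def alignment_loss(video_vecs, gloss_vecs, temperature=0.1):
    """Symmetric contrastive loss (hypothetical stand-in for an alignment constraint).

    The i-th video embedding is assumed to be paired with the i-th gloss
    embedding; matched pairs should score higher than mismatched ones.
    """
    n = len(video_vecs)
    sim = [[cosine(v, g) / temperature for g in gloss_vecs] for v in video_vecs]

    def row_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    sim_t = [list(col) for col in zip(*sim)]  # gloss-to-video direction
    return 0.5 * (row_cross_entropy(sim) + row_cross_entropy(sim_t))


# Toy usage: two short "videos" (lists of frame features) and two gloss vectors.
videos = [
    mean_pool([[1.0, 0.0], [0.9, 0.1]]),
    mean_pool([[0.0, 1.0], [0.1, 0.9]]),
]
glosses = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss(videos, glosses)
swapped = alignment_loss(videos, glosses[::-1])
```

In this toy setup the correctly paired inputs yield a lower loss than the swapped pairing, which is the behavior a global video-text consistency term would be expected to reward.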

History

Received: May 07, 2025
Revised: August 18, 2025
Adopted: September 09, 2025
