Gloss-guided visual-gloss alignment network for continuous sign language recognition
Affiliation:

Ministry of Education and the Tianjin University of Technology, School of Computer Science and Engineering

    Abstract:

    Continuous sign language recognition (CSLR) helps deaf people communicate with hearing people by transcribing their sign language into gloss sequences. Improving the generalization ability of CSLR visual feature extractors is an important research direction. In this work, we treat gloss as prior knowledge to guide the learning of more generalizable visual features, and we propose a gloss-guided visual-gloss alignment network (GVAN). Specifically, we extract gloss representations with a pretrained graph-based model. We then design a cross-modality graph alignment (CMGA) mechanism that maps video and gloss text features into a heterogeneous graph composed of visual and semantic nodes, enabling cross-modality feature alignment. In addition, we introduce a cross-modality alignment constraint that optimizes video-text matching and enforces global semantic consistency. Experimental results on German and Chinese sign language benchmark datasets show that GVAN achieves competitive performance, and ablation studies validate the effectiveness of its key components.

History
  • Received: May 07, 2025
  • Revised: August 18, 2025
  • Accepted: September 09, 2025