ViT-LPATA: a vision transformer model for autism detection in children using facial images
Affiliation:

1. School of Biological Science and Medical Engineering, Hunan University of Technology, Zhuzhou 412007, China
2. School of Computer and Artificial Intelligence, Hunan University of Technology, Zhuzhou 412007, China
3. Central China Technology Development of Electric Power Co., Ltd., Wuhan 430070, China

    Abstract:

    To address the difficulty of recognizing subtle differences in facial biomarkers in children with autism, this work proposes the vision transformer with learnable positional encoding and adaptive token aggregation (ViT-LPATA), a predictive model for autism that combines a learnable positional encoding enhancement (LPEE) module with an adaptive token aggregation (ATA) module. The model uses the LPEE module to dynamically capture facial geometric deformation features and the ATA module to strengthen the feature representation of pathological regions, thereby establishing precise mappings of biomarker differences. Experiments on a publicly available autism facial dataset showed that ViT-LPATA achieved the best performance, with 99.2% accuracy and an area under the curve (AUC) of 0.940.
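
    The abstract does not describe the internals of the LPEE or ATA modules, so the following is only a generic sketch of the two ideas they are named after: a position table that is learned rather than fixed, and a pooling step that weights tokens by a learned score. It is not the authors' implementation. The function names, array shapes, and the single scoring vector are all assumptions made for illustration, and the "learned" arrays are filled with random values here instead of being trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_learnable_positions(tokens, pos_embed):
    """Add a position table (one row per token) to the patch embeddings.

    In a trained model pos_embed would be a parameter updated by gradient
    descent; here it is just a random array (assumption for illustration).
    """
    return tokens + pos_embed  # pos_embed broadcasts over the batch axis

def aggregate_tokens(tokens, score_weights):
    """Pool tokens into one vector per image using softmax attention weights.

    score_weights is a hypothetical scoring vector; tokens with a higher
    score contribute more to the pooled representation.
    """
    scores = tokens @ score_weights                      # (batch, n_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over tokens
    pooled = np.einsum("bn,bnd->bd", weights, tokens)    # weighted sum
    return pooled, weights

# Hypothetical sizes: 2 images, 196 patch tokens (14 x 14 grid), 64 channels.
batch, n_tokens, dim = 2, 196, 64
tokens = rng.normal(size=(batch, n_tokens, dim))
pos_embed = rng.normal(scale=0.02, size=(1, n_tokens, dim))
score_weights = rng.normal(scale=0.02, size=(dim,))

encoded = add_learnable_positions(tokens, pos_embed)
pooled, weights = aggregate_tokens(encoded, score_weights)
print(encoded.shape, pooled.shape)
```

    In this sketch the pooled vector would feed a classification head. How ViT-LPATA actually scores, merges, or selects tokens is specified in the full paper, not in the abstract.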

Get Citation

Li DENG, Wenqiu ZHU, Yingbo WU. ViT-LPATA: a vision transformer model for autism detection in children using facial images[J]. Optoelectronics Letters, 2026, 22(6): 379-384

History
  • Received: August 14, 2024
  • Revised: November 12, 2025
  • Online: June 05, 2026