Abstract: To address the challenges of pedestrian detection in dense scenes, including high crowd density, severe occlusion, and overlapping individuals, an improved you only look once (YOLO)-based algorithm is proposed. First, deformable convolutions replace standard convolutions, improving the model's adaptability to variations in shape and appearance under occlusion. Second, a multi-dimensional attention module is designed to emphasize critical local regions and extract more precise feature information. Third, a diagonal difference intersection-over-union (IoU) loss function is introduced, which incorporates the Euclidean distance between corresponding main-diagonal corner points of the predicted and ground-truth bounding boxes, improving detection accuracy and regression performance. Experimental results show that the improved algorithm achieves a mean average precision at IoU = 0.5 (mAP50) of 75.1% on the public dense-pedestrian dataset WiderPerson, a 1.8 percentage-point improvement over the original YOLOv5 model, demonstrating superior detection performance.
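The abstract describes the diagonal difference IoU loss only at a high level. As an illustrative sketch (not the paper's exact formulation), one plausible form penalizes the Euclidean distances between the corresponding main-diagonal corners (top-left and bottom-right) of the predicted and ground-truth boxes, normalized by the diagonal of the smallest enclosing box, in the spirit of DIoU-style penalties:

```python
import math

def diag_diff_iou_loss(pred, gt):
    """Hedged sketch of a diagonal-difference IoU loss.

    Boxes are (x1, y1, x2, y2). The exact formulation is not given in
    the abstract; this assumed version adds a normalized corner-distance
    penalty to the standard 1 - IoU term.
    """
    # Intersection area between the two boxes
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # Euclidean distances between corresponding main-diagonal corners
    d_tl = math.hypot(pred[0] - gt[0], pred[1] - gt[1])
    d_br = math.hypot(pred[2] - gt[2], pred[3] - gt[3])

    # Normalize by the diagonal of the smallest enclosing box
    cx1, cy1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    cx2, cy2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c_diag = math.hypot(cx2 - cx1, cy2 - cy1) + 1e-9

    return 1.0 - iou + (d_tl + d_br) / (2.0 * c_diag)
```

For identical boxes the loss is zero, and it grows as the predicted corners drift from the ground-truth corners, which gives the regression a gradient even when the IoU term saturates.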