Abstract: To address the challenge of real-time, non-contact, accurate estimation of blood concentration during endoscopic minimally invasive surgery, this study proposes a blood concentration prediction method that fuses deep features with handcrafted features. First, an image acquisition platform with fixed imaging conditions was established, and a self-built dataset of 1287 images covering 15 concentration levels was constructed. For feature extraction, a pre-trained EfficientNet-B0 was employed to extract deep semantic features, while a 4-dimensional handcrafted feature set was extracted based on sharpness and color distribution. The two feature sources were fused by concatenation. For the prediction strategy, the traditional continuous regression problem was recast as a discrete classification problem with probability weighting, so that a continuous concentration output is obtained by weighting the concentration levels with their Softmax probabilities. Experimental results demonstrate that the proposed method achieves a Mean Absolute Error of 0.0195 ml/L and a tolerance accuracy of 90.67% on the validation set, significantly outperforming direct regression models and the ResNet50 baseline. Brightness and contrast perturbation experiments further validate the robustness advantage of the proposed method. At a brightness factor of 3.0, the proposed method achieves a Mean Absolute Error of 0.1469 ml/L, while that of the pure EfficientNet model rises to 0.1891 ml/L, indicating slower performance degradation for the proposed method. The method relies solely on single-frame image input and requires no additional optical equipment, providing an effective technical solution for non-contact blood concentration detection in endoscopic scenarios that balances accuracy and robustness.
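The prediction strategy described above, classification over discrete concentration levels followed by Softmax probability weighting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example logits, and in particular the 15 evenly spaced level values in `HYPOTHETICAL_LEVELS` are assumptions made for illustration, since the abstract does not list the actual concentration levels or the classifier head.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of class logits."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_concentration(logits, levels):
    """Continuous estimate: sum of level values weighted by Softmax probabilities."""
    if len(logits) != len(levels):
        raise ValueError("need exactly one logit per concentration level")
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, levels))


# ASSUMPTION: 15 evenly spaced levels (ml/L), chosen only for illustration.
HYPOTHETICAL_LEVELS = [0.1 * i for i in range(15)]

if __name__ == "__main__":
    # Hypothetical logits: the classifier hesitates between levels 4 and 5,
    # so the weighted output falls between those two level values.
    logits = [0.0] * 15
    logits[4] = 5.0
    logits[5] = 5.0
    print(expected_concentration(logits, HYPOTHETICAL_LEVELS))
```

Because the output is a probability-weighted average rather than the single most likely class, a prediction split between two adjacent levels yields an intermediate concentration, which is how a classifier trained on discrete levels can still produce a continuous estimate.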