Abstract: To address the loss of high-frequency details in 3D Gaussian Splatting (3DGS)-based avatars, we propose HiFi-GaussianAvatar, a high-fidelity reconstruction framework that introduces MSA-StyleUNet, a multi-scale attention network for refining 3D Gaussian parameters. Our method first extracts a parametric template from multi-view inputs and predicts pose-conditioned Gaussian features. The MSA-StyleUNet applies multi-scale attention and multi-frequency sine activations to enhance spectral representation, allowing the network to capture fine-scale geometric and appearance details while maintaining training stability. Experiments demonstrate that our approach produces pose-controllable avatars with significantly improved fidelity and detail reconstruction, outperforming existing 3DGS-based methods.
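As a rough illustration of what a multi-frequency sine activation could look like, the sketch below applies sin(w * x) to each feature channel, cycling through a small set of frequencies so that different channels respond to different spectral bands. The function name `multi_freq_sine`, the per-channel cycling scheme, and the frequency values (1, 2, 4) are assumptions made for this example only; the abstract does not specify how MSA-StyleUNet assigns or learns its frequencies.

```python
import math


def multi_freq_sine(features, freqs=(1.0, 2.0, 4.0)):
    """Apply sin(w * x) to each channel, cycling through `freqs`.

    `features` is a flat list of channel values; channel i uses the
    frequency freqs[i % len(freqs)]. Both the cycling scheme and the
    default frequencies are hypothetical, chosen for illustration.
    """
    return [
        math.sin(freqs[i % len(freqs)] * x)
        for i, x in enumerate(features)
    ]


# Example: three channels with the same input see three different frequencies.
activated = multi_freq_sine([0.3, 0.3, 0.3])
```

Because the sine is bounded in [-1, 1], the activated features stay in a fixed range regardless of input magnitude, while the higher-frequency channels vary faster with the input and can therefore represent finer detail.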