Abstract: Multi-view 3D reconstruction is often degraded by background clutter in the input images. Consistent object masks across views mitigate this, yet existing methods typically rely on manual annotation or depth-map priors, which limits their practicality. We propose a framework for consistent multi-view object segmentation that draws inspiration from salient object detection while enforcing geometric consistency across views. The framework automatically selects a global object mask from SAM-generated candidates by scoring both intra-view quality and cross-view cycle consistency, and then propagates this mask to all other views via feature matching and prompt-based refinement. Experiments on the DTU, Tanks and Temples, and BlendedMVS datasets show that our method improves segmentation consistency and, when integrated with 3D Gaussian Splatting, improves the quality and robustness of downstream 3D reconstruction.
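
To make the mask-selection step concrete, the following is a minimal sketch of how a candidate could be chosen by combining an intra-view quality score with a cross-view cycle-consistency score. It is an illustration under stated assumptions, not the paper's implementation: the linear weighting, the use of IoU as the cycle-consistency measure, the representation of masks as sets of pixel coordinates, and the names `select_global_mask`, `round_trip`, and `weight` are all hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_global_mask(candidates, quality, round_trip, weight=0.5):
    """Return (index, score) of the best-scoring candidate mask.

    candidates : list of masks, each a set of (row, col) pixels
    quality    : per-candidate intra-view quality in [0, 1] (assumed given)
    round_trip : hypothetical callable that propagates a mask to another
                 view and back, returning the mask that comes back
    weight     : assumed trade-off between quality and cycle consistency
    """
    best_idx, best_score = -1, float("-inf")
    for i, mask in enumerate(candidates):
        # Cycle consistency: how much of the mask survives the round trip.
        cycle = iou(mask, round_trip(mask))
        score = weight * quality[i] + (1.0 - weight) * cycle
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Toy data: candidate A survives the round trip intact, candidate B loses a pixel.
mask_a = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
mask_b = frozenset({(2, 2), (2, 3)})


def toy_round_trip(mask):
    """Stand-in for real cross-view propagation (purely illustrative)."""
    return mask if mask == mask_a else frozenset({(2, 2)})


best_idx, best_score = select_global_mask(
    [mask_a, mask_b], quality=[0.6, 0.9], round_trip=toy_round_trip
)
```

In this toy case the candidate with the lower intra-view quality is selected because it is the only one that remains stable under the round trip, which is the behavior the cycle-consistency criterion is meant to encourage.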