Abstract: Previous point-wise methods suffer from high computational cost and limited receptive fields when capturing information among points. To address these limitations, we propose cosh-attention, which reduces the space and time complexity of attention from quadratic to linear in the number of points. In cosh-attention, the traditional softmax operator is replaced by a non-negative ReLU activation and a hyperbolic-cosine-based operator with a re-weighting mechanism. Building on this key component, we present a two-stage hyperbolic cosine transformer (ChTR3D) for 3D object detection from point clouds. ChTR3D refines proposals by applying cosh-attention, at linear complexity, to encode rich contextual relationships among points. Extensive experiments on the widely used KITTI dataset and the Waymo Open Dataset demonstrate that, compared with vanilla attention, cosh-attention significantly improves inference speed while achieving competitive performance. Among state-of-the-art two-stage methods that use point-level features for refinement, the proposed ChTR3D is the fastest.
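The quadratic-to-linear reduction rests on a standard linear-attention identity: once softmax is replaced by a non-negative feature map (here ReLU), matrix associativity lets the output be computed without ever forming the N x N score matrix. The sketch below illustrates this in NumPy, using a hypothetical positive per-key cosh weight as a stand-in for the paper's re-weighting mechanism (the exact form is not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, dv = 16, 8, 4                     # points, feature dim, value dim

Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, dv))

# Assumed re-weighting: a positive per-key weight built with cosh
# (illustrative only; not the paper's exact mechanism).
t = np.linspace(-1.0, 1.0, N)
w = np.cosh(t)                          # shape (N,), all values >= 1

phi = lambda x: np.maximum(x, 0.0)      # non-negative ReLU feature map
eps = 1e-9                              # numerical safety for the denominator

# Quadratic-time reference: explicit N x N score matrix, O(N^2 * d).
scores = phi(Q) @ (phi(K) * w[:, None]).T
out_quad = (scores @ V) / (scores.sum(axis=1, keepdims=True) + eps)

# Linear-time form: regroup by associativity so phi(K)^T V is computed
# once, giving O(N * d * dv) time and O(d * dv) extra space.
phiKw = phi(K) * w[:, None]
KV = phiKw.T @ V                        # (d, dv)
Z = phiKw.sum(axis=0)                   # (d,)
out_lin = (phi(Q) @ KV) / ((phi(Q) @ Z + eps)[:, None])

assert np.allclose(out_quad, out_lin)
```

Both forms produce identical outputs; only the order of matrix multiplication changes, which is what removes the quadratic dependence on the number of points.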