NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 6717
Title: R2D2: Reliable and Repeatable Detector and Descriptor

Reviewer 1

Strengths: Novel idea, promising results, and clearly written overall. Experiments: The authors provide a detailed evaluation of the method, including an ablation study on the separate influence of repeatability and matching reliability, and experiments with different transformations. Based on the novelty of the approach, the performance results, and the evaluation, I recommend that the paper be accepted.

Reviewer 2

Although the approach is somewhat incremental, I believe this is an interesting contribution to a fundamental task. The paper is well presented and the experimental results are convincing. In my opinion, the main technical contribution of the paper is that the proposed network simultaneously estimates a repeatability map and local descriptors associated with a discriminativeness confidence map. This leads to descriptors that can be accurately matched with high confidence. This result is obtained by relying on a metric learning procedure based on approximated average precision, which seems novel.
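
For readers less familiar with the ranking metric behind this kind of loss, the following is a minimal sketch of exact average precision (AP) for one query descriptor against a list of candidates. It is not the paper's approximated, differentiable AP; the function name `average_precision` and the toy inputs are assumptions made for illustration only.

```python
def average_precision(scores, labels):
    """Exact AP of a ranked candidate list (illustrative, non-differentiable).

    scores: similarity of each candidate to the query (higher = more similar).
    labels: 1 if the candidate is a true match for the query, 0 otherwise.
    """
    # Rank candidates from most to least similar.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            # Precision measured at the rank of each true match.
            precision_sum += hits / rank
    # Convention assumed here: AP is 0 when the list has no true match.
    return precision_sum / hits if hits else 0.0
```

AP equals 1 only when every true match is ranked ahead of every non-match, so a descriptor whose AP stays low during training can plausibly be treated as unreliable at that location.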

Reviewer 3

Novelty over [8,10] is limited. They too use a single backbone to learn a detector and a descriptor that influence each other. The losses for repeatability and reliability are interesting, though. On balance, those are sufficient advances for the paper to add to our knowledge of keypoint detection and description. The "peakiness over patches" objective in Sec 3.1 is reminiscent of bucketing in SFM, where it has been known empirically to ensure a good distribution of keypoints for accurate pose estimation (for example, see "Visual odometry" by Nistér, or ORB-SLAM). There might be a connection to explore here, or to state in a discussion. In Fig 2, use “confidence”, not “confidency”. On notation, perhaps clarify that S’_U is the saliency map in I’ warped by the inverse of U. A dense descriptor is learned by [6] using a contrastive loss, not a triplet loss as stated in Sec 3.2.
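
To make the bucketing comparison above concrete, here is a minimal sketch of generic grid bucketing: the image is divided into square cells and only the strongest keypoints in each cell are kept. This is not the exact scheme used in "Visual odometry" or ORB-SLAM, nor the paper's peakiness loss; the function name `bucket_keypoints` and its parameters `cell_size` and `per_cell` are hypothetical.

```python
def bucket_keypoints(keypoints, cell_size, per_cell=1):
    """Keep the top-scoring keypoints in each grid cell (illustrative sketch).

    keypoints: iterable of (x, y, score) tuples in pixel coordinates.
    cell_size: side length of a square grid cell, in pixels.
    per_cell: maximum number of keypoints kept per cell.
    """
    cells = {}
    for x, y, score in keypoints:
        # Assign each keypoint to the grid cell that contains it.
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y, score))

    kept = []
    for key in sorted(cells):
        # Within a cell, keep only the strongest responses.
        best = sorted(cells[key], key=lambda kp: kp[2], reverse=True)[:per_cell]
        kept.extend(best)
    return kept
```

With `cell_size=10`, two keypoints at (1, 1) and (2, 3) fall into the same cell and only the higher-scoring one survives, which spreads the retained keypoints across the image in a way loosely analogous to encouraging one peak per patch.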