Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The authors present the first domain adaptation model for 3D point clouds. To build it, they combine a variety of known and novel techniques: for example, they use a PointNet++ encoder, but also introduce novel Self-Adaptive nodes and use a convolution similar to bilateral convolution (which they call deformable convolution) to extract features for domain alignment. The submission seems technically sound, and the authors provide a theoretical analysis of their method in terms of the H\Delta H theory. Since I am not an expert in domain adaptation, I cannot conclusively judge their contribution on that front. The paper is clearly written, though I noticed a substantial number of writing mistakes w.r.t. articles (the, a).

The presented method achieves clearly better results than other methods undergoing domain transfer without adaptation. It would be interesting, though, to see the results of other methods fine-tuned with a small amount of labelled data, to get an impression of the complexity of the domain transfer task between the different datasets. Also, even though an ablation study is performed for the different proposed parts of the architecture, there is no discussion of the weaknesses of the method, which would be helpful. The approach, together with the newly proposed dataset, could be a valuable contribution to the community.
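For reference on the theory mentioned above (this is my addition, not part of the review): analyses of this kind typically instantiate the standard H\Delta H generalization bound of Ben-David et al., which reads:

```latex
% Target error bounded by source error, the H\Delta H-divergence between
% the two domain distributions, and the error \lambda of the ideal joint
% hypothesis (Ben-David et al., "A theory of learning from different domains").
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \big( \epsilon_S(h') + \epsilon_T(h') \big)
```

Intuitively, if the divergence term and \lambda are small, a hypothesis with low source error cannot have much higher target error.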
# Originality
The submission mostly combines the domain adaptation loss from Maximum Classifier Discrepancy with additionally learned local features ("Local Feature Alignment") for point cloud classification tasks with unlabeled target-domain samples. The driving classification architecture is borrowed from PointNet. The main contribution lies in the local features that raise the predictive performance on the target task: small regions, centered around sampled point cloud points, are first moved by a learned offset (to better capture commonalities of the current object) and then weighted by an attention network (to identify important features). The features of these regions are derived from early stages of a PointNet architecture; the final local features are then fed into later layers of a PointNet architecture for classification. Training alternates the steps from the Maximum Classifier Discrepancy publication.

# Quality
The ablation study shows that, on average across multiple domain adaptation tasks, the added adaptable local features improve over a direct application of general-purpose domain adaptation techniques. However, the effect varies across classes.

# Clarity
The descriptions of the architecture and methodology are clear enough.

# Significance
The contribution -- though successful -- might be of limited significance to the community, for mostly two reasons: the derived local feature alignment seems to be mostly a learned weighting and offsetting of PointNet features, and the success across classes, as shown in Table 3, seems noisy; some classes profit from the proposed method (e.g., cabinet) and some don't (e.g., lamp).

Minor fixes:
- line 25: systems?
- line 156, eq. 2: maybe rewriting the equation in the style of an assignment would make sense here?
- line 212, eq. 10: missing closing parenthesis for h_1(x)?
- Table 3: MCD and table: probably remove '1c' here?
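For concreteness (my own sketch, not from the paper under review): the alternating MCD training mentioned above revolves around a discrepancy loss between two classifier heads. The function name and the L1 form below are assumptions based on the common formulation of MCD:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mcd_discrepancy(logits_f1, logits_f2):
    # L1 distance between the two classifiers' predicted class
    # distributions, averaged over the batch (one common form of the
    # MCD discrepancy loss).
    p1, p2 = softmax(logits_f1), softmax(logits_f2)
    return float(np.abs(p1 - p2).sum(axis=1).mean())

# The alternating steps, schematically:
# (A) train encoder G and classifiers F1, F2 on labeled source data;
# (B) fix G, maximize the discrepancy on target data w.r.t. F1 and F2;
# (C) fix F1 and F2, minimize the discrepancy on target data w.r.t. G.
```

Identical predictions give a discrepancy of 0, and since each row of a softmax output sums to 1, the per-sample L1 distance is bounded above by 2.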
Originality:
- L3: "to the best of our knowledge, there is no method yet to achieve domain adaptation on 3D data, especially point cloud data" -- see below. [SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud, Wu et al. 2018] proposes a domain adaptation pipeline for 3D lidar point clouds to reduce the distribution gap between synthetic and real data. [Domain Adaptation for Vehicle Detection from Bird's Eye View LiDAR Point Cloud Data, Saleh et al. 2019] is quite recent, but also explores domain adaptation for synthetic vs. real data. Technically, these two operate in image space (depth/semantic segmentation maps and a BEV of the point cloud, respectively), but the underlying goal is still to model 3D information from point clouds. The first paper is from 2018, so I do think this paper overclaims the 'first to do domain adaptation on 3D data' statement a bit. It is worth noting that this paper explores a point-based representation rather than an image-based one, and a classification task rather than point segmentation, but I think the similarities and differences should be mentioned and discussed.
+ The idea of locally aligning features using self-adaptive nodes with adaptive part-based receptive fields is pretty novel and interesting. This provides additional structure to the features that should make the global alignment easier, since it is invariant to scale and part configuration. But while this should work for classification, I'm not sure if it would work for other shape-sensitive tasks such as 3D detection.
- In fact, Table 3 shows that adding local alignment using self-adaptive nodes doesn't always lead to an improvement over the other baselines on all classes.
- Global alignment is a feature alignment based on MCD, so this part is not new.
+/- I think it's really useful to have a benchmark dataset for domain adaptation, and I appreciate the authors taking the initiative to assemble such a dataset. But since it is simply a subset of existing data, I don't think it is a strong contribution (which, to be fair, the authors never claim it is).

Quality:
+ Extensive experiments and an ablation study with detailed comparisons to other UDA baselines. This is really useful, especially since the proposed benchmark is new.
+ The ablation study shows that each component really does add to the performance (Table 2).
? Is the proposed approach with only global alignment equivalent to the MCD baseline? I assume so, since there is no ablation study with only G and G is based on MCD, but are they all using the same settings, parameters, etc.?
+ The performance breakdown per class in Table 3 is a nice touch. This is very useful, since it shows the strengths and weaknesses of each approach.
- All the scores in Table 3 (Avg) are lower than their Table 2 counterparts, which makes me wonder if the imbalanced nature of the data across categories has more effect than it should. Chair and table sort of dominate the dataset and skew the final score toward the trend of these two classes. I feel like a fairer comparison would give every class an equal number of objects, or weight each class equally. I know that the current setup is pretty common in classification tasks, but it can be misleading.
? The result for bed is very interesting and worth a discussion. MCD outperforms the other methods by a large margin. And if we assume that the proposed approach with only G is the same as MCD, then adding local alignment drops the classification score from 26.1 to 4.3 (and adding attention further drops it to 1). Do you have any intuition on why this is the case?

Clarity:
+ Overall, the paper is not difficult to understand.
+ The format of the experiments, the ablation study, and the tables showing the results are all very clear and easy to digest.
? I feel like Section 3.5 doesn't add much to the narrative and could be put in the supplementary material instead.
- It's not immediately clear in lines 90-91 that P(s) and P(t) refer to the distributions (they are defined in the next section).

Significance:
+ I believe the idea of self-adaptive nodes for 3D objects would be useful to the research community, if it works. Aligning features might not be new, but doing so in a 3D setting and on top of PointNet-based features shows that it is possible and promising, at least for the chair and table categories.
+ It's true that not many works are looking into domain adaptation for 3D data, and it helps to have a common benchmark, even if it is just a combination of existing datasets.

--UPDATED AFTER REBUTTAL--
Thanks for the detailed rebuttal. The additional results are quite interesting and further convince me that the proposed local alignment does help. So I'm keeping my score at 6.
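On the class-imbalance concern raised in the review above (chair and table dominating the Table 3 average), here is a small illustration of the difference between overall (micro) accuracy and the equal-class-weight (macro) accuracy the reviewer suggests; the function names are mine, not the paper's:

```python
import numpy as np

def micro_accuracy(y_true, y_pred):
    # Fraction of all samples classified correctly; dominated by large classes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def macro_accuracy(y_true, y_pred):
    # Per-class accuracy, averaged with equal weight per class.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accs = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(accs))

# Toy example: 8 "chairs" (class 0), all correct; 2 "lamps" (class 1), all wrong.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
# micro_accuracy(y_true, y_pred) -> 0.8
# macro_accuracy(y_true, y_pred) -> 0.5
```

The micro score hides the complete failure on the minority class, which is exactly the effect the reviewer worries about when chair and table dominate the benchmark.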