NIPS 2016
Mon Dec 5th through Sun Dec 11th, 2016, at Centre Convencions Internacional Barcelona
Paper ID: 1507
Title: Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation

Reviewer 1

Summary

The paper proposes a new deep network for 3D biomedical image segmentation. The model rests on two novel contributions. kU-Net: a simple U-Net model replicated ‘k’ times, where each successive stage sees a progressively larger region; this exploits multi-scale context to refine the segmentation. BDC-LSTM: arguably a simple extension of convolutional LSTMs; instead of a traditional LSTM, which carries context in a single (forward) direction, it looks both forward and backward along the z-axis (the previous and the next image slice) and averages the two outputs. These improvements over existing segmentation and LSTM techniques provide a reliable way to exploit context, which appears to be crucial in the biomedical image segmentation domain. The model beats state-of-the-art techniques on the ISBI neuronal segmentation challenge. Although state of the art, the method seems computationally expensive: to segment a single 2D image, kU-Net needs multiple forward passes through a U-Net-style network. It would be good to benchmark the model's runtime thoroughly. The original U-Net was also tested on the ISBI cell tracking challenge; that would be a good additional dataset to evaluate on.
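To make the bidirectional scheme concrete: a minimal sketch of the idea as described above, with one convolutional LSTM pass in each z-direction and the hidden maps averaged. This is an illustration under the review's description, not the authors' code; the cell definition, function names, and channel sizes are all assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: all four gates come from one
    convolution over the concatenated input and hidden state."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def bdc_lstm(slices, fwd_cell, bwd_cell):
    """slices: list of (N, C, H, W) feature maps, one per z-slice.
    Runs one ConvLSTM pass per z-direction and averages the hidden maps."""
    n, _, hgt, wid = slices[0].shape
    def init(cell):
        z = slices[0].new_zeros(n, cell.hid_ch, hgt, wid)
        return z, z.clone()
    out_f, state = [], init(fwd_cell)
    for x in slices:                      # previous slice -> next slice
        state = fwd_cell(x, state)
        out_f.append(state[0])
    out_b, state = [], init(bwd_cell)
    for x in reversed(slices):            # next slice -> previous slice
        state = bwd_cell(x, state)
        out_b.append(state[0])
    out_b.reverse()
    return [(f + b) / 2 for f, b in zip(out_f, out_b)]

# Usage: 5 slices of 16-channel 64x64 features, 32 hidden channels.
cells = ConvLSTMCell(16, 32), ConvLSTMCell(16, 32)
feats = [torch.rand(1, 16, 64, 64) for _ in range(5)]
out = bdc_lstm(feats, *cells)             # 5 maps of shape (1, 32, 64, 64)
```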

Qualitative Assessment

This is an excellent application of deep learning to medical image analysis. Moreover, the authors expand on existing techniques and propose novel extensions to LSTMs and to the use of context.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)


Reviewer 2

Summary

This paper proposes a new model for biomedical image segmentation, which extends existing work in two distinct ways. First, it generalizes the well-known U-Net architecture to a multi-scale setting (called kU-Net), while remaining fundamentally within a 2D fully-convolutional scheme. Second, it combines several such kU-Net layers into a bi-directional convolutional LSTM (called BDC-LSTM), which can account for context given by the third dimension, while elegantly handling the common anisotropy of biomedical stacks (wherein the z-dimension is sampled at a lower spatial resolution than the xy-plane).
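A rough sketch of the multi-scale scheme as this summary describes it: k U-Net-like submodules, the coarsest run first, with each submodule's features upsampled and fed into the next finer one. This is an assumption-laden illustration (the `make_unet` factory, channel counts, and pooling choices are all placeholders), not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KUNet(nn.Module):
    """Illustrative kU-Net-style wrapper: submodule i works at scale 1/2**i,
    so higher i sees a larger region of the original image. `make_unet` is a
    placeholder factory for any 2D U-Net-like net returning feature maps."""
    def __init__(self, make_unet, k=2, feat_ch=64):
        super().__init__()
        self.k = k
        # The coarsest submodule takes the raw image; finer ones also take
        # the upsampled features handed down from the coarser level.
        self.nets = nn.ModuleList(
            make_unet(in_ch=1 if i == k - 1 else 1 + feat_ch) for i in range(k))

    def forward(self, x):
        feats = None
        for i in reversed(range(self.k)):          # coarsest scale first
            xi = F.avg_pool2d(x, 2 ** i) if i > 0 else x
            if feats is not None:                  # pass context downward
                feats = F.interpolate(feats, size=xi.shape[-2:],
                                      mode='bilinear', align_corners=False)
                xi = torch.cat([xi, feats], dim=1)
            feats = self.nets[i](xi)
        return feats

# Stand-in backbone for demonstration: one conv returning 64-channel features.
make_unet = lambda in_ch: nn.Conv2d(in_ch, 64, 3, padding=1)
net = KUNet(make_unet, k=2)
y = net(torch.rand(1, 1, 128, 128))                # -> (1, 64, 128, 128)
```

Only the progressive coarse-to-fine wiring is the point here; any 2D segmentation backbone could slot into `make_unet`.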

Qualitative Assessment

The paper is generally well written and easy to understand. I quite like the proposed model: kU-Net provides an answer to the problem of capturing multi-scale features within a medical image, and the bi-directional LSTM scheme is an elegant way to account for broader context from the z-dimension. However, I have a few reservations about the paper as it currently stands.

Standard ways of dealing with anisotropy include resampling (e.g. trilinear) to isotropic voxels, or using non-cubic kernels in fully convolutional 3D models [1,2] (a minimal sketch of the resampling baseline follows the references below). For datasets in which the across-plane resolution is reasonably close to the within-plane one (e.g. Fungus), this standard preprocessing would have been expected as a baseline. For datasets in which the across-plane resolution is much lower than the within-plane one (e.g. Neuron), one can expect the impact of across-plane z-context to be small compared to the within-plane influence. It may be for this reason that the performance improvements from using RNNs on this dataset are so small. This might warrant a more extended discussion.

Much is made of the fact that the model can handle anisotropic images; however, the model as presented can only deal with training and testing images that all exhibit the same sampling resolution. The real problem with anisotropy is that datasets are acquired on heterogeneous equipment (e.g. scanners from different manufacturers), such that images are collected at different resolutions. As noted, current practice is to carry out a crude but effective resampling to a common resolution; it would be really nice for the model to handle this heterogeneity directly.

The model’s multi-scale ability arises from a single downsampling (by a factor of 2) in the kU-Net layer, with concatenation of features at the two levels. Much simpler yet effective mechanisms have been proposed (e.g. [3]), and it would have been nice to understand whether they could serve the same purpose in the studied context.

The experimental results appear fairly limited. Although the proposed models seem to show improvements on commonly reported metrics, the evaluation is limited to two very small datasets and gives no hint of how the model would perform on more general tasks (e.g. on anatomical data such as BRATS). Moreover, the improvements appear relatively small (especially for the Neuron dataset) given the enormous complexity of the model compared to the baseline U-Net (both in terms of architecture and number of parameters). This at least warrants a discussion of whether the additional effort is really worth it.

Detailed comments:
- Line 85: this arrangement is also similar to [3].
- Lines 164-165: the argument that BDC-LSTM is more efficient at exploiting inter-slice contexts than Pyramid-LSTM needs to be better explained and substantiated.
- Line 186: define deconvolution: is it simple upsampling or a real deconvolution?
- Lines 211-212: it is not clear from the paper whether decoupled training is indeed used in the experiments. What are the implications for model initialization?
- Line 330: lstm ==> LSTM
- Line 332: lstm ==> LSTM

References:
[1] Dou, Q., Chen, H., Yu, L., Zhao, L., Qin, J., Wang, D., ... & Heng, P. A. (2016). Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Transactions on Medical Imaging, 35(5), 1182-1195.
[2] Setio, A. A. A., et al. (2016). Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Transactions on Medical Imaging, 35(5), 1160-1169.
[3] Shen, W., Zhou, M., Yang, F., Yang, C., & Tian, J. (2015). Multi-scale convolutional neural networks for lung nodule classification. In Information Processing in Medical Imaging (pp. 588-599). Springer International Publishing.
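As referenced above, a minimal sketch of the trilinear-resampling baseline, in PyTorch; the voxel spacings and function name are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def resample_isotropic(volume, spacing_zyx=(8.0, 1.0, 1.0), target=1.0):
    """Trilinear resampling of an anisotropic stack to isotropic voxels.
    volume: (D, H, W) tensor; spacing_zyx: physical voxel size per axis."""
    d, h, w = volume.shape
    new_shape = (int(round(d * spacing_zyx[0] / target)),
                 int(round(h * spacing_zyx[1] / target)),
                 int(round(w * spacing_zyx[2] / target)))
    x = volume[None, None].float()        # -> (N=1, C=1, D, H, W)
    x = F.interpolate(x, size=new_shape, mode='trilinear', align_corners=False)
    return x[0, 0]

stack = torch.rand(30, 512, 512)          # e.g. 30 coarse slices at 512x512
iso = resample_isotropic(stack)           # -> roughly (240, 512, 512)
```

After such resampling, a voxel's within-plane and across-plane neighbors correspond to comparable physical distances, which is what makes isotropic 3D kernels reasonable as a baseline.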

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)


Reviewer 3

Summary

This is a well-composed application paper that combines FCN [16] with C-LSTM [17] for semantic segmentation of anisotropic images.

Qualitative Assessment

The paper is well written and the methods are well explained. Given that this is an application paper, the experimental validation is rather weak: only two datasets are used (one public, one in-house). The improvement on the EM data (V_rand) is 0.21% for FCN and 0.25% for FCN + LSTM w.r.t. [16]; that is, adding an LSTM on top of the FCN yields an improvement of only 0.04%. It is not clear whether the improvement comes from the model itself or is simply a result of higher model capacity. It would be nice to see the number of parameters alongside the results in Table 1 (a one-line way to obtain this is sketched below). The method described in [3] should also appear in Table 1. It would also be nice to see the impact of the number of time steps included in the C-LSTM on model performance. The authors say that “incorporating 3D convolutions may incur extremely high computation costs”: what is the computation cost of the proposed method, and how does it compare to a 3D convolutional approach? At test time, do the authors follow the scheme of multiple predictions by image rotation (introduced in [16]), or do they perform some kind of model averaging?

Minor comments:
- Figure 3: subfigures (b) and (c) don’t seem to add new value to the content of the paper.
- Figure 4 (c): how would the authors explain the lack of smoothness in the C-LSTM result (compared to Figure 4 (b))?
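For the parameter counts requested above, the standard PyTorch idiom would suffice for each entry in Table 1; `model` here stands in for any of the compared networks.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```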

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)


Reviewer 4

Summary

This paper introduces a deep learning framework that combines fully convolutional networks, which exploit 2D contexts, with an RNN that integrates contextual information along the third dimension. Experimental results on two 3D medical applications demonstrate good performance.

Qualitative Assessment

This paper aims to tackle the task of volumetric image segmentation by integrating 3D contextual information. Specifically, it proposes a hybrid framework in which 2D fully convolutional networks and a recurrent neural network exploit intra- and inter-slice contexts, respectively. The paper is well written, and the method was validated on two datasets: one public ongoing-challenge dataset and one in-house fungus dataset. Overall, in my opinion, this paper is interesting and could attract the interest of researchers working on problems related to volumetric image segmentation. However, I have the following comments for further revision:
1. In the introduction, the authors list four categories of DL schemes and the problem that anisotropic resolution poses for current deep-learning-based methods. Please explain the issue with 3D convolutions using isotropic kernels in detail, i.e., why are they not suitable for handling anisotropic data? (A minimal illustration of the contrast follows these comments.)
2. In Table 1, the differences among the methods seem marginal. Are the improvements significant? Besides, the leading methods on the ISBI challenge leaderboard could be added to the table for a comprehensive evaluation.
3. How is the value of k in kU-Net chosen? If it is set to 3, will the performance improve further?
4. How long does it take to process one test image? In practice, volumetric datasets can be extremely large; are there any ideas for further speed-ups in future work?
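As referenced in comment 1, a tiny sketch of the isotropic vs. non-cubic kernel contrast; the channel counts and input shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

# With anisotropic voxels (e.g. z-spacing several times the xy-spacing), an
# isotropic 3x3x3 kernel spans very different physical distances along z and
# within the plane. A non-cubic kernel keeps the z extent narrow to compensate.
iso_conv   = nn.Conv3d(1, 16, kernel_size=(3, 3, 3), padding=(1, 1, 1))
aniso_conv = nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1))

x = torch.rand(1, 1, 8, 128, 128)    # (N, C, D, H, W): few slices, fine in-plane
print(iso_conv(x).shape)             # torch.Size([1, 16, 8, 128, 128])
print(aniso_conv(x).shape)           # torch.Size([1, 16, 8, 128, 128])
```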

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 5

Summary

This paper presents a deep learning framework for 3D biomedical image segmentation. It combines a fully convolutional network (FCN) and a bi-directional convolutional long short-term memory (BDC-LSTM) network, which are used to model the intra-slice and inter-slice contexts, respectively. The proposed framework is tested on 3D neuron and fungus image datasets. The experiments demonstrate that it can provide promising segmentation performance.

Qualitative Assessment

1. The paper introduces an interesting idea, incorporating inter-slice contexts into the framework for 3D image segmentation, and the experiments are adequate. The paper is also well organized, and the concepts are well presented and easy to follow.
2. One concern is whether the segmentations produced by the proposed method are significantly better than those from recently reported methods. For the ISBI 2012 challenge neuron dataset, many state-of-the-art segmentation approaches have been presented in the literature, and some of them report better performance than that reported in this paper (if the dataset and evaluation criteria are the same). Some results are summarized in the following publication: Fakhry et al., “Deep models for brain EM image segmentation: novel insights and improved performance”, Bioinformatics, 2016.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)