Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Overall, the paper is well written and self-contained, providing a nice introduction to SC and CSC. The references are also appropriate, and the topic of sparse representations for image denoising is relevant. Despite these good aspects, the novelty of the paper is very marginal, and the main claim connecting PA SC with MMSE estimation under sparse priors is vague and not well justified. The proposed network architecture does not differ from previous LISTA-style methods; the only difference is essentially in the way the data is input.
Specific comments:
- The paper argues that pre-processing is bad, but it is not clear what this refers to. The authors seem to criticize mean (or smooth component) subtraction before sparse coding. The mean needs to be subtracted for sparse coding to work, as is also argued in the paper. Furthermore, the experimental section mentions debiasing the signals. Does this refer to mean subtraction? That would contradict the paper's argument that SC should not need pre-processing (i.e., smooth component removal). It is also not clear whether normalization is applied to the patches before sparse coding. This can be an advantage over the convolutional model, since it gives robustness to scale variation.
- What is the double == in (6)?
- When you extract every patch in an image and do sparse coding, that is essentially equivalent to the convolutional case. The difference lies in the way you combine the patches back (the patch extraction operator and its adjoint). What the authors propose by using strided convolutions is effectively patch averaging with partial overlap. The results only marginally improve on PA, which could also be related to the way boundaries are handled.
- How do you choose the sparsity level? Is the noise level the same in both cases? If so, the global model should be better due to averaging in high dimensions, yet the opposite is claimed in the paper.
-  already proposed a LISTA version of the CSC model. What is the difference? It seems the only difference is the "multi-channel" decomposition of the image.
- Is A a convolution and B a deconvolution?
- How is the duplicated image a shifted version of the original one?
- I don’t see the point of mentioning batch normalization as a possible improvement without having tried it. There is no guarantee that it would help.
- The experimental section needs further clarification regarding the number of atoms used in both cases, the pre-processing steps, and the algorithmic solution (e.g., Lagrangian vs. L1 minimization).
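The remark above about recombining patches (the extraction operator, its adjoint, and averaging with partial overlap) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `extract_patches` and `average_patches`, the 16x16 test image, and the patch size of 4 are all hypothetical choices made for illustration, and no sparse coding or boundary handling is performed.

```python
import numpy as np


def extract_patches(image, patch, stride):
    """Collect patch x patch blocks every `stride` pixels (hypothetical helper).

    Returns the stacked patches and the top-left corner of each one.
    """
    height, width = image.shape
    corners = [(i, j)
               for i in range(0, height - patch + 1, stride)
               for j in range(0, width - patch + 1, stride)]
    patches = np.stack([image[i:i + patch, j:j + patch] for i, j in corners])
    return patches, corners


def average_patches(patches, corners, shape):
    """Sum the patches back in place (adjoint of extraction), then divide
    each pixel by the number of patches that cover it."""
    patch = patches.shape[1]
    acc = np.zeros(shape)
    count = np.zeros(shape)
    for p, (i, j) in zip(patches, corners):
        acc[i:i + patch, j:j + patch] += p
        count[i:i + patch, j:j + patch] += 1
    # Guard against uncovered pixels (none occur for the sizes used below).
    return acc / np.maximum(count, 1), count


rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

for stride in (1, 2, 4):
    patches, corners = extract_patches(image, patch=4, stride=stride)
    recon, count = average_patches(patches, corners, image.shape)
    # Untouched patches are recombined exactly; only the overlap changes.
    print(stride, len(corners), int(count.max()), np.allclose(recon, image))
```

With stride 1 every patch is used (full overlap), larger strides give partial overlap, and a stride equal to the patch size gives a non-overlapping tiling. In a denoising pipeline each patch would be replaced by its sparse approximation before averaging, so the stride controls how many estimates are averaged per pixel.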
I have no further comments on this paper. I think this is a very good paper.
The paper discusses two important lines of work that appeared ten years ago and have become ubiquitous in inverse problems. On one side is the dictionary learning strategy, based on patch sparse coding followed by averaging. On the other side is CSC, which is based on convolution filters. A unified presentation of both worlds allows the authors to explain the limits of both techniques and to propose a new CNN with improved performance. The paper is well written and convincing, and I have only a few comments:
* In equation (7), I did not understand the (0,infinity) norm notation. I would say it means the level of group-sparsity, but this does not match the description "local non-zero elements". Please clarify this by providing the true definition.
* The references should be polished (capital letters are often missing in the titles).
* Line 32: "cardinality" is often referred to as "sparsity level".
* Line 64: "show" -> "shown".
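For reference on the first comment, the CSC literature commonly defines the (0,infinity) pseudo-norm as a maximum over local supports rather than as a group count. The block below states that convention. Whether equation (7) of the paper follows it exactly is an assumption, and the symbols $\Gamma$ (the global sparse vector) and $\gamma_i$ (its $i$-th stripe, i.e. the coefficients of all atoms overlapping the $i$-th patch) are introduced here for illustration only.

```latex
\[
  \|\Gamma\|_{0,\infty} \;=\; \max_{i}\, \|\gamma_i\|_{0},
\]
% \|\gamma_i\|_0 counts the non-zero entries of the i-th stripe, so the
% quantity bounds the number of active atoms in every local neighbourhood.
```

Under this reading, "local non-zero elements" refers to the largest number of non-zeros in any single stripe, which is different from counting the number of active groups.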