NeurIPS 2019
	Sun Dec 8th through Sat the 14th, 2019  at Vancouver Convention Center
	
	
	
		
		Reviewer 1
		
		The problem of computing the permanent of a non-negative matrix is probably the best known example of a #P complete problem, in addition to being a useful tool in graph theory, statistics,  and increasingly in quantum computation with respect to boson sampling.  The main theoretical contribution of this paper lies in replacing the original workflow of Huber and Law for exact partition sampling - which required an proof in advance that a  particular upper bound satisfied the nesting property - with an adaptive method that works for any upper bound. This is a significant result given the difficulty of applying the original work and the accompanying implementation is likely to become widely used.  The paper is well written and the explanation of the problem setting (existing FPRAS method  that no one actually implements) reflects the current state of the field.  The new ideas presented in Section 3 are intuitive and well motivated. The derivation choices for the specific case of the permanent seem natural, although a comparison of the results using different bounds would be interesting, since the dependence of the run time on choice of bounds is not explored.  The experiments are reasonably comprehensive - larger matrices without the block diagonal structure could have been constructed by taking sequences of combinatorial graphs for which the number of cycle covers is known. While the block diagonal matrices are sparse in the formal sense, it isn't clear that they are representative of non-dense matrices more generally.  Overall, this paper  presents an important new idea and provides more than sufficient validation of its practical implications.   *Smaller Notes  In the introduction, it is the bi-adjacency matrix (where the rows and columns are separately indexed by the partite sets) whose permanent counts perfect matchings.   I found the informal description of the nesting property in the contributions section  a little confusing although the formal definitions are clear.   It would be nice to see Figure 5 from the supplement in the main text but page restrictions may make that difficult.    (very) small note, but many places with multiple references they are out of (numerical) order.   I appreciate the authors responses and understand the computational difficulties involved in actually applying the other types of bounds. In my mind this does not detract from the strength of the paper. 
		
	
		
		Reviewer 2
		
		The authors propose AdaPart, an adaptive partitioning scheme for a rejection sampling strategy for estimating the permanent.  The novelty over existing work is the introduction of a greedy method to build the partition trees on the fly (empirical verification shows that this approach is quite effective in practice) along with an application of a tighter upper bound (which reduces both the practical and theoretical cost of this type of sampling scheme).  The paper is well-written and careful to place its contributions with respect to existing work on permanent approximation.  My only criticism, really, is that the approach seems to be much more applicable than just a strategy for computing the permanent and it would have been nice to see some experiments on different types of partition function estimation tasks.
		
	
		
		Reviewer 3
		
		In overall, I think this paper is both significant and practically useful for estimating the partition function. I enjoyed reading the paper and think it is novel enough to be accepted. Experiments are convincing and well designed.   However, I am slightly concerned about the generality of the algorithm. Although the algorithm was written in a general way, i.e., not tied to the permanent problem and particular choice of bound, only a single choice of upper bound from Soules is considered for the permanent problem. This is potentially problematic since the algorithm might result in too many refinements and inapplicable for other upper bounds, unlike the claims that the authors made.   I also think the writing can be improved. The term "upper bound" and "nested upper bound" was confusing when I was first reading the paper. Specifically,  I would like to suggest clarifying that "it is provided that Z_{w}^{UB} is always an upper bound for Z_{w}", while "it is non-trivial to see that Z_{w}^{UB} is nesting upper bound". Also, I think Figure 1 is slightly confusing since we are discarding the unused partitions as we go down the node hierarchy, but the figure seems to require 2^{n} choice of partitions.  minor:  - I think it would be empirically interesting to see how much the upper bound is likely to fail in nesting (for a single partition). I suspect there might be extreme cases where nesting the upper bound is quite hard than others.