NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 2065 Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

### Reviewer 1

The paper studies LQR synthesis for LTI systems with unknown dynamics matrices. The central element of the paper is a (novel) algorithm that utilizes a convex optimization approach (the so-called System Level Synthesis approach, SLS) for synthesizing LQR controllers using estimated dynamics models. The SLS approach allows for an analysis of how the error in the matrix estimation affects the regret of the LQR controller. Using this controller synthesis, upper bounds on the estimation error of the dynamics matrices as well as upper and lower bounds for the expected loss are provided. The method is compared to existing approaches on a benchmark system. This computational study shows a comparable performance of all methods, with the presented method giving the nicest theoretical guarantees (e.g. stability). Quality: The paper is well and carefully written. It takes a very formal approach to the problem and presents an extensive mathematical analysis. Various mathematical concepts and approximation bounds are used to derive the insightful bounds. Unfortunately, I did not get a good intuition on the overall argument leading to the result and I would appreciate some more informal explanations on how the bounds are derived. Some aspects are confusing in the paper are confusing. For example, Algorithm 1 requires to specify failure probability $\delta$. However, this probability seems to be never used. Such details should be clarified. Clarity: The paper is carefully written. However, clarity is a major concern about the paper. I appreciate the authors' effort to move the math heavy analysis to the appendix. However, the main body cannot be read without the appendix. For example, the semidefinite optimization problem for the controller synthesis appears only in the pseudo-code of Algorithm 1. The reader needs to read the appendix, to understand what happens here. It would be nice to have a brief discussion of the optimization problem in Section 2. I would prefer such a discussion over the current - very shortened - sketch of System Level Synthesis. Although the general argumentation of the paper is clear, a outline of the overall proof is missing. The theoretical analysis is fairly heavy, using many approximations and bounds. The paper would become much better accessible for a reader, if the mathematical steps leading to the result were informally described initially. At some points in the long analysis, the notation is not carefully enough introduced. For example, in Theorem B1 $\frac{R}$ is used without being introduced. Also the notation $\frac{R}(1 : F)$ in proof of Lemma B.2 is not explained. As another example, in line 386 "e" is introduced but not used later on (might be a typo). Such details make it unnecessary hard for the reader to follow the paper. Overall, the paper is very long and dense. It might be beneficial for the readability of the paper to further limit the scope of the analysis and the discussion. Originality: The paper studies an interesting problem and provides nice theoretical results. I like use of robust control synthesis for building a robust adaptive control scheme. Without claiming to know all relevant literature, I found the derived bounds insightful and interesting. Significance: The paper makes an interesting theoretical contribution by providing an algorithm for the adaptive LQR problem with transparent theoretical guarantees. It is hard for me to judge the practical relevance of the method. Although the proposed method is polynomial time (using semidefinite programming), the optimization might still be computationally demanding. Further, the algorithms has several parameters, whose influence remains unclear. This makes a judgement on the ease of use for practical problems difficult. Further, the theoretical analysis is very specific for this problem. As a general proof outline is somehow missing, it remains difficult to see how much of the analysis can transfer to other problems. In summary, the contributions are on a very specific topic. Minor Comments - line 103 thereore - therefore - line 383 problemmmm - Line 399: The forward reference to Appendix F is not nice. Can you make a preliminaries section for the Appendix? - Algorithm 1: What is ||K_*|| exactly? Is this an arbitrary constant or does the algorithm require to know the norm of the optimal controller correctly?