## Mini-Course: Selected Topics in Theoretical Machine Learning

In this course, I will present some recent papers on theory, optimization, and machine learning. The course consists of six lectures, 75 minutes each.

In each lecture, I will mainly discuss one paper, then briefly cover other related papers.

**Time:** Wednesday 4:15pm-5:30pm

**Location:** Rhodes 253 (NOT Gates 416)

**Piazza:** https://piazza.com/cornell/fall2017/cs8999/home

**Prerequisites:** Algebra, probability theory

**Schedule**

[08/30/2017] How stochastic gradient descent (efficiently) escapes saddle points. [1,2] [Slides]

[09/06/2017] Generalization of SGD algorithms, escaping local minima. [3,4] [Slides]

[09/13/2017] Convergence analysis of neural networks: deep linear residual networks and two-layer networks. [5,6] [Slides]

[09/20/2017] Cancelled due to a conflict with the Stats seminar.

[09/27/2017] Matrix Completion has No Spurious Local Minimum [7] [Slides]

[10/04/2017] Hyperparameter tuning algorithms and analysis, Hyperband [8] [Slides]

[10/11/2017] Hyperparameter tuning algorithms and analysis, Harmonica [9] [Slides]
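The theme of the first lecture, that injecting noise into gradient descent helps it escape strict saddle points, can be illustrated on a toy function. The sketch below is illustrative only, not the algorithm from [1] or [2]: on f(x, y) = x² − y², noiseless gradient descent started at the saddle (0, 0) never moves, while a tiny random perturbation pushes the iterate onto the escaping direction y.

```python
import random

def grad(x, y):
    # f(x, y) = x^2 - y^2 has a strict saddle at the origin:
    # a minimum along x, a maximum along y.
    return 2 * x, -2 * y

def descend(x, y, noise=0.0, eta=0.1, steps=200):
    """Gradient descent with optional Gaussian perturbation of the gradient."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= eta * (gx + noise * random.gauss(0, 1))
        y -= eta * (gy + noise * random.gauss(0, 1))
    return x, y

random.seed(0)
# Started exactly at the saddle, noiseless gradient descent stays put.
print(descend(0.0, 0.0))              # stays at (0.0, 0.0)
# A small perturbation pushes the iterate along the unstable
# y direction, whose component grows by a factor (1 + 2*eta) per step.
print(descend(0.0, 0.0, noise=1e-3))  # |y| becomes large: the saddle is escaped
```

The function names and parameters here (`descend`, `eta`, `noise`) are placeholders for this sketch; the cited papers analyze when and how fast such perturbed dynamics escape saddle points, not this specific toy.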

**References**

[1] Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition, Rong Ge, Furong Huang, Chi Jin, Yang Yuan, COLT 2015.

[2] How to Escape Saddle Points Efficiently, Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan, 2017.

[3] Train faster, generalize better: Stability of stochastic gradient descent, Moritz Hardt, Benjamin Recht, Yoram Singer, ICML 2016.

[4] A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, Yuchen Zhang, Percy Liang, Moses Charikar, COLT 2017.

[5] Identity Matters in Deep Learning, Moritz Hardt, Tengyu Ma, ICLR 2017.

[6] Convergence Analysis of Two-layer Neural Networks with ReLU Activation, Yuanzhi Li, Yang Yuan, 2017.

[7] Matrix Completion has No Spurious Local Minimum, Rong Ge, Jason D. Lee, Tengyu Ma, NIPS 2016.

[8] Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar, ICLR 2017.

[9] Hyperparameter Optimization: A Spectral Approach, Elad Hazan, Adam Klivans, Yang Yuan, 2017.
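Successive halving, the building block that Hyperband [8] wraps in a bandit layer, can be sketched in a few lines. The `evaluate` function and the list of configurations below are toy placeholders, not from the paper: each round, every surviving configuration is scored under the current budget, and only the best 1/eta fraction advances with eta times the budget.

```python
def successive_halving(configs, evaluate, budget=1, eta=2, rounds=3):
    """Allocate a growing budget to a shrinking pool of configurations.

    evaluate(config, budget) returns a loss; lower is better.
    """
    survivors = list(configs)
    for _ in range(rounds):
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]  # keep top 1/eta
        budget *= eta  # surviving configs get a larger budget next round
    return survivors[0]

# Toy example: a quadratic "loss" minimized at c = 1.3 (placeholder).
best = successive_halving(
    configs=[0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9],
    evaluate=lambda c, b: (c - 1.3) ** 2 / b,
)
print(best)  # 1.3 survives all rounds: it minimizes the toy loss
```

Hyperband's contribution is running several such brackets with different trade-offs between the number of configurations and the per-configuration budget, hedging against not knowing in advance how much budget a fair comparison needs.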
