STATS 605: Advanced Topics in Modeling and Data Analysis

Winter 2013

High Dimensional Statistics

Class Information

  1. Days & Time: Mondays and Wednesdays, 4-5:30pm
  2. Location: 130 Dennison
  3. Description: This advanced-level graduate course will focus on computational and statistical issues that arise in high dimensional problems (roughly speaking, problems where the number of parameters to be estimated exceeds the number of available samples) in regression, classification, matrix completion, covariance estimation, clustering, and graphical model estimation. Such problems arise in diverse application areas such as genomics, finance, neuroscience, tomography, climatology, and recommender systems. We will also explore connections to concentration inequalities, random matrix theory, and convex geometry.
  4. Textbook: No textbook is required for this course.
  5. Ctools: You should access the Ctools class page for this course frequently. It will contain important announcements, posted homework assignments, an archive of emails to the class, etc.

Instructor Information

Name: Ambuj Tewari

Office: 454 West Hall

Office Hours: By appointment



Grading

The final grade in the course will be determined by your scores on four homework assignments and one final project, using the weights given below.

  1. Homeworks (60%): There will be four equally weighted homework assignments. Each will be a mix of theoretical and programming problems. There is no “official” programming language for the course; knowledge of either Matlab or R should be sufficient to solve the problems.
  2. Final Project (40%): An important component of the course is a final project, which can be either:
     a. a survey of an actively developing sub-topic within high dimensional statistics, or
     b. a research project contributing novel research (a theoretical result, statistical method, computational algorithm, or interesting application) to the area of high dimensional statistics.

     Project credit breakup: Surveys must be written individually; however, teams of up to two students may be formed for a research project. To receive full credit, a survey must be of very high quality: comparable to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty of the research process. While you are not required to deliver publication-quality research by the end of the semester, you are encouraged to do so. I will provide some suggestions for research projects, but you should feel free to work on any problem that interests you. The 40% project credit will be split as follows:
     a. Project proposal (5%)
     b. Project presentation in class (10%)
     c. Final project report (25%)


A tentative list of topics is:

  1. Regression: Lasso, group lasso, theoretical guarantees, optimization algorithms, compressed sensing, phase transitions
  2. Classification: Performance in high dimensions of one or more of the following methods: Fisher linear discriminant, Bayes classifiers, distance-based classifiers, regularized loss minimization based classifiers
  3. Matrix completion: Trace norm regularization, Non-commutative versions of concentration inequalities (Hoeffding’s, Bernstein’s)
  4. Clustering: k-means, subspace clustering
  5. Large covariance matrices: PCA, covariance estimation, precision matrix estimation
  6. High dimensional graphical models: Ising models and Gaussian graphical models
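Several of the topics above (the lasso, optimization algorithms) can be previewed with a minimal sketch. The following is purely illustrative and not course material; the problem sizes, regularization level, and all names are made up. It recovers a sparse coefficient vector with n < p by solving the lasso via proximal gradient descent (ISTA):

```python
# Hypothetical illustration: lasso recovery in the n << p regime via ISTA.
# Dimensions and the choice lam=0.05 are arbitrary, for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 200, 5              # n samples, p features, k true nonzeros

beta_true = np.zeros(p)
beta_true[:k] = 3.0
X = rng.standard_normal((n, p)) / np.sqrt(n)   # columns have norm ~1
y = X @ beta_true + 0.01 * rng.standard_normal(n)

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by proximal gradient."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b

beta_hat = lasso_ista(X, y, lam=0.05)
support = np.flatnonzero(np.abs(beta_hat) > 1e-3)
print("estimated support:", support)
```

Coordinate descent or LARS, which also appear in the schedule, solve the same problem; ISTA is used here only because it is short.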


Schedule

Jan 9



Jan 14

  1. Course logistics
  2. Linear model with n << p, sparsity
  3. Lasso


Jan 16

  1. l_1 regularized logistic regression
  2. GLMs with l_1 regularization
  3. l_1 SVMs, margins
  4. Matrix completion and trace norm
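As a pointer to the trace-norm material, here is a hedged sketch (illustrative only, not course code) of the key primitive used throughout: the prox of the trace (nuclear) norm is soft-thresholding of the singular values.

```python
# Illustrative sketch: singular value soft-thresholding, the basic step in
# trace-norm-regularized matrix estimation. Sizes below are arbitrary.
import numpy as np

def svd_soft_threshold(M, t):
    """Prox of t * ||.||_* : shrink each singular value of M by t."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - t, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
shrunk = svd_soft_threshold(M, t=1.0)
print("rank after shrinkage:", np.linalg.matrix_rank(shrunk))
```

Since shrinkage can only reduce singular values, the result never has higher rank than the input; with a large enough threshold it becomes exactly low rank.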

Jan 21

  1. NO CLASS (MLK Jr. Day)


Jan 23

  1. 1-bit matrix completion
  2. Non-negative matrix completion
  3. Covariance and concentration/precision matrix estimation


Jan 28

  1. Gaussian graphical model selection using l_1 regularized log det
  2. Gaussian graphical model selection using parallel l_1 regularized regressions


Jan 30

  1. High dimensional Ising model selection
  2. Sparse PCA: ScoTLASS, SDP relaxation of l_0 constrained PCA


Feb 4

  1. Sparse PCA: Generalized power method


Feb 6

  1. High dimensional k-means


Feb 11

  1. Sparse subspace clustering
  2. HW 1 out


Feb 13

  1. Lasso l_2 error bounds for the linear model case
  2. Restricted eigenvalue (RE) condition
  3. Initial project proposals due (deadline extended till Feb 15)
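One standard form of the lasso l_2 error bound developed in these lectures (cf. Bickel, Ritov and Tsybakov in the reading list; constants are suppressed and assumptions hedged):

```latex
\|\hat{\beta} - \beta^*\|_2 \;\le\; \frac{C\sqrt{k}\,\lambda}{\kappa},
\qquad
\lambda \asymp \sigma\sqrt{\frac{\log p}{n}},
```

where $k$ is the sparsity of $\beta^*$, $\kappa$ is the restricted eigenvalue constant of the design, and $\sigma$ is the noise level, so the estimation error scales as $\sigma\sqrt{k \log p / n}$.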


Feb 18

  1. Sup (i.e. l_\infty) norm error bounds and sign consistency of lasso
  2. Mutual incoherence condition


Feb 20

  1. Sup norm error bound continued
  2. When does the RE condition hold with high probability?


Feb 25

  1. Proximal methods
  2. Examples of prox operators
  3. HW 1 due
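Two of the closed-form prox operators typically given as examples are those of the l_1 norm and the (unsquared) l_2 norm; both are standard results, and the variable names below are illustrative. Here prox_{t g}(v) = argmin_x 0.5*||x - v||^2 + t*g(x).

```python
# Sketch of two closed-form prox operators (standard results).
import numpy as np

def prox_l1(v, t):
    # g(x) = ||x||_1  ->  elementwise soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l2(v, t):
    # g(x) = ||x||_2 (group lasso penalty)  ->  block soft-thresholding:
    # the whole vector is shrunk toward zero, or set to zero entirely
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

v = np.array([3.0, -1.0, 0.5])
print(prox_l1(v, 1.0))
print(prox_l2(np.array([3.0, 4.0]), 2.0))
```

The l_2 prox zeroing out entire blocks is what makes the group lasso select variables in groups rather than individually.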


Feb 27

  1. Convergence rates for proximal methods
  2. Final project proposals due

Mar 4

  1. NO CLASS (Spring break)

Mar 6

  1. NO CLASS (Spring break)


Mar 11

  1. Coordinate descent methods
  2. HW 2 out


Mar 13

  1. Least Angle Regression (LARS)


Mar 18

  1. LARS: Lasso modification
  2. Estimation of high dimensional low rank matrices


Mar 20

  1. Estimation of low rank matrices: Decomposition lemma for error matrix


Mar 25

  1. Estimation of low rank matrices: Restricted Strong Convexity (RSC)
  2. HW 2 due
  3. HW 3 out

Mar 27

  1. Attend Prof. Andrew Gelman’s talk, “Causality and Statistical Learning,” at the Ford School of Public Policy


Apr 1

  1. Bound on the maximum singular value of a matrix with iid (multivariate normal) rows
  2. Gordon’s Theorem
  3. HW 3 due


Apr 3

  1. Proof of Gordon’s theorem using Slepian’s inequality
  2. Gaussian concentration inequality for Lipschitz functions
  3. HW 4 out


Apr 8

  1. Fisher’s LDA in high dimensions


Apr 10

  1. Naive Bayes or Independence Rule in high dimensions
  2. HW 4 due


Apr 15

  1. Loss based classification in high dimensions

Apr 17

  1. Project presentations I
  1. Hossein
  2. Yuan
  3. Can
  4. Robert
  5. Phoenix

Apr 22

  1. Project presentations II
  1. Kam, Yiwei
  2. Sougata
  3. Naveen
  4. Chia Chye Yee
  5. Xuan

Apr 26

  1. Project reports due



References

Books

  1. Peter Bühlmann and Sara van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, 2011.
  2. Tony Cai and Xiaotong Shen. High-dimensional data analysis. World Scientific, 2010.
  3. Yonina C. Eldar and Gitta Kutyniok. Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.


Survey and Overview Articles

  1. Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers. Statist. Sci. Volume 27, Number 4 (2012), 538-557.
  2. Wasserman, Larry. "Low Assumptions, High Dimensions." Rationality, Markets and Morals 2.49 (2011).
  3. Bruckstein, Alfred M., David L. Donoho, and Michael Elad. "From sparse solutions of systems of equations to sparse modeling of signals and images." SIAM review 51.1 (2009): 34-81.
  4. Johnstone, I. M. (2007). High dimensional statistical inference and random matrices. In International Congress of Mathematicians I 307–333. Eur. Math. Soc., Zürich.
  5. Donoho, David L. "High-dimensional data analysis: The curses and blessings of dimensionality." AMS Math Challenges Lecture (2000): 1-32.


Regression and the Lasso

  1. Bickel, Peter J., Ya’acov Ritov, and Alexandre B. Tsybakov. "Simultaneous analysis of Lasso and Dantzig selector." The Annals of Statistics 37.4 (2009): 1705-1732.
  2. Van de Geer, Sara A. "High-dimensional generalized linear models and the Lasso." The Annals of Statistics 36.2 (2008): 614-645.
  3. Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society. Series B (Methodological) (1996): 267-288.


Classification

  1. Fan, J., Fan, Y. and Wu, Y. (2011). High-dimensional classification. In High-dimensional Data Analysis (Cai, T.T. and Shen, X., eds.), 3-37, World Scientific, New Jersey.
  2. Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. "1-norm support vector machines." Advances in neural information processing systems 16.1 (2004): 49-56.
  3. Bradley, Paul S., and Olvi L. Mangasarian. "Feature selection via concave minimization and support vector machines." Machine Learning Proceedings of the Fifteenth International Conference (ICML’98). 1998.

Low Rank Matrix Estimation, Matrix Completion

  1. Davenport, M. A., Plan, Y., Berg, E. V. D., & Wootters, M. (2012). 1-Bit Matrix Completion. arXiv preprint arXiv:1209.3672.
  2. Negahban, S., & Wainwright, M. J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 13, 1665-1697.
  3. Negahban, S., & Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39(2), 1069-1097.
  4. Candès, Emmanuel J., and Terence Tao. "The power of convex relaxation: Near-optimal matrix completion." Information Theory, IEEE Transactions on 56.5 (2010): 2053-2080.

Covariance Estimation

  1. Rothman, A. J., Bickel, P. J., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494-515.
  2. Bickel, P. J., & Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1), 199-227.

Sparse PCA

  1. Journée, M., Nesterov, Y., Richtárik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11, 517-553.
  2. d'Aspremont, A., El Ghaoui, L., Jordan, M. I., & Lanckriet, G. R. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3), 434-448.
  3. Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of computational and graphical statistics, 15(2), 265-286.

Graphical Models

  1. Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935-980.
  2. Ravikumar, Pradeep, Martin J. Wainwright, and John D. Lafferty. "High-dimensional Ising model selection using ℓ1-regularized logistic regression." The Annals of Statistics 38.3 (2010): 1287-1319.
  3. Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
  4. Meinshausen, Nicolai, and Peter Bühlmann. "High-dimensional graphs and variable selection with the lasso." The Annals of Statistics 34.3 (2006): 1436-1462.


Clustering

  1. Bouveyron, C., & Brunet-Saumard, C. (2012). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis.
  2. Sun, W., Wang, J., & Fang, Y. (2012). Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics, 6, 148-167.
  3. Vidal, R. (2011). Subspace clustering. Signal Processing Magazine, IEEE, 28(2), 52-68.
  4. Witten, D. M., & Tibshirani, R. (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.


Optimization

  1. F. Bach, R. Jenatton, J. Mairal, G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.
  2. Agarwal, A., Negahban, S. N., & Wainwright, M. J. (2011). Fast global convergence of gradient methods for high-dimensional statistical recovery. arXiv preprint arXiv:1104.4824.