STATS 605: Advanced Topics in Modeling and Data Analysis
High Dimensional Statistics
- Days & Time: Mondays and Wednesdays, 4-5:30pm
- Location: 130 Dennison
- Description: This advanced-level graduate course will focus on computational and statistical issues that arise in high dimensional problems (roughly speaking, problems where the number of parameters to be estimated exceeds the number of available samples) in regression, classification, matrix completion, covariance estimation, clustering, and graphical model estimation. Such problems arise in application areas as diverse as genomics, finance, neuroscience, tomography, climatology, and recommender systems. We will also explore connections to concentration inequalities, random matrix theory, and convex geometry.
- Textbook: No textbook is required for this course.
- Ctools: You should check the Ctools class page for this course frequently. It will contain important announcements, posted homework assignments, an archive of emails to the class, etc.
Name: Ambuj Tewari
Office: 454 West Hall
Office Hours: By appointment
The final grade in the course will be determined by your scores on four homework assignments and one final project, using the weights given below.
- Homeworks (60%): There will be 4 equally weighted homework assignments, each containing a mix of theoretical and programming problems. There is no “official” programming language for the course; knowledge of either Matlab or R should be sufficient to solve the problems.
- Final Project (40%): An important component of the course is a final project which can either be:
- a survey of some actively developing sub-topic within high dimensional statistics, or
- a research project involving contributing novel research (theoretical result, statistical method, computational algorithm, or interesting application) to the area of high dimensional statistics
- Project credit breakdown: Surveys have to be written individually; however, teams of up to two students may be formed for a research project. To receive full credit, a survey must be of very high quality, comparable to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty of the research process. While you are not required to deliver publication-quality research by the end of the semester, you are encouraged to do so. I will provide some suggestions for research projects, but you should feel free to work on any problem that interests you. The 40% credit for the project will be split as follows:
- Project proposal (5%)
- Project presentation in class (10%)
- Project final report (25%)
A tentative list of topics is:
- Regression: Lasso, group lasso, theoretical guarantees, optimization algorithms, compressed sensing, phase transitions
- Classification: Performance in high dimensions of one or more of the following methods: Fisher linear discriminant, Bayes classifiers, distance-based classifiers, regularized loss minimization based classifiers
- Matrix completion: Trace norm regularization, Non-commutative versions of concentration inequalities (Hoeffding’s, Bernstein’s)
- Clustering: k-means, subspace clustering
- Large covariance matrices: PCA, covariance estimation, precision matrix estimation
- High dimensional graphical models: Ising models and Gaussian graphical models
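To make the first topic concrete, here is a minimal numpy sketch (not course code) of solving the lasso by proximal gradient descent (ISTA), the simplest of the optimization algorithms listed above. The problem sizes, regularization level `lam`, and helper names are illustrative choices, not part of the course material.

```python
import numpy as np

def soft_threshold(z, t):
    # prox of t * ||.||_1: shrink each coordinate toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # minimize (1/2n) ||y - X b||^2 + lam * ||b||_1 via proximal gradient (ISTA)
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

# toy high dimensional example: n = 50 samples, p = 200 features, 5 nonzeros
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = lasso_ista(X, y, lam=0.1)
```

Even with p = 4n, the soft-thresholding step sets most coordinates of `b_hat` exactly to zero, which is the sparsity-inducing behavior the theoretical guarantees above are about.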
Schedule:
- NO CLASS
- Course logistics
- Linear model with n << p, sparsity
- l_1-regularized logistic regression
- GLMs with l_1 regularization
- l_1 SVMs, margins
- Matrix completion and trace norm
- NO CLASS (MLK Jr. Day)
- 1-bit matrix completion
- Non-negative matrix completion
- Covariance and concentration/precision matrix estimation
- Gaussian graphical model selection by minimizing l_1-penalized log-determinant divergence
- Gaussian graphical model selection using parallel l_1-regularized regressions
- High dimensional Ising model selection
- Sparse PCA: ScoTLASS, SDP relaxation of l_0 constrained PCA
- Sparse PCA: Generalized power method
- High dimensional k-means
- Sparse subspace clustering
- HW 1 out
- Lasso l_2 error bounds for the linear model case
- Restricted eigenvalue (RE) condition
- Initial project proposals due (deadline extended to Feb 15)
- Sup (i.e. l_\infty) norm error bounds and sign consistency of lasso
- Mutual incoherence condition
- Sup norm error bound continued
- When does the RE condition hold with high probability?
- Proximal methods
- Examples of prox operators
- HW 1 due
- Convergence rates for proximal methods
- Final project proposals due
- NO CLASS (Spring break)
- NO CLASS (Spring break)
- Coordinate descent methods
- HW 2 out
- Least Angle Regression (LARS)
- LARS: Lasso modification
- Estimation of high dimensional low rank matrices
- Estimation of low rank matrices: Decomposition lemma for error matrix
- Estimation of low rank matrices: Restricted Strong Convexity (RSC)
- HW 2 due
- HW 3 out
- NO CLASS
- Attend Prof. Andrew Gelman’s talk, “Causality and Statistical Learning,” at the Ford School of Public Policy
- Bound on the maximum singular value of a matrix with iid (multivariate normal) rows
- Gordon’s Theorem
- HW 3 due
- Proof of Gordon’s theorem using Slepian’s inequality
- Gaussian concentration inequality for Lipschitz functions
- HW 4 out
- Fisher’s LDA in high dimensions
- Naive Bayes or Independence Rule in high dimensions
- HW 4 due
- Loss based classification in high dimensions
- Project presentations I
- Project presentations II
- Kam, Yiwei
- Chia Chye Yee
- Project reports due
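Several schedule entries (matrix completion and trace norm, estimation of low rank matrices) revolve around trace-norm regularization. A minimal numpy sketch of that idea, not course code, is the well-known SoftImpute-style iteration: repeatedly fill in missing entries with the current estimate and soft-threshold the singular values. The matrix sizes, observation rate, and `lam` below are illustrative assumptions.

```python
import numpy as np

def soft_impute(M_obs, mask, lam, n_iter=200):
    # Trace-norm-regularized matrix completion via iterative
    # singular-value soft-thresholding (the SoftImpute iteration).
    Z = np.zeros_like(M_obs)
    for _ in range(n_iter):
        # keep observed entries, fill missing ones with the current estimate
        filled = np.where(mask, M_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt  # soft-threshold singular values
    return Z

# toy example: rank-2 ground truth, roughly 60% of entries observed
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random((30, 30)) < 0.6
Z = soft_impute(np.where(mask, M, 0.0), mask, lam=1.0)
rel_err = np.linalg.norm(Z - M) / np.linalg.norm(M)
```

Because the truth is low rank and enough entries are observed, the recovered `Z` is close to `M` even on the unobserved entries, which is exactly what the decomposition-lemma and RSC arguments in the schedule make rigorous.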
References
- Peter Bühlmann and Sara van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, 2011.
- Tony Cai and Xiaotong Shen. High-dimensional data analysis. World Scientific, 2010.
- Yonina C. Eldar and Gitta Kutyniok. Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
- Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers. Statist. Sci. Volume 27, Number 4 (2012), 538-557.
- Wasserman, Larry. "Low Assumptions, High Dimensions." Rationality, Markets and Morals 2.49 (2011).
- Bruckstein, Alfred M., David L. Donoho, and Michael Elad. "From sparse solutions of systems of equations to sparse modeling of signals and images." SIAM review 51.1 (2009): 34-81.
- Johnstone, I. M. (2007). High dimensional statistical inference and random matrices. In International Congress of Mathematicians I 307–333. Eur. Math. Soc., Zürich.
- Donoho, David L. "High-dimensional data analysis: The curses and blessings of dimensionality." AMS Math Challenges Lecture (2000): 1-32.
- Bickel, Peter J., Ya’acov Ritov, and Alexandre B. Tsybakov. "Simultaneous analysis of Lasso and Dantzig selector." The Annals of Statistics 37.4 (2009): 1705-1732.
- Van de Geer, Sara A. "High-dimensional generalized linear models and the Lasso." The Annals of Statistics 36.2 (2008): 614-645.
- Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society. Series B (Methodological) (1996): 267-288.
- Fan, J, Fan, Y. and Wu, Y. (2011). High-dimensional classification. In High-dimensional Data Analysis. (Cai, T.T. and Shen, X., eds.), 3-37, World Scientific, New Jersey.
- Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. "1-norm support vector machines." Advances in neural information processing systems 16.1 (2004): 49-56.
- Bradley, Paul S., and Olvi L. Mangasarian. "Feature selection via concave minimization and support vector machines." Machine Learning Proceedings of the Fifteenth International Conference (ICML’98). 1998.
Low Rank Matrix Estimation, Matrix Completion
- Davenport, M. A., Plan, Y., van den Berg, E., & Wootters, M. (2012). 1-Bit Matrix Completion. arXiv preprint arXiv:1209.3672.
- Negahban, S., & Wainwright, M. J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13, 1665-1697.
- Negahban, S., & Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39(2), 1069-1097.
- Candès, Emmanuel J., and Terence Tao. "The power of convex relaxation: Near-optimal matrix completion." Information Theory, IEEE Transactions on 56.5 (2010): 2053-2080.
- Rothman, A. J., Bickel, P. J., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494-515.
- Bickel, P. J., & Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1), 199-227.
- Journée, M., Nesterov, Y., Richtárik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11, 517-553.
- d'Aspremont, A., El Ghaoui, L., Jordan, M. I., & Lanckriet, G. R. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3), 434-448.
- Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of computational and graphical statistics, 15(2), 265-286.
- Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935-980.
- Ravikumar, Pradeep, Martin J. Wainwright, and John D. Lafferty. "High-dimensional Ising model selection using ℓ1-regularized logistic regression." The Annals of Statistics 38.3 (2010): 1287-1319.
- Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
- Meinshausen, Nicolai, and Peter Bühlmann. "High-dimensional graphs and variable selection with the lasso." The Annals of Statistics 34.3 (2006): 1436-1462.
- Bouveyron, C., & Brunet-Saumard, C. (2012). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis.
- Sun, W., Wang, J., & Fang, Y. (2012). Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics, 6, 148-167.
- Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine, 28(2), 52-68.
- Witten, D. M., & Tibshirani, R. (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.
- F. Bach, R. Jenatton, J. Mairal, G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.
- Agarwal, A., Negahban, S. N., & Wainwright, M. J. (2011). Fast global convergence of gradient methods for high-dimensional statistical recovery. arXiv preprint arXiv:1104.4824.