STATS 605: Advanced Topics in Modeling and Data Analysis
High Dimensional Statistics
- Days & Time: Mondays and Wednesdays, 4-5:30pm
- Location: 130 Dennison
- Description: This advanced-level graduate course will focus on computational and statistical issues that arise in high dimensional problems (roughly speaking, problems where the number of parameters to be estimated exceeds the number of available samples) in regression, classification, matrix completion, covariance estimation, clustering, and graphical model estimation. Such problems arise in application areas as diverse as genomics, finance, neuroscience, tomography, climatology, and recommender systems. We will also explore connections to concentration inequalities, random matrix theory, and convex geometry.
- Textbook: No textbook is required for this course.
- Ctools: You should check the Ctools class page for this course frequently. It will contain important announcements, posted homework assignments, an archive of emails to the class, etc.
Name: Ambuj Tewari
Office: 454 West Hall
Office Hours: By appointment
The final grade in the course will be determined by your scores on four homework assignments and one final project, using the weights given below.
- Homeworks (60%): There will be 4 equally weighted homework assignments, each containing a mix of theoretical and programming problems. There is no “official” programming language for the course; knowledge of either Matlab or R should be sufficient to solve the problems.
- Final Project (40%): An important component of the course is a final project which can either be:
- a survey of some actively developing sub-topic within high dimensional statistics, or
- a research project involving contributing novel research (theoretical result, statistical method, computational algorithm, or interesting application) to the area of high dimensional statistics
- Project credit breakdown: Surveys have to be written individually; however, teams of up to two students may be formed for a research project. To receive full credit, a survey must be of very high quality, comparable to a publishable survey article in a top journal. The bar for research projects will be lower because of the time constraint and the inherent uncertainty of the research process. While you are not required to deliver publication-quality research by the end of the semester, you are encouraged to do so. I will provide some suggestions for research projects, but you should feel free to work on any problem that interests you. The 40% credit for the project will be split as follows:
- Project proposal (5%)
- Project presentation in class (10%)
- Project final report (25%)
A tentative list of topics is:
- Regression: Lasso, group lasso, theoretical guarantees, optimization algorithms, compressed sensing, phase transitions
- Classification: Performance in high dimensions of one or more of the following methods: Fisher linear discriminant, Bayes classifiers, distance-based classifiers, regularized loss minimization based classifiers
- Matrix completion: Trace norm regularization, Non-commutative versions of concentration inequalities (Hoeffding’s, Bernstein’s)
- Clustering: k-means, subspace clustering
- Large covariance matrices: PCA, covariance estimation, precision matrix estimation
- High dimensional graphical models: Ising models and Gaussian graphical models
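To make the first topic concrete, here is a minimal numpy sketch (not course code) of solving the lasso by proximal gradient descent (ISTA), the simplest of the optimization algorithms listed above. The problem sizes, regularization level `lam`, and helper names are illustrative choices, not part of the course material.

```python
import numpy as np

def soft_threshold(z, t):
    # prox of t * ||.||_1: shrink each coordinate toward zero by t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # minimize (1/2n) ||y - X b||^2 + lam * ||b||_1 via proximal gradient (ISTA)
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

# toy high dimensional example: n = 50 samples, p = 200 features, 5 nonzeros
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = lasso_ista(X, y, lam=0.1)
```

Even with p = 4n, the soft-thresholding step sets most coordinates of `b_hat` exactly to zero, which is the sparsity-inducing behavior the theoretical guarantees above are about.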
Schedule:
- NO CLASS
- Course logistics
- Linear model with n << p, sparsity
- l_1-regularized logistic regression
- GLMs with l_1 regularization
- l_1 SVMs, margins
- Matrix completion and trace norm
- NO CLASS (MLK Jr. Day)
- 1-bit matrix completion
- Non-negative matrix completion
- Covariance and concentration/precision matrix estimation
- Gaussian graphical model selection by minimizing l_1-penalized log-determinant divergence
- Gaussian graphical model selection using parallel l_1-regularized regressions
- High dimensional Ising model selection
- Sparse PCA: ScoTLASS, SDP relaxation of l_0 constrained PCA
- Sparse PCA: Generalized power method
- High dimensional k-means
- Sparse subspace clustering
- HW 1 out
- Lasso l_2 error bounds for the linear model case
- Restricted eigenvalue (RE) condition
- Initial project proposals due (deadline extended to Feb 15)
- Sup (i.e. l_\infty) norm error bounds and sign consistency of lasso
- Mutual incoherence condition
- Sup norm error bound continued
- When does the RE condition hold with high probability?
- Proximal methods
- Examples of prox operators
- HW 1 due
- Convergence rates for proximal methods
- Final project proposals due
- NO CLASS (Spring break)
- NO CLASS (Spring break)
- Coordinate descent methods
- HW 2 out
- Least Angle Regression (LARS)
- LARS: Lasso modification
- Estimation of high dimensional low rank matrices
- Estimation of low rank matrices: Decomposition lemma for error matrix
- Estimation of low rank matrices: Restricted Strong Convexity (RSC)
- HW 2 due
- HW 3 out
- NO CLASS
- Attend Prof. Andrew Gelman’s talk, “Causality and Statistical Learning,” at the Ford School of Public Policy
- Bound on the maximum singular value of a matrix with iid (multivariate normal) rows
- Gordon’s Theorem
- HW 3 due
- Proof of Gordon’s theorem using Slepian’s inequality
- Gaussian concentration inequality for Lipschitz functions
- HW 4 out
- Fisher’s LDA in high dimensions
- Naive Bayes or Independence Rule in high dimensions
- HW 4 due
- Loss based classification in high dimensions
- Project presentations I
- Project presentations II
- Kam, Yiwei
- Chia Chye Yee
- Project reports due
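Several schedule entries (matrix completion and trace norm, estimation of low rank matrices) revolve around trace-norm regularization. A minimal numpy sketch of that idea, not course code, is the well-known SoftImpute-style iteration: repeatedly fill in missing entries with the current estimate and soft-threshold the singular values. The matrix sizes, observation rate, and `lam` below are illustrative assumptions.

```python
import numpy as np

def soft_impute(M_obs, mask, lam, n_iter=200):
    # Trace-norm-regularized matrix completion via iterative
    # singular-value soft-thresholding (the SoftImpute iteration).
    Z = np.zeros_like(M_obs)
    for _ in range(n_iter):
        # keep observed entries, fill missing ones with the current estimate
        filled = np.where(mask, M_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt  # soft-threshold singular values
    return Z

# toy example: rank-2 ground truth, roughly 60% of entries observed
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random((30, 30)) < 0.6
Z = soft_impute(np.where(mask, M, 0.0), mask, lam=1.0)
rel_err = np.linalg.norm(Z - M) / np.linalg.norm(M)
```

Because the truth is low rank and enough entries are observed, the recovered `Z` is close to `M` even on the unobserved entries, which is exactly what the decomposition-lemma and RSC arguments in the schedule make rigorous.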
References
- Peter Bühlmann and Sara van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, 2011.
- Tony Cai and Xiaotong Shen. High-dimensional data analysis. World Scientific, 2010.
- Yonina C. Eldar and Gitta Kutyniok. Compressed Sensing: Theory and Applications. Cambridge University Press, 2012.
- Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers. Statist. Sci. Volume 27, Number 4 (2012), 538-557.
- Wasserman, Larry. "Low Assumptions, High Dimensions." Rationality, Markets and Morals 2.49 (2011).
- Bruckstein, Alfred M., David L. Donoho, and Michael Elad. "From sparse solutions of systems of equations to sparse modeling of signals and images." SIAM review 51.1 (2009): 34-81.
- Johnstone, I. M. (2007). High dimensional statistical inference and random matrices. In International Congress of Mathematicians I 307–333. Eur. Math. Soc., Zürich.
- Donoho, David L. "High-dimensional data analysis: The curses and blessings of dimensionality." AMS Math Challenges Lecture (2000): 1-32.
- Bickel, Peter J., Ya’acov Ritov, and Alexandre B. Tsybakov. "Simultaneous analysis of Lasso and Dantzig selector." The Annals of Statistics 37.4 (2009): 1705-1732.
- Van de Geer, Sara A. "High-dimensional generalized linear models and the Lasso." The Annals of Statistics 36.2 (2008): 614-645.
- Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society. Series B (Methodological) (1996): 267-288.
- Fan, J, Fan, Y. and Wu, Y. (2011). High-dimensional classification. In High-dimensional Data Analysis. (Cai, T.T. and Shen, X., eds.), 3-37, World Scientific, New Jersey.
- Zhu, J., Rosset, S., Hastie, T., & Tibshirani, R. "1-norm support vector machines." Advances in neural information processing systems 16.1 (2004): 49-56.
- Bradley, Paul S., and Olvi L. Mangasarian. "Feature selection via concave minimization and support vector machines." Machine Learning Proceedings of the Fifteenth International Conference (ICML’98). 1998.
Low Rank Matrix Estimation, Matrix Completion
- Davenport, M. A., Plan, Y., van den Berg, E., & Wootters, M. (2012). 1-Bit Matrix Completion. arXiv preprint arXiv:1209.3672.
- Negahban, S., & Wainwright, M. J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 13, 1665-1697.
- Negahban, S., & Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics, 39(2), 1069-1097.
- Candès, Emmanuel J., and Terence Tao. "The power of convex relaxation: Near-optimal matrix completion." Information Theory, IEEE Transactions on 56.5 (2010): 2053-2080.
- Rothman, A. J., Bickel, P. J., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494-515.
- Bickel, P. J., & Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1), 199-227.
- Journée, M., Nesterov, Y., Richtárik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11, 517-553.
- d'Aspremont, A., El Ghaoui, L., Jordan, M. I., & Lanckriet, G. R. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3), 434-448.
- Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of computational and graphical statistics, 15(2), 265-286.
- Ravikumar, P., Wainwright, M. J., Raskutti, G., & Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935-980.
- Ravikumar, Pradeep, Martin J. Wainwright, and John D. Lafferty. "High-dimensional Ising model selection using ℓ1-regularized logistic regression." The Annals of Statistics 38.3 (2010): 1287-1319.
- Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
- Meinshausen, Nicolai, and Peter Bühlmann. "High-dimensional graphs and variable selection with the lasso." The Annals of Statistics 34.3 (2006): 1436-1462.
- Bouveyron, C., & Brunet-Saumard, C. (2012). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis.
- Sun, W., Wang, J., & Fang, Y. (2012). Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics, 6, 148-167.
- Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine, 28(2), 52-68.
- Witten, D. M., & Tibshirani, R. (2010). A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), 713-726.
- F. Bach, R. Jenatton, J. Mairal, G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1-106, 2012.
- Agarwal, A., Negahban, S. N., & Wainwright, M. J. (2011). Fast global convergence of gradient methods for high-dimensional statistical recovery. arXiv preprint arXiv:1104.4824.