STATS 607A: Programming and Numerical Methods in

Fall 2014

- Days & Time: Mondays & Wednesdays, 4 pm -- 5:30 pm
- Location: B760 East Hall
- Description: This is the first part (Part A) of a two-part course. Part A focuses on building good programming skills using the Python language and learning to use them for solving complex data analysis problems. Prior exposure to some programming is recommended. Prior exposure to probability and statistics (at an advanced undergraduate level) is required. We will begin by introducing the basics of Python (functions, recursion, objects, exceptions, types, data structures). Towards the end of the course we will focus on more advanced topics like exploiting parallelism in the Map-Reduce framework and dealing with relational databases. Part B, offered in the following semester, will focus on numerical methods in linear algebra.
- Textbook: There’s no official textbook. I will list resources for each lecture below.
- Ctools: You should access the Ctools class page for this course frequently. It will contain important announcements and posted homework assignments.
- Course end date: This is a half-semester course and will end on October 22, 2014.

Name: Ambuj Tewari

Office: 454 West Hall

Office Hours: By appointment

Email: tewaria@umich.edu

Name: Kam Chung Wong

Office Hours and Location: Mondays 6 -- 7:30 pm, Fridays 5:30 -- 6:00 pm in SLC (1720 Chemistry)

Email: kamwong@umich.edu

The final grade in the course will be determined by your best 2 scores (on a scale of 0 through 100) out of 3 assignments weighted equally.

- Assignment 1 (Basic Python):

- Out: Sep 15, Due: Sep 29

- Assignment 2 (Numpy, Scipy):

- Out: Oct 2, Due: Oct 17

- Assignment 3 (Matplotlib, Pandas):

- Out: Oct 26, Due: Nov 16
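As a quick illustration of the grading rule above, here is a minimal sketch in Python. It assumes that "weighted equally" means a plain average of the two best scores; the function name `final_grade` is made up for this example and is not part of any course material.

```python
def final_grade(scores):
    """Average of the best two of three assignment scores (each 0 through 100).

    Assumption: "weighted equally" is read as a simple mean of the two best scores.
    """
    if len(scores) != 3:
        raise ValueError("expected exactly three assignment scores")
    # Sort from highest to lowest and keep the top two.
    best_two = sorted(scores, reverse=True)[:2]
    return sum(best_two) / 2.0


print(final_grade([70, 90, 80]))  # 85.0
```

Under this reading, the lowest of the three scores has no effect on the final grade.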

Thanks to a pointer from Prof. Kerby Shedden, I will be using Python notebooks instead of slides (credit to him if it works, blame on me if it doesn’t!). Notebooks allow me to mix text and code to illustrate various programming tools and techniques. The notebooks are in a GitHub repository:

https://github.com/ambujtewari/stats607a-fall2014/wiki

The notebooks themselves are just static documents (in JSON format) but clicking on the links will show you properly rendered notebooks thanks to the awesome rendering service at http://nbviewer.ipython.org/.

Week 0 (Sep 3)

- Sep 3

- Lecture 00: Introduction
- Reading Assignment: Read An Informal Introduction to Python

Week 1 (Sep 8, 10)

- Sep 8

- Lecture 01: Control Flow and Function Arguments
- Reading Assignment: Read More Control Flow Tools

- Sep 10

- Lecture 02: Data Structures
- Reading Assignment: Read Data Structures

Week 2 (Sep 15, 17)

- Sep 15

- Lecture 03: Standard Library
- Assignment One: See the Ctools website
- Reading Assignment: Read Modules (this topic not covered in lecture)
- Reading Assignment: Read Input and Output (this topic not covered in lecture)
- Reading Assignment: Read Errors and Exceptions (this topic not covered in lecture)
- Reading Assignment: Read Brief Tour of the Standard Library

- Sep 17

- Lecture 04: Numpy Basics
- Reading Assignment: Read the Tentative Numpy Tutorial

Week 3 (Sep 22, 24)

- Sep 22

- Lecture 05: More Numpy
- Reading Assignment: Familiarize yourself with Statistics Routines for Numpy Arrays
- Reading Assignment: Familiarize yourself with Random Sampling Routines for Numpy Arrays

- Sep 24

- Lecture 06: Numpy Wrap-up
- Reading Assignment: Familiarize yourself with Input/Output Routines for Numpy Arrays
- Reading Assignment: Familiarize yourself with Linear Algebra Routines for Numpy Arrays

Week 4 (Sep 29, Oct 1)

- Sep 29

- Lecture 07: Scipy
- Reading Assignment: Familiarize yourself with Optimization and Root Finding
- Reading Assignment: Familiarize yourself with Special Functions
- Reading Assignment: Familiarize yourself with Statistical Functions

- Oct 1

- Lecture 08: Matplotlib
- Reading Assignment: Familiarize yourself with the Matplotlib User’s Guide and read the Pyplot Tutorial

(Note: we’re using Matplotlib version 1.3.1; a newer version, 1.4, is out but is not yet included in Anaconda.)

Week 5 (Oct 6): [No class on Oct 8, work on HW 2]

- Oct 6

- Lecture 09: Pandas Basics
- Reading Assignment: Read Intro to Data Structures

Week 6 (Oct 15): [Oct 13 is during Fall Study Break]

- Oct 15

- Lecture 10: Data Importing and Exporting in Pandas
- Reading Assignment: Read IO Tools

Week 7 (Oct 20, 22): Map-Reduce and Hadoop

- Oct 20

- Lecture 11: Word count example
- Resources:

- CAEN Hadoop material
- Fladoop User Guide (Fladoop = Flux Hadoop Stack)
- Fladoop Streaming User Guide

- Oct 22

- Lecture 12: Inverted index example
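For a rough idea of what the two Week 7 examples compute, here is a minimal single-machine sketch in plain Python. It is not the lecture code and does not use Hadoop or Fladoop; the helper `map_reduce`, the toy corpus `docs`, and the mapper and reducer names are all made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical toy corpus: (document id, text) pairs.
docs = [("d1", "the cat sat"), ("d2", "the dog sat")]


def wc_mapper(record):
    """Word count: emit (word, 1) for every word in the document."""
    _, text = record
    for word in text.split():
        yield word, 1


def wc_reducer(word, counts):
    """Word count: add up the 1s emitted for this word."""
    return sum(counts)


def index_mapper(record):
    """Inverted index: emit (word, document id) for every word."""
    doc_id, text = record
    for word in text.split():
        yield word, doc_id


def index_reducer(word, doc_ids):
    """Inverted index: sorted list of distinct documents containing the word."""
    return sorted(set(doc_ids))


word_counts = map_reduce(docs, wc_mapper, wc_reducer)
inverted_index = map_reduce(docs, index_mapper, index_reducer)

print(word_counts["the"])     # 2
print(inverted_index["cat"])  # ['d1']
```

Both examples share the same grouping step and differ only in what the mapper emits and how the reducer combines the values. In a real Map-Reduce framework that grouping is distributed across machines.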