STATS 700, Fall 2025
Since the release of OpenAI’s o1 and DeepSeek’s R1 models, interest in the reasoning capabilities of large language models (LLMs) has increased sharply. This half-semester (7-week) course will cover some of the main ingredients that go into enhancing an LLM’s reasoning capability. We will also discuss some recent theory papers that try to understand this fascinating emerging area from a mathematical perspective.
A strong interest in reasoning and LLMs, together with a high level of mathematical maturity, will be needed to fully benefit from this course. The topic list below is tentative and subject to change.
Logistics
Time & Days: TuTh 2:30PM - 4:00PM
Location: 2060 SKB
Half-semester course dates: Aug 25, 2025 to Oct 10, 2025
Topics
Background (~ 2 weeks)
J&M = Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin
LLMs
- Transformers: J&M Chapter 9 (annotated chapter)
- Large Language Models: J&M Chapter 10 (annotated chapter)
- Model Alignment, Prompting, and In-Context Learning: J&M Chapter 12 (annotated chapter)
- Since Section 12.7 (Model Alignment with Human Preferences) is missing from Chapter 12 above, we will refer to these notes. A ChatGPT-generated LaTeX PDF is here (warning: it might contain errors!)
- For more on RLHF, you can also refer to the RLHF book being written by Nathan Lambert, currently a post-training lead at the Allen Institute for AI.
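As a concrete illustration of the preference-learning objective covered in those references, here is a minimal sketch (not taken from either text; function name and reward values are illustrative) of the Bradley-Terry pairwise loss commonly used to train RLHF reward models:

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    The reward model is trained so that the human-preferred response scores
    higher than the rejected one; the loss shrinks as the margin grows.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Ranking the chosen response above the rejected one yields a small loss;
# ranking it below yields a large one.
good_ranking = reward_model_loss(2.0, 0.0)
bad_ranking = reward_model_loss(0.0, 2.0)
```

In practice the scalar rewards come from a learned model head and the loss is averaged over a dataset of human preference pairs; the sketch above just isolates the per-pair objective.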
Reasoning LLMs
Theory Papers (~ 5 weeks)
- A Theory of Emergent In-Context Learning as Implicit Structure Induction
- Two main results: (1) in-context learning (ICL) abilities can arise if next-token pretraining is done on distributions with compositional structure; (2) prompting an LLM to produce intermediate tokens can improve performance.
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems, ICLR 2024
- With T steps of CoT, constant-depth transformers with constant-bit precision and logarithmic embedding size can solve any problem solvable by Boolean circuits of size T.
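A canonical example of an inherently serial problem is parity, which constant-depth circuits cannot compute but which becomes trivially serial once intermediate tokens are allowed. A toy Python sketch (an illustration, not from the paper) of how a chain of thought linearizes the computation:

```python
def parity_with_cot(bits):
    """Compute parity by emitting one intermediate "CoT token" per input bit.

    Each step needs only the previously emitted token and one input bit,
    i.e., constant work per step: the chain of T tokens carries the serial
    state that a constant-depth circuit cannot maintain in a single pass.
    """
    state = 0
    trace = []  # the chain of thought: running parity after each bit
    for b in bits:
        state ^= b
        trace.append(state)
    return state, trace

answer, chain = parity_with_cot([1, 0, 1, 1, 0])
```

The point of the theorem is that this pattern is generic: any size-T circuit can be evaluated gate by gate across T CoT steps, with each step doing only constant work.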
- Scaling Test-Time Compute Without Verification or RL is Suboptimal, ICML 2025
- Proves that verifier-based methods using RL/search dominate verifier-free methods based on distillation or cloning search traces, given fixed compute/data budgets.
- Optimizing Test-Time Compute via Meta Reinforcement Finetuning, ICML 2025
- Formalizes optimizing test-time compute as a meta-RL problem, offering guidance on how to optimally allocate inference-time computation.
- On the Power of Context-Enhanced Learning in LLMs, ICML 2025
- Proposes CEL, a variant of supervised fine-tuning where extra context is provided but gradients are not taken through it. In a simplified setting, shows CEL can be exponentially more sample-efficient than vanilla SFT for multi-step reasoning tasks.
- Understanding Chain-of-Thought in LLMs through Information Theory, ICML 2025
- Provides an information-theoretic framework that quantifies the “information gain” at each reasoning step.
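As a toy illustration of this quantity (the distributions below are made up), the information gain of step t can be read off as the drop in entropy of the model's posterior over candidate answers after conditioning on that step:

```python
import math

def entropy(ps):
    """Shannon entropy (in bits) of a distribution over candidate answers."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Hypothetical posterior over 4 candidate answers after each reasoning step.
posteriors = [
    [0.25, 0.25, 0.25, 0.25],  # before reasoning: maximally uncertain
    [0.50, 0.30, 0.10, 0.10],  # step 1 narrows things down
    [0.80, 0.10, 0.05, 0.05],  # step 2 narrows further
    [0.97, 0.01, 0.01, 0.01],  # step 3 nearly determines the answer
]

# Information gain of step t = entropy before the step minus entropy after.
gains = [entropy(posteriors[t]) - entropy(posteriors[t + 1])
         for t in range(len(posteriors) - 1)]
```

A step with near-zero gain contributes nothing toward the answer, which is the kind of diagnostic the framework makes precise.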
- A Theory of Learning with Autoregressive Chain of Thought, COLT 2025
- Proposes a learning-theoretic framework where prompt-to-answer mapping is modeled as repeated application of a time-invariant “single-step” function. Considers both observed and latent CoT settings, showing sample complexity can be independent of CoT length, with attention arising naturally in the framework.
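A minimal sketch of that viewpoint (function names are illustrative): generation is just repeated application of one time-invariant step function to the growing sequence, halting at a stop token.

```python
def run_cot(step_fn, prompt, max_steps=100, stop="<eos>"):
    """Autoregressive CoT as iteration of a time-invariant step function.

    The same step_fn is applied at every position; it maps the sequence
    produced so far to the next token, and generation halts at `stop`.
    """
    seq = list(prompt)
    for _ in range(max_steps):
        tok = step_fn(seq)
        seq.append(tok)
        if tok == stop:
            break
    return seq

# Toy step function: count down from the last number, then stop.
def countdown_step(seq):
    last = seq[-1]
    return last - 1 if isinstance(last, int) and last > 0 else "<eos>"

trace = run_cot(countdown_step, [3])
```

In the paper's framework it is this single-step function, not the full prompt-to-answer map, whose complexity controls the sample complexity of learning.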
- When More is Less: Understanding Chain-of-Thought Length in LLMs
- Studies optimal CoT lengths, showing that longer is not always better: performance peaks at a sweet spot, then declines due to error accumulation.
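A toy model (parameters made up) reproduces this sweet-spot effect: splitting a task of fixed difficulty into L steps makes each step easier, but every extra step also carries a fixed chance of a slip, so overall success first rises and then falls with L.

```python
def chain_success(L, base_err=0.02, difficulty=1.0):
    """P(all L steps correct) when per-step error = fixed floor + difficulty/L."""
    per_step_err = base_err + difficulty / L
    if per_step_err >= 1.0:
        return 0.0
    return (1.0 - per_step_err) ** L

probs = {L: chain_success(L) for L in range(1, 61)}
best_L = max(probs, key=probs.get)  # interior optimum: neither 1 nor 60
```

For large L the success probability behaves like exp(-base_err * L), so the accumulated slip probability eventually dominates the benefit of easier steps.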
- Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning
- Shows that Pass@N is misaligned with cross-entropy training. Proposes confidence-limiting objectives that improve performance on math and reasoning tasks.
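A small worked example (numbers made up) shows the misalignment: Pass@N rewards keeping some probability mass on hard problems, so an overconfident policy that maximizes greedy (Pass@1) accuracy can lose badly at larger N.

```python
def pass_at_n(p, n):
    """P(at least one of n independent samples is correct), per-sample prob p."""
    return 1.0 - (1.0 - p) ** n

def avg_pass_at_n(per_problem_p, n):
    """Average Pass@N over a benchmark, one success probability per problem."""
    return sum(pass_at_n(p, n) for p in per_problem_p) / len(per_problem_p)

# Two hypothetical policies on a 2-problem benchmark.
confident = [0.9, 0.0]  # sharp: nails problem 1, has given up on problem 2
hedged = [0.3, 0.3]     # spreads mass: worse greedy accuracy on both
```

The confident policy wins at N = 1, but the hedged one wins at N = 8: keeping even modest probability on the hard problem compounds over repeated samples, which is exactly what cross-entropy training (pushing mass onto single high-confidence answers) fails to encourage.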
- On Learning Verifiers for Chain-of-Thought Reasoning
- Analyzes the PAC-learnability of verifiers for CoT reasoning. Derives sample-complexity upper bounds and impossibility results.
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Argues that CoT gains are distribution-dependent and may vanish out-of-distribution, suggesting that CoT reasoning is brittle and not robustly general.
Interesting Observations Waiting for Theoretical Analysis