Course Information for STOR 654, Statistical Theory 1

Class meetings, Fall 2020: Tuesday and Thursday 4:45pm – 6:00pm.  Assigned room: Hanes Art Center 121.

Note: The course will initially be online.

Prerequisites:  Advanced undergraduate coursework in calculus based probability, theoretical statistics, real analysis or advanced calculus, and linear algebra.  A more detailed list of prerequisite knowledge is given below.

Registration: Enrollment and registration for the course is handled online.

Instructor:  Andrew B. Nobel

Office: Hanes 308   Email: nobel@email.unc.edu   Phone: 919-962-1352.

Instructor Office Hours: 1:20-2:40pm Mondays

TA: Haodong Wang

Office: Hanes Hall B1   Email: haodong@live.unc.edu.

TA Office Hours: 6-6:45pm on Tuesdays

Goals: The goal of the graduate theoretical statistics sequence (STOR 654 and 655) is to provide students with a mathematically rigorous introduction to the key ideas, results, and techniques of theoretical statistics and statistical inference. STOR 654, the first course in the sequence, is intended to introduce students to the foundations of inference and learning in a non-asymptotic setting.

Target Audience: The target audience for the course is graduate and advanced undergraduate students in STOR, Mathematics, Computer Science, and Economics who wish to obtain a rigorous grounding in, and understanding of, statistical theory and some of the foundational  ideas in machine learning.

Overview: The first part of the course will cover classical finite sample theory.  The second part of the course will be devoted to a selection of material that is aligned with modern research in statistics and machine learning, including some coverage of high-dimensional inference.  A detailed syllabus is given below.  Most topics will be self-contained, results being derived from first principles and the prerequisite material.

Online Lectures: Online lectures will be “live” at the regularly scheduled time of the course. Most days, I will stay online for a little while afterwards to answer questions.

Protocol for online classes

  • Students are expected to attend online lectures. As you would with an in-person class, please join before the beginning of the lecture, and stay until the end.
  • Please keep your cameras on, so that I can see you and gauge how you are following the material. I plan to keep my own camera on. Please dress appropriately, and be respectful in your online interactions with peers.
  • Please keep your computer muted. If you have a question, feel free to unmute your computer and ask.

 

Administration

Text: The primary sources for the first part of the class are “Statistical Inference” by G. Casella and R. Berger and the lecture notes.  Casella and Berger also provides a good overview of much of the prerequisite material for the course.  Material for the second part will come from course notes and online sources.

Homework: Homework for the class will be handled using Gradescope.  Homework problems will be assigned, and due, weekly throughout the semester. Each homework assignment will be graded; late or missed homework will receive a grade of zero. Students are welcome (and encouraged) to discuss the homework problems with other members of the class, but they should prepare their final answers on their own.  If you have any questions concerning the grading of homework, please speak first with the TA.

Attendance: Students are expected to attend all lectures.  If you are unable to attend a lecture, please let the instructor know and make plans to get the notes from another student in the class.

Grading (tentative): Grading will be based on homeworks, an in-class midterm, and an in-class final exam, using the weights below.

       Homework 15%
       Midterm 35%
       Final 50%
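
As a rough illustration of how the tentative weights above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the overall score is a plain weighted average; the syllabus does not specify the scale or how letter grades are assigned, and the student scores shown are hypothetical.

```python
# Tentative weights taken from the list above.
WEIGHTS = {"homework": 0.15, "midterm": 0.35, "final": 0.50}


def course_score(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 90 on homework, 80 on the midterm, 70 on the final.
example = course_score({"homework": 90, "midterm": 80, "final": 70})
print(round(example, 2))  # 0.15*90 + 0.35*80 + 0.50*70 = 76.5
```

Under these assumptions the final exam carries half of the overall score, so a 10-point change on the final moves the overall score by 5 points.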

Note:  The midterm will be given on October 1, during class.  The final exam will be given at the time and date specified by the UNC Final Exam Schedule.

Other sources:

“All of Statistics: A Concise Course in Statistical Inference” by L. Wasserman.  Nice coverage of material on decision theory, Bayesian inference, and hypothesis testing.

“Statistical Decision Theory and Bayesian Analysis” by J.O. Berger.  Classic source for information about, and discussion of, Bayesian inference.

Lecture notes of Emmanuel Candès for Statistics 300C at Stanford.  Modern coverage of multiple testing, results on FDR control, testing for Gaussian mean model.

“A Course in Large Sample Theory” by T.S. Ferguson.  An intermediate level treatment of asymptotic statistics.

“Asymptotic Statistics” by A. van der Vaart.  A more advanced, research-level treatment of asymptotic statistics.

“High Dimensional Probability: An Introduction with Applications in Data Science” by R. Vershynin.  Good coverage of advanced topics, including exponential inequalities.

“Gaussian estimation: Sequence and wavelet models” by I. Johnstone.  Latest version available online.

Honor Code: All students should be familiar with and abide by the UNC Honor Code, which covers issues such as plagiarism, unauthorized assistance, and cheating.  Violations of the honor code will be prosecuted.

 

Prerequisites

1. Calculus based probability

Properties of unconditional and conditional probability, Bayes formula, independence.

Random variables, expectations, and variances.  Cumulative distribution functions (CDF), probability density functions, and probability mass functions.

Multiple random variables, joint probability mass/density functions and their properties.  Covariance and correlation.

Conditional expectations, and basic properties.

Finding the distribution of a function of a random variable: the CDF method and the general change of variables theorem.  Convolutions.

2. Basic theoretical statistics

Some familiarity with basic inference, point estimation, confidence intervals, and hypothesis testing.

Familiarity with the definition and basic properties of key discrete univariate distributions: Bernoulli, Poisson, binomial, hypergeometric, geometric, and negative binomial.

Familiarity with the definition and basic properties of key continuous univariate distributions: normal, gamma, beta, uniform, exponential, double exponential, and chi-squared.

3. Linear Algebra

Inner and outer products.  Orthogonality.  Real vector spaces, dimension. Definition and basic properties of the Euclidean norm.

Matrix addition and multiplication, rank, transpose, trace, determinant.   Projections onto subspaces.

Eigenvalues and eigenvectors.  Symmetric matrices, non-negative definite matrices, the spectral theorem. Courant-Fischer theorem.

4. Real Analysis and Differential Calculus

Definition and basic properties of suprema, infima, limsup, liminf, and limits.  Open and closed sets.  Compact sets.  Epsilon-delta arguments.

Definition and basic properties of continuous and uniformly continuous functions.  Multiple integrals and Fubini’s theorem.

Gamma function and Stirling’s approximation; Taylor’s theorem in one dimension. Gradients and Hessians of multivariate functions, total derivatives.  How to obtain numerical inequalities using calculus.

 

Syllabus

1. Preliminaries (Some material covered in class)

Elementary properties of suprema and infima; argmax and argmin.

Definition and elementary properties of convex sets and functions; Jensen’s inequality; Hölder and Cauchy-Schwarz inequalities.

Order statistics; probability inverse transformation; Stein’s lemma; conditional expectations.

2. Traditional Finite Sample Inference

Families of distributions: location, scale, location-scale, and canonical exponential families.

Decision theory and overview of statistical inference.  Bayes risk and minimax risk, Bayes rules and minimax rules, admissibility.

Data reduction. Sufficiency, factorization theorem, minimal sufficiency. Rao-Blackwell theorem.

Point estimation.  Bias-variance decomposition for squared loss; method of moments; maximum likelihood estimation; Bayesian point estimates; minimum variance estimators, complete and ancillary statistics, Lehmann-Scheffé theorem.

Hypothesis testing.  Frequentist setting, likelihood ratio tests, power function of a test, Type 1 and Type 2 errors, level and size.  UMP tests and Neyman-Pearson Lemma.  P-values.  Bayesian setting, odds ratios and Bayes factors.

Interval estimation. Coverage probability, inverting hypothesis tests.

3. Other Topics (* indicates results presented without proof)

Probability inequalities. Gaussian tail bound. Markov and Chebyshev inequalities. MGFs and Chernoff bounds. Hoeffding’s MGF and probability inequalities. Bennett and Bernstein inequalities.  McDiarmid’s bounded difference inequality*. Gaussian concentration inequality*.  Association inequality.

Subadditivity.  Definition and convergence of subadditive sequences, applications.  MGF based bound on expected value of maximum.

Random vectors.  Definition, expectation, variance and covariance matrices.

Multivariate normal distribution.  General definition, Cramér-Wold device*, representation theorem, non-singular case, independence of jointly multinormal random vectors.

Extreme value theorem for the standard normal distribution.

Global testing.  Fisher combination and Bonferroni global tests.  Optimality of Bonferroni procedure under sparse alternatives in the Gaussian sequence model.  Overview of chi-squared global test.

Multiple testing.  Family-wise error rate.  Strong and weak control of FWER. Bonferroni procedure and Holm step-down procedure.  False discovery rate.  Benjamini-Hochberg procedure for controlling FDR.

Distances and divergences for probability distributions. Kolmogorov-Smirnov, total variation, and Hellinger distances; Kullback-Leibler divergence.  Basic inequalities and tensorization.

Gaussian sequence model.  Maximum likelihood estimator, general bias-variance decomposition, Stein’s unbiased risk estimator, shrinkage estimators, James-Stein estimator and associated risk bounds.

 

 

Study tips: 

1. When looking over your notes or the reading assignment, keep a pencil and scratch paper on hand, and try to work out the details of any argument that is not completely clear to you.  Even if you think the argument is clear, it can be helpful to write it down again in order to test and strengthen your understanding.

2. Always look over the notes from lecture k before attending lecture k+1.  You will get much more out of the material if you can maintain a sense of continuity and keep the “big picture” in mind.  This includes mathematical ideas that can make multiple appearances in slightly different forms.

3. It is important to know what you know, but it’s especially important to know what you don’t know.  As you look over the reading material and your notes, ask yourself if you (really) understand it.  Keep careful track of any concepts and ideas that are not clear to you, and make efforts to master these in a timely fashion.  One good way of seeing if you understand an idea or concept is to write down (or state out loud) the associated definitions and basic facts, without the aid of your notes, in full, grammatical sentences.  Translating ideas from mathematics to complete English sentences, and back again, is an important research skill, and a good way to assess your understanding.

 

UNC Statement: Community Standards in Our Course and Mask Use. This fall semester, while we are in the midst of a global pandemic, all enrolled students are required to wear a mask covering your mouth and nose at all times in our classroom. This requirement is to protect our educational community – your classmates and me – as we learn together. If you choose not to wear a mask, or wear it improperly, I will ask you to leave immediately, and I will submit a report to the Office of Student Conduct. At that point you will be disenrolled from this course for the protection of our educational community. Students who have an authorized accommodation from Accessibility Resources and Service have an exception. For additional information, see https://carolinatogether.unc.edu/university-guidelines-for-facemasks/.

UNC Statement: Title IX Resources. Any student who is impacted by discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, or stalking is encouraged to seek resources on campus or in the community. Please contact the Director of Title IX Compliance (Adrienne Allison – Adrienne.allison@unc.edu), Report and Response Coordinators in the Equal Opportunity and Compliance Office (reportandresponse@unc.edu), Counseling and Psychological Services (confidential), or the Gender Violence Services Coordinators (gvsc@unc.edu; confidential) to discuss your specific needs. Additional resources are available at safe.unc.edu.