STOR 767 Advanced Machine Learning, Course Information, Fall 2023

Class information: Monday and Wednesday 1:25pm – 2:40pm in Hanes 130.  In person format.

Instructor:  Andrew B. Nobel

Office: Hanes 308   Email: nobel@email.unc.edu

Nobel Office Hours (via Zoom): Tuesday 4-5pm

Registration: Enrollment and registration for the course are handled online. Please contact Ms. Christine Keat (crikeat@email.unc.edu) if you have questions.

Auditing the Class: Students wishing to audit the course must get approval from the instructor and will need to formally register as auditors. Auditors are expected to complete the weekly homework assignments but need not do a final project.

There is no teaching assistant for the course.

Overview: Machine learning encompasses a wide variety of activities in academia and industry, including, or overlapping with, data mining, the analysis of “big data”, artificial intelligence, and deep learning. As an academic discipline, machine learning has points of contact with statistics, optimization, computer science, mathematics, and engineering. For the purposes of this course, we regard machine learning as the study and understanding of statistical methods that identify structure in large data sets and that use existing data to make predictions about future observations. In most cases, machine learning approaches are based on general models and procedures that are not tailored to the specific problem at hand.

Audience: This course is targeted at graduate (master's and PhD) students in STOR, Computer Science, Mathematics, and related fields who have had previous exposure to statistics, linear algebra, probability, and real analysis (see prerequisites below). An undergraduate course in machine learning is desirable but not necessary.

Goals: The course will familiarize students with a number of key ideas and current trends in the field of machine learning, with a focus on methods related to inference, modeling, and prediction. The lectures will emphasize fundamentals and mathematical rigor rather than methodological recipes. We will consider a small number of representative areas and methods in some detail, with the goal of illustrating core ideas that have broad applicability.

Homework assignments will emphasize theoretical material and understanding.

Prerequisites: Students should have a good understanding of theoretical and applied statistics, at the level of STOR 654 and STOR 664. In particular, students should be familiar with the following material (at the level of advanced undergraduate coursework):

  • Statistics, including loss and risk functions, point estimation and hypothesis testing, linear regression, and hierarchical models
  • Linear and matrix algebra, including norms, inner products, eigenvalues and eigenvectors, rank, projections, and non-negative definite matrices
  • Calculus-based probability, including conditional distributions and expectations, multivariate distributions, moment generating functions, the Chernoff bound, and Hoeffding’s inequality
  • Advanced calculus, including suprema, infima, limits, continuous functions, multivariate differentiation and integration, and open, closed, and compact sets
  • Basic convex analysis in Euclidean space, including convex sets and functions, convex hulls, extreme points, and subgradients

Protocol for lectures

  • Please arrive on time, before the beginning of class. If you need to arrive late or leave early, let the instructor know in advance.
  • Please refrain from using laptops, phones, and other non-note-taking devices.  Use of tablets is allowed during lectures only if they are used for taking notes.

Office Hours: If you have questions about the homework assignments or lecture material, please speak with the instructor after class, or during his office hours.

Attendance:  Students should attend all lectures.  If you are unable to attend a lecture, please make plans to get the notes from another student in the class.

Homework Assignments: Homework assignments and due dates will be posted on the course web page.

Homework Policy: Homework assignments will be handled via Gradescope and should be submitted before class on the day they are due; please be prepared to submit your assignments by that time.

For homework assignments, please clearly label each problem, show your work (including your mathematical arguments), and give a clear account of your reasoning in English, using full sentences, when appropriate.

If your answer to a question is based in whole or in part on an online source, that source should be cited.

Project: There will be a final group project due at the end of the semester. The project will involve an in-class presentation as well as a written report. Students will have the option of doing a more theoretically oriented project, in which they read, summarize, and discuss a technical paper in the machine learning literature, or a more applied project, in which they analyze one or more data sets using methods covered in the lectures or closely related to them.

Grading (tentative): Grading will be based on the homework assignments and the final project:

  • Homework
  • Final Project

 

Syllabus: The course will begin with a brief overview of linear regression and classification.  The following is a tentative syllabus.

1. Review of Linear Regression

  • OLS, Ridge Regression, and the LASSO
  • Training vs. test error, quantifying optimism of the training error
  • Consistency of the LASSO

Sources:
The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman
Assumptionless consistency of the Lasso, by S. Chatterjee

 

2. Classification and Empirical Risk Minimization

  • Overview of the classification problem
  • Stochastic setting, optimality of the Bayes rule, Bayes risk
  • Empirical risk minimization
  • Optimism of empirical risk, connections with uniform strong laws
  • Rademacher complexity, VC-dimension, and the VC-inequality
  • Performance of ERM for finite and infinite families of classification rules
  • Lower bounds on estimation error for general classification procedures

Sources:
Foundations of Machine Learning, by Mohri, Rostamizadeh, and Talwalkar
A Probabilistic Theory of Pattern Recognition, by Devroye, Györfi, and Lugosi

 

3. Multi-armed bandits

  • Sequential allocation, exploration-exploitation tradeoff
  • Stochastic bandits
  • Adversarial bandits

Sources:
Reinforcement Learning, by Sutton and Barto
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, by Bubeck and Cesa-Bianchi

 

4. Prediction of Individual Sequences

  • Setting: prediction, experts, and regret
  • Exponentially weighted forecaster: fixed and variable time horizon
  • Randomized prediction: oblivious and non-oblivious opponents
  • Exponential weighting in the randomized setting
  • Prediction and zero sum games

Source:
Prediction, Learning, and Games, by Cesa-Bianchi and Lugosi

 

5. Conformal Prediction

  • Conformal prediction setting
  • Exchangeability and nonconformity measures
  • Conformal algorithm
  • Validity
  • Examples

Sources:
A Tutorial on Conformal Prediction, by Shafer and Vovk
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification, by Angelopoulos and Bates

 

6. Graphical Models and Variational Inference

  • Graphical models: Bayesian networks and Markov random fields
  • Inference in graphical models: factor graphs, sum-product and max-sum algorithms
  • Review of the EM algorithm
  • Variational Bayes inference
  • Latent Dirichlet allocation

Sources:
Pattern Recognition and Machine Learning, by C. Bishop
Variational Inference: A Review for Statisticians, by Blei, Kucukelbir, and McAuliffe
Latent Dirichlet Allocation, by Blei, Ng, and Jordan

 

Other Sources

Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy. (2012). MIT Press.

 

Disclaimer: The instructor reserves the right to make changes to the syllabus, and to the due dates of assignments. The latter will be announced as early as possible.

 

Study tips

1. Keep up with the reading and homework assignments. If the reading assignment is long, break it up into smaller pieces (perhaps one section or subsection at a time).

2. Always look over the notes from lecture k before attending lecture k+1. This will help keep you on top of the course material. Ideas from one lecture often carry over to the next: you will get much more out of the material if you can maintain a sense of continuity and keep the “big picture” in mind.

3. Complete the reading *before* doing the homework. Searching the reading for the right formula or paragraph for a particular problem often takes as much time as doing the reading itself, and it tends to create more confusion than it resolves.

4. When looking over your notes or the reading assignment, keep a pencil and scratch paper on hand, and try to work out the details of any argument or idea that is not completely clear to you.  Even if the argument or idea is clear, it can be helpful to write it down again in a different way in order to test and strengthen your understanding.

5. It is important to know what you know, but it’s especially important to know what you don’t know.  As you look over the reading material and your notes, ask yourself if you (really) understand it.  Keep careful track of any concepts and ideas that are not clear to you, and make efforts to master these in a timely fashion.

6. One good way of seeing if you understand an idea or concept is to write down (or state out loud) the associated definitions and basic facts, without the aid of your notes and in complete, grammatical sentences.  Translating mathematics into English, and back again, is an important research skill, and a good way to build and assess your understanding.

 

Honor Code Policy

As a condition of joining the Carolina community, Carolina students pledge “not to lie, cheat, or steal” and to hold themselves, as members of the Carolina community, to a high standard of academic and non-academic conduct while both on and off Carolina’s campus. This commitment to academic integrity, ethical behavior, personal responsibility and civil discourse exemplifies the “Carolina Way,” and this commitment is codified in both the University’s Honor Code and in other University student conduct-related policies.

Accessibility Resources

The University of North Carolina at Chapel Hill facilitates the implementation of reasonable accommodations, including resources and services, for students with disabilities, chronic medical conditions, a temporary disability or pregnancy complications resulting in barriers to fully accessing University courses, programs and activities.

Accommodations are determined through the Office of Accessibility Resources and Service (ARS) for individuals with documented qualifying disabilities in accordance with applicable state and federal laws. See the ARS Website for contact information: https://ars.unc.edu or email ars@unc.edu.

Counseling and Psychological Resources

CAPS is strongly committed to addressing the mental health needs of a diverse student body through timely access to consultation and connection to clinically appropriate services, whether for short or long-term needs. Go to their website, https://caps.unc.edu/, or visit their facilities on the third floor of the Campus Health Services building for a walk-in evaluation to learn more.

Title IX Resources

Any student who is impacted by discrimination, harassment, interpersonal (relationship) violence, sexual violence, sexual exploitation, or stalking is encouraged to seek resources on campus or in the community. Please contact the Director of Title IX Compliance (Adrienne Allison – Adrienne.allison@unc.edu), Report and Response Coordinators in the Equal Opportunity and Compliance Office (reportandresponse@unc.edu), Counseling and Psychological Services (confidential), or the Gender Violence Services Coordinators (gvsc@unc.edu; confidential) to discuss your specific needs. Additional resources are available at safe.unc.edu.

University Attendance Policy

No right or privilege exists that permits a student to be absent from any class meetings, except for these University Approved Absences:

  1. Authorized University activities
  2. Disability/religious observance/pregnancy, as required by law and approved by Accessibility Resources and Service and/or the Equal Opportunity and Compliance Office (EOC)
  3. Significant health condition and/or personal/family emergency as approved by the Office of the Dean of Students, Gender Violence Service Coordinators, and/or the Equal Opportunity and Compliance Office (EOC).