Decision Making in Large Scale Systems

Lecture Notes

LEC #

TOPICS

LECTURE NOTES

1

Markov Decision Processes

Finite-Horizon Problems: Backwards Induction

Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation

(PDF)

2

Value Iteration

Existence and Uniqueness of Bellman's Equation Solution

Gauss-Seidel Value Iteration

(PDF)

3

Optimality of Policies derived from the Cost-to-Go Function

Policy Iteration

Asynchronous Policy Iteration

(PDF)

4

Average-Cost Problems

Relationship with Discounted-Cost Problems

Bellman's Equation

Blackwell Optimality

(PDF)

5

Average-Cost Problems

Computational Methods

(PDF)

6

Application of Value Iteration to Optimization of Multiclass Queueing Networks

Introduction to Simulation-based Methods Real-Time Value Iteration

(PDF)

7

Q-Learning

Stochastic Approximations

(PDF)

8

Stochastic Approximations: Lyapunov Function Analysis

The ODE Method

Convergence of Q-Learning

(PDF)

9

Exploration versus Exploitation: The Complexity of Reinforcement Learning

(PDF)

10

Introduction to Value Function Approximation

Curse of Dimensionality

Approximation Architectures

(PDF)

11

Model Selection and Complexity

(PDF)

12

Introduction to Value Function Approximation Algorithms

Performance Bounds

(PDF)

13

Temporal-Difference Learning with Value Function Approximation

(PDF)

14

Temporal-Difference Learning with Value Function Approximation (cont.)

(PDF)

15

Temporal-Difference Learning with Value Function Approximation (cont.)

Optimal Stopping Problems

General Control Problems

(PDF)

16

Approximate Linear Programming

(PDF)

17

Approximate Linear Programming (cont.)

(PDF)

18

Efficient Solutions for Approximate Linear Programming

(PDF)

19

Efficient Solutions for Approximate Linear Programming: Factored MDPs

(PDF)

20

Policy Search Methods

(PDF)

21

Policy Search Methods (cont.)

(PDF)

22

Policy Search Methods for POMDPs

Application: Call Admission Control

Actor-Critic Methods

23

Approximate POMDP Compression

24

Policy Search Methods: PEGASUS

Application: Helicopter Control

Assignments

Problem Set 1 (PDF)

Problem Set 2 (PDF)

Problem Set 3 (PDF)

Problem Set 4 (PDF)

Problem Set 5 (PDF)

Projects

The final project consists of a 10-15 page project report and 15-20 minute presentation. Students have option of working on theory, algorithms and / or applications. Project proposals are submitted midway through the term, with the final project due at the end of the term.

Some representative projects are presented in the table below, courtesy of the student author(s).

TOPICS

STUDENTS

PROJECTS

Approximate Dynamic Programming (Via Linear Programming) for Stochastic Scheduling

Mohamed Mostagir

Nelson Uhan

Paper (PDF) (Courtesy of Mohamed Mostagir and Nelson Uhan. Used with permission.)

Slides (PDF) (Courtesy of Mohamed Mostagir and Nelson Uhan. Used with permission.)

How to choose the State Relevance Weight in the Approximate Linear Programming Approach for Dynamic Programming?

Yann Le Tallec

Theophane Weber

Paper (PDF) (Courtesy of Yann Le-Tallec and Theophane Weber. Used with permission.)

Slides (PDF) (Courtesy of Yann Le-Tallec and Theophane Weber. Used with permission.)

Decentralized Strategies for the Assignment Problem

Hariharan Lakshmanan

Slides (PDF) (Courtesy of Hariharan Lakshamanan. Used with permission.)