The subject of this course is sequential decision making under uncertainty in systems whose evolution is influenced by the decisions made. The decision at any given time depends on the state of the system, and the objective is to select a decision-making rule that optimizes a given performance criterion. Such problems can be solved, in principle, using the classical methods of dynamic programming. In practice, however, the applicability of dynamic programming to many important problems is limited by the enormous size of the underlying state spaces. "Neuro-dynamic programming", known as "Reinforcement Learning" in the Artificial Intelligence literature, uses neural networks and other approximation architectures to overcome this bottleneck. The methodology allows systems to learn about their behavior through simulation and to improve their performance through iterative reinforcement. The focus of this course is on the mathematical foundations of this methodology, in particular the convergence and degree of suboptimality of different algorithms.
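As a rough illustration of the classical dynamic programming approach mentioned above, the following sketch runs value iteration on a tiny Markov decision process. The two-state problem, the discount factor of 0.9, and the function name `value_iteration` are all assumptions made for this example and are not taken from the course material. The table `V` holds one entry per state, which hints at why exact methods stop being practical when the state space is enormous.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, s'] = transition probability.
    R: array of shape (A, S), expected one-step reward for action a in state s.
    Returns the approximate optimal value function and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=0)

# Hypothetical toy problem: action 0 stays put, action 1 switches state.
# A reward of 1 is earned whenever the next state is state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 1.0],  # stay:   s0 -> s0 earns 0, s1 -> s1 earns 1
    [1.0, 0.0],  # switch: s0 -> s1 earns 1, s1 -> s0 earns 0
])

V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1. The approximation architectures discussed in the course replace the explicit table `V` with a parametric function when enumerating every state is infeasible.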
|Song Chong||TBA||IT Center (N1)-email@example.com|
|Jeongmin Bae||TBA||IT Center (N1)-firstname.lastname@example.org|
|Sewoong Lee||TBA||IT Center (N1)-email@example.com|