The subject of this course is sequential decision making under uncertainty in a system whose evolution is influenced by the decisions made. The decision taken at any given time depends on the state of the system, and the objective is to select a decision-making rule that optimizes a given performance criterion. Such problems can be solved, in principle, using the classical methods of dynamic programming. In practice, however, the applicability of dynamic programming to many important problems is limited by the enormous size of the underlying state/action spaces as well as by uncertainties in the system. "Neuro-dynamic programming", or "reinforcement learning" as it is known in the artificial intelligence literature, uses neural networks and other approximation architectures to overcome these bottlenecks, while using Monte Carlo estimation and/or stochastic approximation to learn models or value functions of the system. This methodology allows systems to learn about their behavior through simulation and to improve their performance through iterative reinforcement. The focus of this course is the mathematical foundations of the methodology: the convergence, degree of suboptimality, computational complexity, and sample efficiency of different algorithms.
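To make the combination of simulation, stochastic approximation, and value functions concrete, here is a minimal sketch (not taken from the course materials) of tabular Q-learning on a hypothetical 5-state chain MDP: the agent starts in state 0, action 1 moves right, action 0 moves left, and reaching the last state yields reward +1 and ends the episode. The learning-rate, discount, and exploration constants are illustrative choices.

```python
import random

# Toy problem sizes and hyperparameters (illustrative assumptions).
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1

def step(s, a):
    """Deterministic chain dynamics: action 1 moves right, action 0 moves left.
    Reaching the last state gives reward +1 and terminates the episode."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = (s2 == N_STATES - 1)
    reward = 1.0 if done else 0.0
    return s2, reward, done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy exploration: the "simulation" part of the method.
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            # Stochastic-approximation update toward the Bellman target.
            target = r + (0.0 if done else GAMMA * max(q[s2]))
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q

q = q_learning()
# Greedy policy extracted from the learned value function.
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
```

After training, the greedy policy moves right in every non-terminal state, and the learned values approximate the discounted returns (e.g. Q(3, right) ≈ 1, Q(2, right) ≈ 0.9), illustrating how iterated simulation and reinforcement recover the dynamic-programming solution without enumerating the full model.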
| Song Chong | TBA | IT Center (N1), firstname.lastname@example.org |
| Yongsik Lee | TBA | IT Center (N1), email@example.com |
| Keunhyung Chung | TBA | IT Center (N1), firstname.lastname@example.org |
| Jinyeong Lee | TBA | IT Center (N1), email@example.com |
| Hyunwoo Jung | TBA | IT Center (N1), firstname.lastname@example.org |