Students who successfully complete the course will be able to
- identify problem structures that satisfy the principle of optimality (i.e., identify the stages and states of a problem),
- decompose a problem into a sequence of manageable (smaller) subproblems,
- construct (recursive) backward and forward Dynamic Programming (DP) models,
- solve a given problem by finding optimal solutions of a sequence of subproblems,
- construct deterministic and stochastic DP models,
- identify network, allocation, gambling, stock-option and inventory models that can be solved using DP formulations,
- identify the optimal policy structures by investigating DP formulations analytically,
- investigate monotonicity of the optimal policy,
- identify the trade-off between short-term and long-term yields,
- use Bayes’ law to incorporate learning into DP models,
- use stochastic ordering of random variables to determine optimal threshold levels,
- formulate and solve Bandit problems for various real-life applications,
- develop Markov Decision Process (MDP) models under total, discounted and average payoff criteria,
- solve MDPs using Linear Programming, Policy Iteration and Value Iteration Algorithms,
- identify deterministic, randomized, Markovian, stationary and nonstationary policies.
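To make the first few outcomes concrete, here is a minimal sketch of a backward DP recursion for a resource-allocation problem: the stage is the activity index, the state is the remaining budget, and the reward table below is purely hypothetical illustration data, not part of the course material.

```python
# Backward-DP sketch: allocate `budget` units across activities, where
# rewards[i][x] is the (hypothetical) payoff of giving x units to activity i.
# Stage = activity index i, state = budget remaining b.

def backward_dp(rewards, budget):
    """Return the optimal total reward and one optimal allocation."""
    n = len(rewards)
    # V[i][b] = best reward obtainable from activities i..n-1 with b units left
    V = [[0] * (budget + 1) for _ in range(n + 1)]
    best = [[0] * (budget + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):           # backward over stages
        for b in range(budget + 1):          # over states
            V[i][b], best[i][b] = max(
                (rewards[i][x] + V[i + 1][b - x], x)
                for x in range(min(b, len(rewards[i]) - 1) + 1)
            )
    # Recover an optimal allocation by a forward pass over the stored decisions
    alloc, b = [], budget
    for i in range(n):
        x = best[i][b]
        alloc.append(x)
        b -= x
    return V[0][budget], alloc

rewards = [
    [0, 5, 8, 9],    # activity 0: payoff of allocating 0..3 units
    [0, 4, 7, 11],   # activity 1
    [0, 3, 6, 8],    # activity 2
]
value, alloc = backward_dp(rewards, 4)  # value 16 via allocation [1, 3, 0]
```

The same value table could be filled by a forward recursion over the budget already spent; the backward form is shown because it mirrors the principle of optimality most directly.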
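The MDP outcomes can likewise be illustrated with a short Value Iteration sketch under the discounted payoff criterion; the two-state, two-action transition and reward data below are hypothetical, and the greedy policy it returns is a deterministic, stationary, Markovian policy of the kind named in the last outcome.

```python
# Value-iteration sketch for a discounted MDP.
# P[a][s][t] = (hypothetical) probability of moving s -> t under action a;
# R[a][s]    = (hypothetical) one-step reward for action a in state s.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    n = len(R[0])
    V = [0.0] * n
    while True:
        # Bellman optimality update: V(s) <- max_a [ R(a,s) + gamma * E V ]
        V_new = [
            max(R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
                for a in range(len(P)))
            for s in range(n)
        ]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            break
        V = V_new
    # Greedy (deterministic, stationary, Markovian) policy w.r.t. V
    policy = [
        max(range(len(P)),
            key=lambda a: R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n)))
        for s in range(n)
    ]
    return V, policy

P = [
    [[0.8, 0.2], [0.3, 0.7]],   # action 0
    [[0.5, 0.5], [0.9, 0.1]],   # action 1
]
R = [
    [1.0, 0.0],                 # rewards under action 0
    [0.0, 2.0],                 # rewards under action 1
]
V, policy = value_iteration(P, R)
```

Policy Iteration would instead alternate policy evaluation and greedy improvement, and the same optimal values can be obtained from a Linear Programming formulation; Value Iteration is shown here only because it is the shortest to sketch.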