【DP】The Dynamic Programming Algorithm

Basic Model

There are two principal features:

  1. an underlying discrete time dynamic system
  2. a cost function that is additive over time

The system has the form
$$x_{k+1}=f_k(x_k, u_k, w_k)$$
where

  • $x_k$ is the state of the system and summarizes past information that is relevant for future optimization.
  • $u_k$ is the control or decision variable to be selected at time $k$.
  • $w_k$ is a random parameter (disturbance or noise depending on the context).
  • $N$ is the horizon or number of times control is applied.
  • $f_k$ is a function that describes the system and in particular the mechanism by which the state is updated.

The cost function, denoted $g_k(x_k, u_k, w_k)$, is additive and accumulates over time:
$$g_N(x_N)+\sum_{k=0}^{N-1}g_k(x_k, u_k, w_k)$$
where $g_N(x_N)$ is a terminal cost incurred at the end of the process. Because $w_k$ is a random term, we formulate the problem as an optimization of the expected cost
$$\mathbb{E}\bigg\{ g_N(x_N)+\sum_{k=0}^{N-1} g_k(x_k, u_k, w_k)\bigg\}$$
where the expectation is with respect to the joint distribution of the random variables involved. Each control $u_k$ is selected with some knowledge of the current state $x_k$.
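The pieces of the basic model can be sketched in code. The dynamics $f_k$, per-stage cost $g_k$, and demand distribution below are illustrative assumptions (an inventory-style system), not taken from the text; the loop shows how the additive cost accumulates along one sample trajectory.

```python
import random

# Assumed, illustrative choices: f_k(x,u,w) = x + u - w (inventory-style
# dynamics), g_k(x,u,w) = u + |x + u - w| (ordering plus holding/shortage cost).
N = 4                       # horizon: control is applied N times
rng = random.Random(0)

def f(k, x, u, w):          # system function: how the state is updated
    return x + u - w

def g(k, x, u, w):          # per-stage cost, additive over time
    return u + abs(x + u - w)

def g_terminal(x):          # terminal cost g_N(x_N)
    return abs(x)

x, cost = 2, 0.0            # initial state x_0
for k in range(N):
    u = 1                           # some control u_k (held fixed here)
    w = rng.choice([0, 1, 2])       # random disturbance w_k
    cost += g(k, x, u, w)           # accumulate g_k(x_k, u_k, w_k)
    x = f(k, x, u, w)               # x_{k+1} = f_k(x_k, u_k, w_k)
cost += g_terminal(x)               # add g_N(x_N) at the end of the process
```

Averaging `cost` over many such trajectories would estimate the expected cost in the formula above.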

Open-loop & Closed-loop

In open-loop minimization we select all orders $u_0, u_1, \dots, u_{N-1}$ at once at time 0, without waiting to see the subsequent demand levels.

In closed-loop minimization we postpone placing the order $u_k$ until the last possible moment (time $k$), when the current stock $x_k$ will be known.

In particular, in closed-loop inventory optimization we are not interested in finding optimal numerical values of the orders but rather we want to find an optimal rule for selecting at each period $k$ an order $u_k$ for each possible value of stock $x_k$ that can conceivably occur.
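The open-loop/closed-loop distinction can be made concrete with a Monte Carlo sketch. The inventory dynamics, costs, and order-up-to rule below are assumed for illustration; the point is that a feedback rule, which sees the realized stock $x_k$, can only do as well or better than orders fixed at time 0.

```python
import random

# Toy inventory sketch (assumed dynamics and costs, not from the text):
# stock evolves as x_{k+1} = x_k + u_k - w_k, demand w_k uniform on {0,1,2}.
N = 3

def stage_cost(x, u, w):
    return u + abs(x + u - w)          # ordering plus holding/shortage cost

def run(x0, choose, rng):
    """Total cost when u_k = choose(k, x_k); terminal cost taken as 0."""
    x, total = x0, 0.0
    for k in range(N):
        u = choose(k, x)
        w = rng.choice([0, 1, 2])
        total += stage_cost(x, u, w)
        x = x + u - w
    return total

def expected(choose, x0=0, trials=20000, seed=1):
    rng = random.Random(seed)
    return sum(run(x0, choose, rng) for _ in range(trials)) / trials

open_loop   = expected(lambda k, x: 1)               # orders fixed at time 0
closed_loop = expected(lambda k, x: max(0, 1 - x))   # order-up-to-1 feedback rule
```

Here `closed_loop` comes out below `open_loop`: the feedback rule reacts to the observed stock, which is exactly the advantage closed-loop minimization buys.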

State transition

$p_{ij}(u, k)$ is the probability at time $k$ that the next state will be $j$, given that the current state is $i$ and the control selected is $u$, i.e.
$$p_{ij}(u, k)=\mathbb{P}(x_{k+1}=j\mid x_k=i,\ u_k=u)$$

We consider the class of policies (control laws) that consist of a sequence of functions
$$\pi=\{\mu_0, \dots, \mu_{N-1}\}$$
where $\mu_k$ maps states $x_k$ into controls $u_k=\mu_k(x_k)$ and is such that $\mu_k(x_k)\in U_k(x_k)$ for all $x_k\in S_k$. Such policies will be called admissible.

Given an initial state $x_0$ and an admissible policy $\pi=\{\mu_0, \dots, \mu_{N-1}\}$, the states $x_k$ and disturbances $w_k$ are random variables with distributions defined through the system equation
$$x_{k+1}=f_k(x_k, \mu_k(x_k), w_k)$$
Thus, for given functions $g_k$, $k=0,1,\dots,N$, the expected cost of $\pi$ starting at $x_0$ is
$$J_\pi(x_0)=\mathbb{E} \bigg\{ g_N(x_N)+\sum_{k=0}^{N-1}g_k(x_k, \mu_k(x_k), w_k) \bigg\}$$
An optimal policy $\pi^*$ is one that minimizes this cost over the set $\Pi$ of admissible policies:
$$J_{\pi^*}(x_0)=\min_{\pi\in\Pi} J_\pi(x_0)$$
An interesting aspect of the basic problem and of dynamic programming is that it is typically possible to find a policy $\pi^*$ that is simultaneously optimal for all initial states.
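The backward recursion that produces such a policy can be sketched on a tiny finite problem. The state/control sets, transition probabilities $p_{ij}(u,k)$, and costs below are made-up numbers; the recursion itself is the standard DP algorithm, computing the cost-to-go $J_k(i)$ and the rule $\mu_k(i)$ for every state at once.

```python
# Backward DP sketch on a tiny finite problem (all numbers are illustrative).
# States {0,1}, controls {0,1}, horizon N = 2;
# p[u][i][j] = P(x_{k+1} = j | x_k = i, u_k = u), taken time-invariant here.
N = 2
states, controls = [0, 1], [0, 1]
p = {
    0: [[0.8, 0.2], [0.3, 0.7]],   # transition matrix under u = 0
    1: [[0.5, 0.5], [0.9, 0.1]],   # transition matrix under u = 1
}

def g(k, i, u):                     # expected per-stage cost (assumed values)
    return i + 2 * u

def g_N(i):                         # terminal cost
    return 3 * i

# DP recursion: J_k(i) = min_u [ g_k(i,u) + sum_j p_ij(u,k) * J_{k+1}(j) ]
J = {i: g_N(i) for i in states}     # start from J_N = g_N
policy = []                         # policy[k][i] will hold mu_k(i)
for k in reversed(range(N)):
    J_new, mu = {}, {}
    for i in states:
        q = {u: g(k, i, u) + sum(p[u][i][j] * J[j] for j in states)
             for u in controls}
        mu[i] = min(q, key=q.get)   # optimal control at state i, time k
        J_new[i] = q[mu[i]]
    J, policy = J_new, [mu] + policy
# J[i] is now the optimal cost J*(x_0 = i) -- simultaneously for all
# initial states, as noted above; policy is the admissible policy pi*.
```

Running the numbers by hand: $J_2 = (0, 3)$, then $J_1 = (0.6, 3.1)$, then $J_0 = (1.10, 3.35)$, with $u=0$ optimal everywhere for these particular costs.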

Reference

Dynamic Programming and Optimal Control
