9.1 Matrix Games

连理o

Category: Optimization. Tags: linear algebra

Article link: https://blog.csdn.net/weixin_42437114/article/details/109251734


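This article introduces the basic concepts of matrix games, that is, two-person zero-sum games. Each player chooses a strategy based on the payoff matrix, seeking to maximize the expected payoff. When the game has no saddle point, optimal strategies can be found by randomizing, with a probability vector describing each player's choices. The article also discusses how to simplify a game by reducing its size, and works through examples of finding optimal strategies.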

This article is a set of reading notes on "Linear Algebra and Its Applications".

Matrix Games

The theory of games analyzes competitive phenomena and seeks to provide a basis for rational decision-making.

The games in this section are matrix games whose various outcomes are listed in a payoff matrix. Two players in a game compete according to a fixed set of rules.

Player $R$ (for row) has a choice of $m$ possible moves (or choices of action), and player $C$ (for column) has $n$ moves. By convention, the payoff matrix $A = [a_{ij} ]$ lists the amounts that the row player $R$ wins from player $C$ , depending on the choices $R$ and $C$ make. Entry $a_{ij}$ shows the amount $R$ wins when $R$ chooses action $i$ and $C$ chooses action $j$ .

These games are often called two-person zero-sum games because the algebraic sum of the amounts gained by $R$ and $C$ is zero.

EXAMPLE 1
Each player has a supply of pennies, nickels, and dimes. At a given signal, both players display (or “play”) one coin. If the displayed coins are not the same, then the player showing the higher-valued coin gets to keep both. If they are both pennies or both nickels, then player $C$ keeps both; but if they are both dimes, then player $R$ keeps them. Construct a payoff matrix, using $p$ for display of a penny, $n$ for a nickel, and $d$ for a dime.
SOLUTION
Each player has three choices, $p$ , $n$ , and $d$ , so the payoff matrix is $3 \times 3$ :

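$$
\begin{array}{c|rrr}
 & p & n & d \\ \hline
p & -1 & -1 & -1 \\
n & 1 & -5 & -5 \\
d & 1 & 5 & 10
\end{array}
$$

Rows list the plays of $R$, columns list the plays of $C$, and each entry is the amount (in cents) that $R$ wins.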
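The rules of Example 1 can be turned into a payoff matrix mechanically. The following is a minimal Python sketch of that bookkeeping; the names `COINS`, `payoff_to_R`, and `A` are illustrative and do not come from the text.

```python
# Coin values in cents: p = penny, n = nickel, d = dime.
COINS = {"p": 1, "n": 5, "d": 10}

def payoff_to_R(r, c):
    """Amount (in cents) that R wins when R shows coin r and C shows coin c."""
    if COINS[r] != COINS[c]:
        # Different coins: the higher-valued coin wins the other player's coin.
        return COINS[c] if COINS[r] > COINS[c] else -COINS[r]
    # Equal coins: C keeps both pennies or both nickels, R keeps both dimes.
    return COINS[r] if r == "d" else -COINS[r]

labels = ["p", "n", "d"]
A = [[payoff_to_R(r, c) for c in labels] for r in labels]

for label, row in zip(labels, A):
    print(label, row)
# p [-1, -1, -1]
# n [1, -5, -5]
# d [1, 5, 10]
```

Each row corresponds to a play by $R$ and each column to a play by $C$, matching the convention that entry $a_{ij}$ is what $R$ wins.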
By looking at the payoff matrix in Example 1, the players discover that some plays are better than others. Both players know that $R$ is likely to choose a row that has positive entries, while $C$ is likely to choose a column that has negative entries. Player $R$ notes that every entry in row 3 is positive and chooses to play a dime. No matter what $C$ may do, the worst that can happen to $R$ is to win a penny. Player $C$ notes that every column contains a positive entry and therefore $C$ cannot be certain of winning anything. So player $C$ chooses to play a penny, which will minimize the potential loss.

From a mathematical point of view, what has each player done? Player $R$ has found the minimum of each row (the worst that could happen for that play) and has chosen the row for which this minimum is largest. (See Fig. 1.) That is, $R$ has computed

$$
\max_i \left[ \min_j a_{ij} \right]
$$
Observe that for $C$ , a large positive payment to $R$ is worse than a small positive payment. Thus $C$ has found the maximum of each column (the worst that can happen to $C$ for that play) and has chosen the column for which this maximum is smallest. Player $C$ has found

$$
\min_j \left[ \max_i a_{ij} \right]
$$
For this payoff matrix $[a_{ij}]$,

$$
\max_i \min_j a_{ij} = \min_j \max_i a_{ij} = a_{31} = 1
$$
As long as both players continue to seek their best advantage, player $R$ will always display a dime (row 3) and player $C$ will always display a penny (column 1).

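DEFINITION
If the payoff matrix of a matrix game contains an entry $a_{ij}$ that is both the minimum of row $i$ and the maximum of column $j$, then $a_{ij}$ is called a saddle point.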

In Example 1, the entry $a_{31}$ is a saddle point for the payoff matrix.
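The row-minimum and column-maximum reasoning above is easy to automate. The sketch below assumes the usual definition of a saddle point (an entry that is the minimum of its row and the maximum of its column) and uses the payoff matrix that follows from the rules of Example 1; the function names are illustrative.

```python
def maximin(A):
    """Largest row minimum and the (0-based) row where it occurs."""
    row_minima = [min(row) for row in A]
    best = max(row_minima)
    return best, row_minima.index(best)

def minimax(A):
    """Smallest column maximum and the (0-based) column where it occurs."""
    col_maxima = [max(col) for col in zip(*A)]
    best = min(col_maxima)
    return best, col_maxima.index(best)

def saddle_points(A):
    """All (i, j), 0-based, where A[i][j] is a row minimum and a column maximum."""
    cols = list(zip(*A))
    return [(i, j)
            for i, row in enumerate(A)
            for j, a in enumerate(row)
            if a == min(row) and a == max(cols[j])]

# Payoff matrix of Example 1 (rows and columns ordered p, n, d).
A1 = [[-1, -1, -1],
      [ 1, -5, -5],
      [ 1,  5, 10]]

print(maximin(A1))        # (1, 2): R plays row 3, the dime
print(minimax(A1))        # (1, 0): C plays column 1, the penny
print(saddle_points(A1))  # [(2, 0)], i.e. the entry a_31
```

When `maximin` and `minimax` return the same value, as here, the matrix has a saddle point and both players can settle on a single play.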

EXAMPLE 2
Again suppose that each player has a supply of pennies, nickels, and dimes to play, but this time the payoff matrix is given as follows:

[Image: payoff matrix for Example 2, with rows and columns labeled $p$, $n$, $d$]
If player $R$ reasons as in the first example and looks at the row minima, $R$ will choose to play a nickel, thereby maximizing the minimum gain (in this case a loss of 1). Player $C$ , looking at the column maxima, will also select a nickel to minimize the loss to $R$ .

Thus, as the game begins, $R$ and $C$ both continue to play a nickel. After a while, however, $C$ begins to reason, “If $R$ is going to play a nickel, then I’ll play a dime so that I can win a penny.” However, when $C$ starts to play a dime repeatedly, $R$ begins to reason, “If $C$ is going to play a dime, then I’ll play a penny so that I can win a nickel.” Once $R$ has done this, $C$ switches to a nickel (to win a nickel) and then $R$ starts playing a nickel . . . and so on. It seems that neither player can develop a winning strategy.

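Mathematically speaking, the payoff matrix for the game in Example 2 does not have a saddle point. Indeed,

$$
\max_i \min_j a_{ij} = -1
$$
while $\min_j \max_i a_{ij}$, the largest entry in the nickel column, is strictly greater than $-1$, so the two quantities are not equal.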
This means that neither player can play the same coin repeatedly and be assured of optimizing the winnings. In fact, any predictable strategy can be countered by the opponent. But is it possible to formulate some combination of plays that over the long run will produce an optimal return?

The answer is yes (as Theorem 3 will show later), provided each move is made at random, with a certain probability attached to each possible choice.

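DEFINITION
A probability vector in $\mathbb{R}^m$ is a vector with nonnegative entries that sum to one. Given an $m \times n$ matrix game, the strategy space for player $R$ is the set of all probability vectors in $\mathbb{R}^m$, and the strategy space for player $C$ is the set of all probability vectors in $\mathbb{R}^n$. A point in a strategy space is called a strategy. If one entry of a strategy is 1 (and the other entries are zeros), the strategy is called a pure strategy.

Each entry of a probability vector is the probability that the player chooses the corresponding action.

For instance, if player $R$ chooses each play in Example 2 with equal probability, then this strategy is specified by the vector in $\mathbb{R}^3$ whose entries all equal $1/3$.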

The pure strategies in $\mathbb{R}^m$ are the standard basis vectors for $\mathbb{R}^m$, $\boldsymbol e_1, \ldots, \boldsymbol e_m$. In general, each strategy $\boldsymbol x$ is a linear combination of these pure strategies with nonnegative weights that sum to one.

More precisely, each strategy is a convex combination of the set of pure strategies, that is, a point in the convex hull of the set of standard basis vectors. This fact connects the theory of convex sets to the study of matrix games.
The strategy space for $R$ is an $(m - 1)$-dimensional simplex in $\mathbb{R}^m$, and the strategy space for $C$ is an $(n - 1)$-dimensional simplex in $\mathbb{R}^n$. See Section 8.5 for definitions.

Suppose now that $R$ and $C$ are playing the $m \times n$ matrix game $A = [a_{ij} ]$ . There are $m n$ possible outcomes of the game. Suppose $R$ uses strategy $\boldsymbol x$ and $C$ uses strategy $\boldsymbol y$ , where

$$
\boldsymbol x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}
\quad \text{and} \quad
\boldsymbol y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
$$
Since $R$ plays the first row with probability $x_1$ and $C$ plays the first column with probability $y_1$, and since their choices are made independently, the probability that $R$ chooses the first row and $C$ chooses the first column is $x_1y_1$. Averaged over many games, this outcome contributes an expected payoff of $a_{11}x_1y_1$ per game to $R$. A similar computation holds for each possible pair of choices that $R$ and $C$ can make. The sum of these expected payoffs to $R$ over all possible pairs of choices is called the expected payoff, $E(\boldsymbol x, \boldsymbol y)$, of the game to player $R$ for strategies $\boldsymbol x$ and $\boldsymbol y$. That is,

$$
E(\boldsymbol x, \boldsymbol y) = \sum_{i=1}^{m} \sum_{j=1}^{n} x_i a_{ij} y_j = \boldsymbol x^T A \boldsymbol y
$$
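The expected payoff is the double sum of $x_i a_{ij} y_j$ over all pairs of plays, which equals $\boldsymbol x^TA\boldsymbol y$. A small Python sketch with exact fractions follows; it uses the payoff matrix implied by the rules of Example 1, and the strategy vectors are chosen only for illustration.

```python
from fractions import Fraction as F

def expected_payoff(x, A, y):
    """E(x, y) = sum over i, j of x_i * a_ij * y_j, i.e. x^T A y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))

# Payoff matrix of Example 1 (rows and columns ordered p, n, d).
A1 = [[-1, -1, -1],
      [ 1, -5, -5],
      [ 1,  5, 10]]

uniform = [F(1, 3)] * 3            # each coin with probability 1/3
always_dime = [F(0), F(0), F(1)]   # pure strategy e_3
always_penny = [F(1), F(0), F(0)]  # pure strategy e_1

print(expected_payoff(uniform, A1, uniform))           # 4/9
print(expected_payoff(always_dime, A1, always_penny))  # 1, the entry a_31
```

With two pure strategies the formula simply picks out one entry of the matrix; with mixed strategies it averages the entries using the product probabilities $x_i y_j$.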
Let $X$ denote the strategy space for $R$ and $Y$ the strategy space for $C$. If $R$ were to choose a particular strategy, say $\tilde{\boldsymbol x}$, and if $C$ were to discover this strategy, then $C$ would certainly choose $\boldsymbol y$ to minimize

$$
E(\tilde{\boldsymbol x}, \boldsymbol y) = \tilde{\boldsymbol x}^T A \boldsymbol y
$$
The value of using strategy $\tilde{\boldsymbol x}$ is the number $v(\tilde{\boldsymbol x})$ defined by

$$
v(\tilde{\boldsymbol x}) = \min_{\boldsymbol y \in Y} E(\tilde{\boldsymbol x}, \boldsymbol y) = \min_{\boldsymbol y \in Y} \tilde{\boldsymbol x}^T A \boldsymbol y \qquad (1)
$$
Since $\tilde{\boldsymbol x}^TA$ is a $1 \times n$ matrix, the mapping $\boldsymbol y \mapsto E(\tilde{\boldsymbol x}, \boldsymbol y) = \tilde{\boldsymbol x}^TA\boldsymbol y$ is a linear functional on the probability space $Y$. From this, it can be shown that $E(\tilde{\boldsymbol x}, \boldsymbol y)$ attains its minimum when $\boldsymbol y$ is one of the pure strategies, $\boldsymbol e_1, \ldots, \boldsymbol e_n$, for $C$.

A linear functional on $Y$ is a linear transformation from $Y$ into $\mathbb{R}$. The pure strategies are the extreme points of the strategy space for a player. The stated result follows directly from Theorem 16 in Section 8.5.

Recall that $A\boldsymbol e_j$ is the $j$th column of the matrix $A$, usually denoted by $\boldsymbol a_j$. Since the minimum in (1) is attained when $\boldsymbol y = \boldsymbol e_j$ for some $j$, (1) may be written, with $\boldsymbol x$ in place of $\tilde{\boldsymbol x}$, as

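$$
v(\boldsymbol x) = \min_j E(\boldsymbol x, \boldsymbol e_j) = \min_j \boldsymbol x^T A \boldsymbol e_j = \min_j \boldsymbol x^T \boldsymbol a_j = \min_j \boldsymbol x \cdot \boldsymbol a_j \qquad (2)
$$

That is, the value of strategy $\boldsymbol x$ to $R$ is the minimum of the inner product of $\boldsymbol x$ with each of the columns of $A$. The goal of $R$ is to choose $\boldsymbol x$ to maximize $v(\boldsymbol x)$.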

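DEFINITION
The number $v_R$, defined by

$$
v_R = \max_{\boldsymbol x \in X} v(\boldsymbol x) = \max_{\boldsymbol x \in X} \min_{\boldsymbol y \in Y} E(\boldsymbol x, \boldsymbol y) = \max_{\boldsymbol x \in X} \min_j \boldsymbol x \cdot \boldsymbol a_j
$$

is called the value of the game to row player $R$. A strategy $\tilde{\boldsymbol x}$ for $R$ is called optimal if $v(\tilde{\boldsymbol x}) = v_R$.

Of course, $E(\boldsymbol x, \boldsymbol y)$ may exceed $v_R$ for some $\boldsymbol x$ and $\boldsymbol y$ if $C$ plays poorly. This value $v_R$ can be thought of as the most that player $R$ can be sure to receive from $C$, independent of what player $C$ may do.

Equivalently, $\tilde{\boldsymbol x}$ is optimal if $E(\tilde{\boldsymbol x},\boldsymbol y) \geq v_R$ for all $\boldsymbol y$ in $Y$.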
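Because the minimum over $Y$ is attained at a pure strategy, the value of a row strategy is just the smallest inner product of $\boldsymbol x$ with a column of $A$. A minimal Python sketch follows, using the payoff matrix implied by the rules of Example 1; the function name `value_for_R` is illustrative.

```python
from fractions import Fraction as F

def value_for_R(x, A):
    """v(x) = min_j x . a_j: R's worst-case expected payoff when playing x."""
    return min(sum(x_i * row[j] for x_i, row in zip(x, A))
               for j in range(len(A[0])))

# Payoff matrix of Example 1 (rows and columns ordered p, n, d).
A1 = [[-1, -1, -1],
      [ 1, -5, -5],
      [ 1,  5, 10]]

uniform = [F(1, 3)] * 3
always_dime = [F(0), F(0), F(1)]

print(value_for_R(uniform, A1))      # -1/3
print(value_for_R(always_dime, A1))  # 1
```

Playing the dime every time guarantees $R$ at least 1, which matches the saddle-point value found in Example 1, while the uniform strategy guarantees only $-1/3$.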

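A similar analysis for player $C$, using the pure strategies for $\boldsymbol x$, shows that a particular strategy $\boldsymbol y$ will have a value $v(\boldsymbol y)$ given by

$$
v(\boldsymbol y) = \max_{\boldsymbol x \in X} E(\boldsymbol x, \boldsymbol y) = \max_i E(\boldsymbol e_i, \boldsymbol y) = \max_i \operatorname{row}_i(A)\,\boldsymbol y \qquad (3)
$$
That is, the value of strategy $\boldsymbol y$ to $C$ is the maximum of the inner product of $\boldsymbol y$ with each of the rows of $A$. The number $v_C$, defined by

$$
v_C = \min_{\boldsymbol y \in Y} v(\boldsymbol y) = \min_{\boldsymbol y \in Y} \max_{\boldsymbol x \in X} E(\boldsymbol x, \boldsymbol y) = \min_{\boldsymbol y \in Y} \max_i \operatorname{row}_i(A)\,\boldsymbol y
$$

is called the value of the game to $C$. This is the least that $C$ will have to lose regardless of what $R$ may do. A strategy $\tilde{\boldsymbol y}$ for $C$ is called optimal if $v(\tilde{\boldsymbol y}) = v_C$. Equivalently, $\tilde{\boldsymbol y}$ is optimal if $E(\boldsymbol x,\tilde{\boldsymbol y}) \leq v_C$ for all $\boldsymbol x$ in $X$.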
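The column player's computation mirrors the row player's: the value of $\boldsymbol y$ is the largest inner product of $\boldsymbol y$ with a row of $A$. The sketch below again uses the payoff matrix implied by the rules of Example 1; `value_for_C` is an illustrative name.

```python
from fractions import Fraction as F

def value_for_C(y, A):
    """v(y) = max_i row_i(A) y: the most C can expect to lose when playing y."""
    return max(sum(a * y_j for a, y_j in zip(row, y)) for row in A)

# Payoff matrix of Example 1 (rows and columns ordered p, n, d).
A1 = [[-1, -1, -1],
      [ 1, -5, -5],
      [ 1,  5, 10]]

uniform = [F(1, 3)] * 3
always_penny = [F(1), F(0), F(0)]

print(value_for_C(always_penny, A1))  # 1
print(value_for_C(uniform, A1))       # 16/3
```

Always showing a penny limits $C$'s loss to 1, the saddle-point value, whereas mixing uniformly exposes $C$ to an expected loss of $16/3$ if $R$ plays the dime.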

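THEOREM 1
In any matrix game, $v_R \leq v_C$.

PROOF
For any $\boldsymbol x$ in $X$, the definition of $v(\boldsymbol x)$ as a minimum gives $v(\boldsymbol x) \leq E(\boldsymbol x, \boldsymbol y)$ for each $\boldsymbol y$ in $Y$. Likewise, $v(\boldsymbol y)$ is the maximum of $E(\boldsymbol x, \boldsymbol y)$ over all $\boldsymbol x$, so $E(\boldsymbol x, \boldsymbol y) \leq v(\boldsymbol y)$. Hence

$$
v(\boldsymbol x) \leq E(\boldsymbol x, \boldsymbol y) \leq v(\boldsymbol y)
$$
for all $\boldsymbol x \in X$ and for all $\boldsymbol y \in Y$. Thus,

$$
v_R = \max_{\boldsymbol x \in X} v(\boldsymbol x) \leq \min_{\boldsymbol y \in Y} v(\boldsymbol y) = v_C
$$
which proves the theorem.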
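The chain $v(\boldsymbol x) \leq E(\boldsymbol x, \boldsymbol y) \leq v(\boldsymbol y)$ can be spot-checked numerically. The sketch below samples strategies on a coarse grid of the simplex (step $1/4$, an arbitrary choice) for the payoff matrix implied by the rules of Example 1. A grid check illustrates the inequality but does not prove it, and all helper names are illustrative.

```python
from fractions import Fraction as F

def compositions(total, parts):
    """All tuples of `parts` nonnegative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def simplex_grid(k, steps):
    """Probability vectors of length k whose entries are multiples of 1/steps."""
    return [[F(c, steps) for c in comp] for comp in compositions(steps, k)]

def E(x, A, y):
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def v_row(x, A):
    return min(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))

def v_col(y, A):
    return max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A)))

# Payoff matrix of Example 1 (rows and columns ordered p, n, d).
A1 = [[-1, -1, -1],
      [ 1, -5, -5],
      [ 1,  5, 10]]

X = simplex_grid(3, 4)
Y = simplex_grid(3, 4)

# v(x) <= E(x, y) <= v(y) for every sampled pair of strategies.
assert all(v_row(x, A1) <= E(x, A1, y) <= v_col(y, A1) for x in X for y in Y)

best_for_R = max(v_row(x, A1) for x in X)
best_for_C = min(v_col(y, A1) for y in Y)
print(best_for_R, best_for_C)  # 1 1
```

For this matrix the two sampled bounds coincide at the saddle-point value 1, because the grid contains the pure strategies that achieve it.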

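In Theorem 1, the proof that $v_R \leq v_C$ was simple. A fundamental result in game theory is that $v_R = v_C$, but this is not easy to prove. Perhaps the best-known proof depends strongly on certain properties of convex sets and hyperplanes.

THEOREM 2 (Minimax Theorem)
In any matrix game, $v_R = v_C$. That is,

$$
\max_{\boldsymbol x \in X} \min_{\boldsymbol y \in Y} E(\boldsymbol x, \boldsymbol y) = \min_{\boldsymbol y \in Y} \max_{\boldsymbol x \in X} E(\boldsymbol x, \boldsymbol y)
$$

DEFINITION
The common value $v = v_R = v_C$ is called the value of the game. Any pair of optimal strategies $(\tilde{\boldsymbol x}, \tilde{\boldsymbol y})$ is called a solution to the game.

When $(\tilde{\boldsymbol x}, \tilde{\boldsymbol y})$ is a solution to the game, $v_R = v(\tilde{\boldsymbol x}) \leq E(\tilde{\boldsymbol x}, \tilde{\boldsymbol y}) \leq v(\tilde{\boldsymbol y}) = v_C$, which shows that $E(\tilde{\boldsymbol x},\tilde{\boldsymbol y}) = v$.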

The next theorem is the main theoretical result of this section. A proof can be based either on the Minimax Theorem or on the theory of linear programming (in Section 9.4).

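THEOREM 3
In any matrix game, there are always optimal strategies. That is, every matrix game has a solution.

PROOF
The proof based on the Minimax Theorem goes as follows: The function $v(\boldsymbol x)$ is continuous on the compact set $X$, so there exists a point $\tilde{\boldsymbol x}$ in $X$ such that

$$
v(\tilde{\boldsymbol x}) = \max_{\boldsymbol x \in X} v(\boldsymbol x) = v_R
$$
Similarly, there exists $\tilde{\boldsymbol y}$ in $Y$ such that

$$
v(\tilde{\boldsymbol y}) = \min_{\boldsymbol y \in Y} v(\boldsymbol y) = v_C
$$
According to the Minimax Theorem, $v_R = v_C = v$.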

$2 \times n$ Matrix Games

When a game matrix $A$ has 2 rows and $n$ columns, an optimal row strategy and $v_R$ are fairly easy to compute. Suppose

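$$
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \end{bmatrix}
$$
Since $\boldsymbol x$ has only two entries, the probability space $X$ for $R$ may be parameterized by a variable $t$, with a typical $\boldsymbol x$ in $X$ having the form

$$
\boldsymbol x(t) = \begin{bmatrix} 1 - t \\ t \end{bmatrix}, \qquad 0 \leq t \leq 1
$$
Thus,

$$
v(\boldsymbol x(t)) = \min_j \boldsymbol x(t) \cdot \boldsymbol a_j = \min \{ a_{1j}(1 - t) + a_{2j}t : j = 1, \ldots, n \} \qquad (4)
$$
Thus $v(\boldsymbol x(t))$ is the minimum value of $n$ linear functions of $t$. When these functions are graphed on one coordinate system for $0 \leq t \leq 1$, the graph of $v(\boldsymbol x(t))$ as a function of $t$ becomes evident, and the maximum value of $v(\boldsymbol x(t))$ is easy to find. The process is illustrated best by an example.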

EXAMPLE 4
Consider the game whose payoff matrix is

[Image: the $2 \times 4$ payoff matrix $A$ for Example 4]
a. On a $t$-$z$ coordinate system, sketch the four lines $z = a_{1j}(1 - t) + a_{2j}t$ for $0 \leq t \leq 1$, and darken the line segments that correspond to the graph of $v(\boldsymbol x(t))$.
b. Identify the highest point $M = (t, z)$ on the graph of $v(\boldsymbol x(t))$. The $z$-coordinate of $M$ is the value $v_R$ of the game for $R$, and the $t$-coordinate determines an optimal strategy $\tilde{\boldsymbol x}(t)$ for $R$.
SOLUTION
a. The four lines are

[Image: equations of the four lines $z = a_{1j}(1 - t) + a_{2j}t$, $j = 1, \ldots, 4$]
The heavy polygonal path in Fig. 2 represents $v(\boldsymbol x)$ as a function of $t$ , because the $z$ -coordinate of a point on this path is the minimum of the corresponding $z$ -coordinates of points on the four lines in Fig. 2.

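b. The coordinates of $M$ are $(\frac{2}{5},\frac{11}{5})$. The value of the game for $R$ is $\frac{11}{5}$. This value is attained at $t = \frac{2}{5}$, so the optimal strategy for $R$ is

$$
\hat{\boldsymbol x} = \boldsymbol x(\tfrac{2}{5}) = \begin{bmatrix} 1 - \frac{2}{5} \\ \frac{2}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ \frac{2}{5} \end{bmatrix}
$$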
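The graphical method amounts to maximizing the lower envelope of $n$ lines on $[0, 1]$. Since that envelope is concave and piecewise linear, its maximum occurs at an endpoint or where two lines cross, so it suffices to test those points. The sketch below does this with exact fractions. The matrix `A_assumed` is an assumption, not taken from the text: its columns 1 and 3 are chosen so that their lines pass through $M = (\frac{2}{5}, \frac{11}{5})$, and columns 2 and 4 are illustrative values whose lines lie above $M$.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_row_player_2xn(A):
    """Maximize v(x(t)) = min_j [a_1j (1 - t) + a_2j t] over 0 <= t <= 1."""
    top, bottom = A
    # Line j is z = intercept + slope * t, with intercept a_1j and slope a_2j - a_1j.
    lines = [(F(a1), F(a2) - F(a1)) for a1, a2 in zip(top, bottom)]

    def lower_envelope(t):
        return min(c + s * t for c, s in lines)

    # Candidate maximizers: the endpoints and every crossing point inside [0, 1].
    candidates = {F(0), F(1)}
    for (c1, s1), (c2, s2) in combinations(lines, 2):
        if s1 != s2:
            t = (c2 - c1) / (s1 - s2)
            if 0 <= t <= 1:
                candidates.add(t)

    t_best = max(candidates, key=lower_envelope)
    return t_best, lower_envelope(t_best)

# ASSUMED 2 x 4 matrix (see the note above); only columns 1 and 3 are pinned down.
A_assumed = [[1, 5, 3, 6],
             [4, 0, 1, 2]]

t_best, v_R = solve_row_player_2xn(A_assumed)
print(t_best, v_R)              # 2/5 11/5
print([1 - t_best, t_best])     # optimal row strategy x(t) = [1 - t, t]
```

Testing only crossing points keeps the computation exact and avoids any numerical search over $t$.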

For any $2 \times n$ matrix game, Example 4 illustrates the method for finding an optimal solution for player $R$. Theorem 3 guarantees that there also exists an optimal strategy for player $C$, and the value of the game is the same for $C$ as for $R$. With this value available, an analysis of the graphical solution for $R$, as in Fig. 2, will reveal how to produce an optimal strategy $\tilde{\boldsymbol y}$ for $C$. The next theorem supplies the key information about $\tilde{\boldsymbol y}$.

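THEOREM 4
Let $\hat{\boldsymbol x}$ and $\hat{\boldsymbol y}$ be optimal strategies for an $m \times n$ matrix game whose value is $v$, and suppose that

$$
\hat{\boldsymbol x} = \hat x_1\boldsymbol e_1 + \cdots + \hat x_m\boldsymbol e_m \ \text{ in } \mathbb{R}^m \qquad (5)
$$

Then $\hat{\boldsymbol y}$ is a convex combination of the pure strategies $\boldsymbol e_j$ in $\mathbb{R}^n$ for which $E(\hat{\boldsymbol x}, \boldsymbol e_j) = v$. In addition, $\hat{\boldsymbol y}$ satisfies the equation

$$
E(\boldsymbol e_i, \hat{\boldsymbol y}) = v \qquad (6)
$$

for each $i$ such that $\hat x_i \neq 0$.

PROOF
Write $\hat{\boldsymbol y} = \hat y_1\boldsymbol e_1 + \cdots + \hat y_n\boldsymbol e_n$ in $\mathbb{R}^n$, and note that $v = E(\hat{\boldsymbol x},\hat{\boldsymbol y}) = v(\hat{\boldsymbol x}) \leq E(\hat{\boldsymbol x},\boldsymbol e_j)$ for $j = 1, \ldots, n$. So there exist nonnegative numbers $\varepsilon_j$ such that

$$
E(\hat{\boldsymbol x}, \boldsymbol e_j) = v + \varepsilon_j \qquad (j = 1, \ldots, n)
$$
Then

$$
v = E(\hat{\boldsymbol x}, \hat{\boldsymbol y}) = \sum_{j=1}^{n} \hat y_j E(\hat{\boldsymbol x}, \boldsymbol e_j) = \sum_{j=1}^{n} \hat y_j (v + \varepsilon_j) = v + \sum_{j=1}^{n} \hat y_j \varepsilon_j
$$
This equality is possible only if $\hat y_j = 0$ whenever $\varepsilon_j > 0$. Thus $\hat{\boldsymbol y}$ is a linear combination of the $\boldsymbol e_j$ for which $\varepsilon_j = 0$. For such $j$, $E(\hat{\boldsymbol x},\boldsymbol e_j) = v$.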

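Next, observe that $E(\boldsymbol e_i,\hat{\boldsymbol y}) \leq v(\hat{\boldsymbol y}) = E(\hat{\boldsymbol x}, \hat{\boldsymbol y})$ for $i = 1, \ldots, m$. So there exist nonnegative numbers $\delta_i$ such that

$$
E(\boldsymbol e_i, \hat{\boldsymbol y}) + \delta_i = v \qquad (i = 1, \ldots, m) \qquad (7)
$$

Then, using (5) gives

$$
v = E(\hat{\boldsymbol x}, \hat{\boldsymbol y}) = \sum_{i=1}^{m} \hat x_i E(\boldsymbol e_i, \hat{\boldsymbol y}) = \sum_{i=1}^{m} \hat x_i (v - \delta_i) = v - \sum_{i=1}^{m} \hat x_i \delta_i
$$

This equality is possible only if $\delta_i = 0$ when $\hat x_i \neq 0$. By (7), $E(\boldsymbol e_i, \hat{\boldsymbol y}) = v$ for each $i$ such that $\hat x_i \neq 0$.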

EXAMPLE 5
The value of the game in Example 4 is $\frac{11}{5}$, attained when $\hat{\boldsymbol x} = \begin{bmatrix}\frac{3}{5}\\\frac{2}{5}\end{bmatrix}$. Use this fact to find an optimal strategy for the column player $C$.
SOLUTION
The $z$-coordinate of the maximum point $M$ in Fig. 2 is the value of the game, and the $t$-coordinate identifies the optimal strategy $\boldsymbol x(\frac{2}{5}) = \hat{\boldsymbol x}$. Recall that the $z$-coordinates of the lines in Fig. 2 represent $E(\boldsymbol x(t), \boldsymbol e_j)$ for $j = 1, \ldots, 4$. Only the lines for columns 1 and 3 pass through the point $M$, which means that

$$
E(\hat{\boldsymbol x}, \boldsymbol e_1) = \frac{11}{5} \quad \text{and} \quad E(\hat{\boldsymbol x}, \boldsymbol e_3) = \frac{11}{5}
$$

while the other columns of $A$ produce payoffs greater than $\frac{11}{5}$.

By Theorem 4, the optimal column strategy $\hat{\boldsymbol y}$ for $C$ is a linear combination of the pure strategies $\boldsymbol e_1$ and $\boldsymbol e_3$ in $\mathbb{R}^4$. Thus, $\hat{\boldsymbol y}$ has the form

$$
\hat{\boldsymbol y} = c_1\boldsymbol e_1 + c_3\boldsymbol e_3 = \begin{bmatrix} c_1 \\ 0 \\ c_3 \\ 0 \end{bmatrix}
$$

where $c_1 + c_3 = 1$. Since both coordinates of the optimal $\hat{\boldsymbol x}$ are nonzero, Theorem 4 shows that $E(\boldsymbol e_1, \hat{\boldsymbol y}) = \frac{11}{5}$ and $E(\boldsymbol e_2,\hat{\boldsymbol y}) = \frac{11}{5}$. Each condition, by itself, determines $\hat{\boldsymbol y}$. For example,

$$
E(\boldsymbol e_2, \hat{\boldsymbol y}) = a_{21}c_1 + a_{23}c_3 = 4c_1 + c_3 = \frac{11}{5}
$$

Substitute $c_3 = 1 - c_1$, and obtain $4c_1 + (1 - c_1) = \frac{11}{5}$, so $c_1 = \frac{2}{5}$ and $c_3 = \frac{3}{5}$. The optimal strategy for $C$ is $\hat{\boldsymbol y} =\begin{bmatrix}\frac{2}{5}\\0\\\frac{3}{5}\\0\end{bmatrix}$.
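The arithmetic of Example 5 can be replayed with exact fractions. In the sketch below, the row-2 entries 4 and 1 are read off from the equation $4c_1 + (1 - c_1) = \frac{11}{5}$, and the row-1 entries of columns 1 and 3 are derived from the fact that both lines pass through $M = (\frac{2}{5}, \frac{11}{5})$. The variable names are illustrative.

```python
from fractions import Fraction as F

v, t = F(11, 5), F(2, 5)   # value of the game and the optimal t from Example 4
a21, a23 = F(4), F(1)      # row-2 entries of columns 1 and 3

# Lines 1 and 3 pass through M: a_1j * (1 - t) + a_2j * t = v.
a11 = (v - a21 * t) / (1 - t)
a13 = (v - a23 * t) / (1 - t)

# Row-2 condition with c3 = 1 - c1: a21 * c1 + a23 * (1 - c1) = v.
c1 = (v - a23) / (a21 - a23)
c3 = 1 - c1
y_hat = [c1, F(0), c3, F(0)]

print(a11, a13)  # 1 3
print(c1, c3)    # 2/5 3/5

# The row-1 condition E(e_1, y_hat) = v holds as well, as Theorem 4 predicts.
assert a11 * c1 + a13 * c3 == v
```

Either row condition alone determines $c_1$; the final assertion confirms that the other one is then satisfied automatically.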

Reducing the Size of a Game

The general $m \times n$ matrix game can be solved using linear programming techniques, and Section 9.4 describes one method for doing this. In some cases, however, a matrix game can be reduced to a “smaller” game whose matrix has only two rows. If this happens, the graphical method of Examples 4 and 5 is available.

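DEFINITION
Given $\boldsymbol a$ and $\boldsymbol b$ in $\mathbb{R}^n$, with entries $a_i$ and $b_i$, respectively, vector $\boldsymbol a$ is said to dominate vector $\boldsymbol b$ if $a_i \geq b_i$ for all $i = 1, \ldots, n$ and $a_i > b_i$ for at least one $i$. If $\boldsymbol a$ dominates $\boldsymbol b$, then $\boldsymbol b$ is said to be recessive to $\boldsymbol a$.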
Suppose that in the matrix game $A$ , row $r$ dominates row $s$ . This means that for $R$ the pure strategy of choosing row $r$ is at least as good as the pure strategy of choosing row $s$ , no matter what $C$ may choose. It follows that the recessive row $s$ (the “smaller” one) can be ignored by $R$ without hurting $R$ ’s expected payoff. A similar analysis applies to the columns of $A$ , in which case the dominating “larger” column is ignored. These observations are summarized in the following theorem.

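THEOREM 5
Let $A$ be an $m \times n$ matrix game. If row $s$ of $A$ is recessive to some other row, let $A_1$ be the $(m - 1) \times n$ matrix obtained by deleting row $s$ from $A$. Similarly, if column $t$ of $A$ dominates some other column, let $A_2$ be the $m \times (n - 1)$ matrix obtained by deleting column $t$ from $A$. In either case, any optimal strategy of the reduced matrix game $A_1$ or $A_2$ determines an optimal strategy for $A$.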
EXAMPLE 6
Use the process described in Theorem 5 to reduce the following matrix game to a smaller size. Then find the value of the game and optimal strategies for both players in the original game.
[Image: the $3 \times 4$ payoff matrix $A$ for Example 6]
SOLUTION
Since the first column dominates the third, player $C$ will never want to use the first pure strategy. So delete column 1 and obtain

[Image: the $3 \times 3$ matrix that remains after column 1 is deleted]
In this matrix, row 2 is recessive to row 3. Delete row 2 and obtain

[Image: the $2 \times 3$ matrix that remains after row 2 is deleted]
This reduced $2 \times 3$ matrix can be reduced further by dropping the last column, since it dominates column 2. Thus, the original matrix game $A$ has been reduced to

$$
B = \begin{bmatrix} 1 & 6 \\ 5 & 3 \end{bmatrix}
$$

and any optimal strategy for $B$ will produce an optimal strategy for $A$, with zeros as entries corresponding to deleted rows or columns.
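The reduction can be sketched as a loop that repeatedly deletes a recessive row or a dominating column until none remains. The matrix `A_hyp` below is hypothetical: only the entries that survive into $B$ are fixed by the lines $z = 4t + 1$ and $z = -3t + 6$ quoted in this example, and the remaining entries were chosen so that the same deletions apply. The code may delete rows and columns in a different order than the text does. Dominance is taken to mean entrywise $\geq$ with at least one strict inequality, which is an assumption about the definition.

```python
def dominates(u, v):
    """True if u >= v entrywise and u differs from v somewhere."""
    return all(a >= b for a, b in zip(u, v)) and list(u) != list(v)

def reduce_game(A):
    """Delete recessive rows and dominating columns until none remain.

    Returns the kept row indices, kept column indices (0-based), and the
    reduced matrix.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # A row is recessive if some other kept row dominates it.
        for s in rows:
            row_s = [A[s][j] for j in cols]
            if any(r != s and dominates([A[r][j] for j in cols], row_s)
                   for r in rows):
                rows.remove(s)
                changed = True
                break
        # A column is dropped if it dominates some other kept column.
        for k in cols:
            col_k = [A[i][k] for i in rows]
            if any(j != k and dominates(col_k, [A[i][j] for i in rows])
                   for j in cols):
                cols.remove(k)
                changed = True
                break
    return rows, cols, [[A[i][j] for j in cols] for i in rows]

# HYPOTHETICAL 3 x 4 game; only the entries kept in B come from the text.
A_hyp = [[7, 1, 6, 7],
         [6, 2, 3, 5],
         [5, 5, 3, 5]]

kept_rows, kept_cols, B = reduce_game(A_hyp)
print(kept_rows, kept_cols)  # [0, 2] [1, 2]
print(B)                     # [[1, 6], [5, 3]]
```

Keeping track of the surviving indices matters, because they determine where the zeros go when the solution for $B$ is lifted back to the original game.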

A quick check of matrix $B$ shows that the game has no saddle point (because 3 is the max of the row minima and 5 is the min of the column maxima). So the graphical solution method is needed.

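[Figure 3: graph of the lines $z = 4t + 1$ and $z = -3t + 6$ for $0 \leq t \leq 1$]
Figure 3 shows the lines corresponding to the two columns of $B$, whose equations are $z = 4t + 1$ and $z = -3t + 6$. They intersect where $t = \frac{5}{7}$; the value of the game is $\frac{27}{7}$, and the optimal row strategy for matrix $B$ is

$$
\hat{\boldsymbol x} = \boldsymbol x(\tfrac{5}{7}) = \begin{bmatrix} 1 - \frac{5}{7} \\ \frac{5}{7} \end{bmatrix} = \begin{bmatrix} \frac{2}{7} \\ \frac{5}{7} \end{bmatrix}
$$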
Since the game has no saddle point, the optimal column strategy must be a linear combination of the two pure strategies. Set $\hat\boldsymbol y = c_1\boldsymbol e_1 + c_2\boldsymbol e_2$ , and use the second part of Theorem 4 to write

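$$
E(\boldsymbol e_1, \hat{\boldsymbol y}) = \begin{bmatrix} 1 & 6 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = c_1 + 6c_2 = \frac{27}{7}
$$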
Solving gives $5c_2 = \frac{20}{7} , c_2 = \frac{4}{7}$ , and $c_1 = 1 − c_2 = \frac{3}{7}$ . Thus $\hat\boldsymbol y= \begin{bmatrix}\frac{3}{7}\\\frac{4}{7}\end{bmatrix}$ . As a check, compute $E(\boldsymbol e_2,\hat\boldsymbol y) = 5(\frac{3}{7}) + 3(\frac{4}{7}) = \frac{27}{7} = v$ .

The final step is to construct the solution for matrix $A$ from the solution for matrix $B$ . Look at the matrices in (8) to see where the extra zeros go. The row and column strategies for $A$ are, respectively,

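$$
\hat{\boldsymbol x} = \begin{bmatrix} \frac{2}{7} \\ 0 \\ \frac{5}{7} \end{bmatrix}
\quad \text{and} \quad
\hat{\boldsymbol y} = \begin{bmatrix} 0 \\ \frac{3}{7} \\ \frac{4}{7} \\ 0 \end{bmatrix}
$$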
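Lifting the solution of $B$ back to $A$ means placing each probability at the index of the row or column that survived the reduction (rows 1 and 3, columns 2 and 3) and zeros elsewhere. The sketch below does that and then checks optimality by comparing $v(\hat{\boldsymbol x})$ with $v(\hat{\boldsymbol y})$. The matrix `A_hyp` is hypothetical: only its entries that appear in $B$ are fixed by the text, and the others are assumed values consistent with the deletions described.

```python
from fractions import Fraction as F

def embed(strategy, kept, size):
    """Place the entries of `strategy` at the 0-based indices `kept`; zeros elsewhere."""
    full = [F(0)] * size
    for p, index in zip(strategy, kept):
        full[index] = p
    return full

x_B = [F(2, 7), F(5, 7)]   # optimal row strategy for B
y_B = [F(3, 7), F(4, 7)]   # optimal column strategy for B

x_A = embed(x_B, kept=[0, 2], size=3)   # rows 1 and 3 survive
y_A = embed(y_B, kept=[1, 2], size=4)   # columns 2 and 3 survive

# HYPOTHETICAL 3 x 4 game; only the entries kept in B come from the text.
A_hyp = [[7, 1, 6, 7],
         [6, 2, 3, 5],
         [5, 5, 3, 5]]

# v(x) = min over columns of x . a_j, and v(y) = max over rows of row_i(A) y.
v_x = min(sum(x_A[i] * A_hyp[i][j] for i in range(3)) for j in range(4))
v_y = max(sum(A_hyp[i][j] * y_A[j] for j in range(4)) for i in range(3))

print(x_A)        # entries 2/7, 0, 5/7
print(y_A)        # entries 0, 3/7, 4/7, 0
print(v_x, v_y)   # 27/7 27/7
```

Since $v(\hat{\boldsymbol x}) \leq v_R \leq v_C \leq v(\hat{\boldsymbol y})$ always holds, equality of the two printed values shows that both strategies are optimal for this hypothetical $A$ and that the value of the game is $\frac{27}{7}$.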