Deep Learning Minimax optimization


Deep Learning / Spring 2024
Homework 3
Please upload your assignments on or before May 6, 2024.
• You are encouraged to discuss ideas with each other, but you must acknowledge your collaborators and compose your own writeup and code
independently.
• We require answers to theory questions to be written in LaTeX. (Figures can be
hand-drawn, but any text or equations must be typeset.) Handwritten homework
submissions will not be graded.
• We require answers to coding questions in the form of a Jupyter notebook. It is
important to include brief, coherent explanations of both your code and your
results to show us your understanding. Use the text block feature of Jupyter
notebooks to include explanations.
• Upload both your theory and coding answers in the form of a single PDF on
Gradescope.
1. (5 points) Understanding policy gradients. In class we derived a general form
of policy gradients. Let us consider a special case here which does not involve
any neural networks. Suppose the step size is η. We consider the so-called bandit
setting where past actions and states do not matter, and different actions a_i give
rise to different rewards R_i.
a. Define the mapping π such that π(a_i) = softmax(θ_i) for i = 1, ..., k,
where k is the total number of actions and θ_i is a scalar parameter encoding
the value of each action. Show that if action a_i is sampled, then the change
in the parameters is given by:
Δθ_i = η R_i (1 − π(a_i)).
b. If constant step sizes are used, intuitively explain why the above update
rule might lead to unstable training. How would you fix this issue to ensure
convergence of the parameters?
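As a rough illustration of the update in part (a), the following minimal sketch applies one policy-gradient step to a softmax bandit policy. The function names `softmax` and `policy_gradient_step` and the numbers in the usage lines are illustrative assumptions, not part of the assignment. The update applied to the non-sampled actions follows from the same log-softmax gradient.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of scalar parameters."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(theta, action, reward, eta):
    """One policy-gradient step after sampling `action` and observing `reward`.

    Uses d/d(theta_j) log pi(a_action) = 1[j == action] - pi(a_j).
    """
    pi = softmax(theta)
    return [
        t + eta * reward * ((1.0 if j == action else 0.0) - pi[j])
        for j, t in enumerate(theta)
    ]

# Hypothetical example: k = 3 actions, action 0 sampled with reward 2.
theta = [0.0, 0.0, 0.0]
new_theta = policy_gradient_step(theta, action=0, reward=2.0, eta=0.1)
delta_sampled = new_theta[0] - theta[0]  # change in the sampled action's parameter
```

Here `delta_sampled` can be compared directly against η R_i (1 − π(a_i)) for the sampled action.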
2. (5 points) Minimax optimization. In this problem we will see how training GANs
is somewhat fundamentally different from regular training. Consider a simple
problem where we are trying to minimax a function of two scalars:
You can try graphing this function in Python if you like (no need to include it in
your answer).
a. Determine the saddle point of this function. A saddle point is a point
(x, y) for which f attains a local minimum along one direction and a local
maximum in an orthogonal direction.
b. Write down the gradient descent/ascent equations for solving this problem
starting at some arbitrary initialization (x0, y0).
c. Determine the range of allowable step sizes to ensure that gradient descent/ascent converges.
d. (2 points). What if you just did regular gradient descent over both variables
instead? Comment on the dynamics of the updates and whether there are
special cases where one might converge to the saddle point anyway.
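To experiment with parts (b) through (d), a small sketch of simultaneous gradient descent/ascent is given below. Because the objective is not reproduced above, the code uses a stand-in f(x, y) = x^2 − y^2, which is an assumption made purely for illustration. The names `grad_f`, `gda`, and `plain_gd` are likewise hypothetical.

```python
def grad_f(x, y):
    """Gradient of the stand-in objective f(x, y) = x**2 - y**2 (an assumption)."""
    return 2.0 * x, -2.0 * y

def gda(x, y, eta, steps):
    """Simultaneous gradient descent in x and gradient ascent in y."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def plain_gd(x, y, eta, steps):
    """Gradient descent in both variables, for comparison with part (d)."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

# Hypothetical initialization and step size.
x_gda, y_gda = gda(1.0, 1.0, eta=0.1, steps=100)
x_gd, y_gd = plain_gd(1.0, 1.0, eta=0.1, steps=100)
```

Printing the two pairs of iterates for a few step sizes and starting points is a quick way to compare the dynamics of the two schemes on this stand-in before working out the assigned function by hand.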
3. (5 points) Generative models. In this problem, the goal is to train and visualize
the outputs of a simple Deep Convolutional GAN (DCGAN) to generate realistic-looking (but synthetic) images of clothing items.
a. Use the FashionMNIST training dataset (which we used in previous assignments) to train the DCGAN. Images are grayscale and of size 28 × 28.
b. Use the following discriminator architecture (kernel size = 5 × 5 with stride
= 2 in both directions):
• 2D convolutions (1 × 28 × 28 → 64 × 14 × 14 → 128 × 7 × 7)
• each convolutional layer is equipped with a Leaky ReLU with slope
0.3, followed by Dropout with parameter 0.3.
• a dense layer that takes the flattened output of the last convolution and
maps it to a scalar.
Here is a link that discusses how to appropriately choose padding and stride values
in order to obtain the desired sizes.
c. Use the following generator architecture (which is essentially the reverse of
a standard discriminative architecture). You can use the same kernel size.
Construct:
• a dense layer that takes a unit Gaussian noise vector of length 100 and
maps it to a vector of size 7 × 7 × 256. No bias terms.
• several transpose 2D convolutions (256 × 7 × 7 → 128 × 7 × 7 →
64 × 14 × 14 → 1 × 28 × 28). No bias terms.
• each convolutional layer (except the last one) is equipped with Batch
Normalization (batch norm), followed by Leaky ReLU with slope 0.3.
The last (output) layer is equipped with tanh activation (no batch norm).
d. Use the binary cross-entropy loss for training both the generator and the
discriminator. Use the Adam optimizer with learning rate 10^-4.
e. Train it for 50 epochs. You can use minibatch sizes of 16, 32, or 64. Training
may take several minutes (or even up to an hour), so be patient! Display
intermediate images generated after T = 10, T = 30, and T = 50 epochs.
If the random seeds are fixed throughout then you should get results of the
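The layer sizes listed in parts (b) and (c) can be sanity-checked before building the networks. The sketch below uses the standard output-size formulas for convolutions and transposed convolutions with dilation 1 (as documented for PyTorch's `Conv2d` and `ConvTranspose2d`). The padding, stride, and output-padding values are assumed choices that happen to reproduce the stated sizes; other combinations may work as well. The helper names are hypothetical.

```python
def conv2d_out(size, kernel, stride, padding):
    """Output length of a 2D convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose2d_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed 2D convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Discriminator: 28 -> 14 -> 7 with kernel 5 and stride 2; padding 2 is an assumed choice.
disc_sizes = [28]
for _ in range(2):
    disc_sizes.append(conv2d_out(disc_sizes[-1], kernel=5, stride=2, padding=2))

# Generator: 7 -> 7 -> 14 -> 28 with kernel 5; strides, padding, and
# output_padding below are assumed values, not given in the assignment.
gen_sizes = [7]
for stride, out_pad in [(1, 0), (2, 1), (2, 1)]:
    gen_sizes.append(
        conv_transpose2d_out(gen_sizes[-1], kernel=5, stride=stride,
                             padding=2, output_padding=out_pad)
    )
```

With these assumed settings, `disc_sizes` and `gen_sizes` trace the spatial sizes through each network, so a mismatch shows up before any training time is spent.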