tf
安達と島村
Notes for my own study; there may be mistakes.
- TD3 plays Pendulum (continuous actions) (2021-09-14)
  References: https://blog.csdn.net/november_chopin/article/details/108171234 and https://zhuanlan.zhihu.com/p/111334500. TD3 is an upgraded DDPG; the upgrades are: twin Critic networks, delayed Actor updates, and target policy smoothing (Target Policy Smoothing Regularization).
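The three upgrades over DDPG show up most cleanly in how the TD target is built. A minimal pure-Python sketch of target policy smoothing plus clipped double-Q (function names, constants, and the 1-D action range are illustrative, not taken from the post's code):

```python
import random

def td3_target(reward, next_action, q1, q2,
               gamma=0.9, noise_std=0.2, noise_clip=0.5,
               a_low=-2.0, a_high=2.0):
    """Compute a TD3 TD target for one transition.

    q1/q2 are callables mapping a (smoothed) next action to a Q estimate,
    standing in for the two target critic networks.
    """
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    a_smooth = max(a_low, min(a_high, next_action + noise))
    # Clipped double-Q: take the smaller of the two critic estimates.
    q_next = min(q1(a_smooth), q2(a_smooth))
    return reward + gamma * q_next
```

The actor update is then delayed: it runs only once every few critic updates, against the first critic.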
- PPO plays Pendulum (continuous actions) (2021-09-13)
dppo玩cartpole(离散动作)
https://github.com/hitgub123/rl参考git,多线程的代码基本没看明白。import tensorflow as tffrom tensorflow import kerasfrom keras.layers import *import numpy as npimport gym, threading, queuenp.random.seed(1)tf.random.set_seed(1)EP_MAX = 1000EP_LEN = 500N_WORKE原创 2021-09-12 19:19:24 · 433 阅读 · 0 评论 -
- PPO plays CartPole (discrete actions) (2021-09-12)
  Code: https://github.com/hitgub123/rl.
- sample_weight in Keras fit (2021-09-08)
  sample_weight multiplies each sample's gradient by the corresponding weight.
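The described behavior can be reproduced by hand: with a mean loss, each sample's term (and therefore its gradient) is scaled by its weight before averaging. A small pure-Python sketch using a weighted MSE (illustrative, not the post's actual Keras code):

```python
def weighted_mse(y_true, y_pred, sample_weight):
    """Mean squared error where each sample's term is scaled by its weight.

    Scaling the loss term scales that sample's gradient contribution by
    the same factor, which is what Keras fit's sample_weight does for
    mean-type losses.
    """
    terms = [w * (t - p) ** 2
             for t, p, w in zip(y_true, y_pred, sample_weight)]
    return sum(terms) / len(terms)
```

A sample with weight 0 contributes nothing to the loss or the gradient; a weight of 2 doubles its pull on the parameters.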
- A backpropagation demo with a single parameter (2021-09-02)
  I could not work out the matrix derivatives by hand.
- Gradient descent demo (2021-09-02)
  Take the partial derivative of the loss with respect to each parameter (weight), then update the parameter by learning rate times that derivative.
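The update rule in one line: w ← w − lr · ∂L/∂w. A tiny pure-Python sketch (the quadratic loss and all names are illustrative):

```python
def gradient_descent(w0, grad, lr=0.1, steps=100):
    """Repeatedly update w by subtracting lr times dL/dw."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Example loss: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3);
# the minimum is at w = 3.
```

Starting from w = 0 with lr = 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps land essentially on w = 3.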
- A3C plays CartPole (2021-08-29)
- Training a Keras model with apply_gradients (2021-08-29)
  No session is needed; training uses apply_gradients just as in TF1. Training inside a for loop gives the same predictions as the two-for-loop version in the earlier post; calling train(x_train, y_train) directly gives results that differ from all the others.
- On the use and understanding of tf.stop_gradient (2021-08-28)
  References: https://www.jianshu.com/p/f893cb703b6b and another post of the same title. self.q_target = tf.stop_gradient(q_target) turns what was an op (node) in the TensorFlow graph into the constant self.q_target, so backpropagation of the loss no longer reaches the target net.
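The effect can be checked by hand: with loss = (q_target − q_eval)² and q_target held constant (the "stopped" branch), the gradient with respect to q_eval is −2 · (q_target − q_eval). A pure-Python finite-difference sketch (illustrative, not the post's TF code):

```python
def loss(q_target, q_eval):
    return (q_target - q_eval) ** 2

def grad_wrt_q_eval(q_target, q_eval, eps=1e-6):
    """Central finite difference of the loss in q_eval only.

    q_target is held fixed throughout, mimicking tf.stop_gradient:
    no derivative is taken through the target branch.
    """
    return (loss(q_target, q_eval + eps)
            - loss(q_target, q_eval - eps)) / (2 * eps)
```

For q_target = 5 and q_eval = 3 the analytic gradient is −2 · (5 − 3) = −4, and the finite difference agrees.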
- tf.gradients() and the grad_ys argument (2021-08-27, repost)
  Based entirely on the referenced post.
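grad_ys supplies the upstream gradient that multiplies each local derivative, so tf.gradients(y, x, grad_ys=g) behaves like g · dy/dx; omitting it is equivalent to passing ones. A toy pure-Python sketch of just that scaling (illustrative; real tf.gradients of course also computes the derivatives themselves):

```python
def scaled_gradients(dy_dx, grad_ys=None):
    """Mimic grad_ys: each returned gradient is the local derivative
    dy/dx multiplied by the upstream weight from grad_ys, which
    defaults to ones (plain dy/dx)."""
    if grad_ys is None:
        grad_ys = [1.0] * len(dy_dx)
    return [g * d for g, d in zip(grad_ys, dy_dx)]
```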
- Two ways to replace minimize in TF1, and a discovered difference from Keras training results (2021-08-27)
  I am not familiar with TF1, so this is a note to myself.
- DDPG plays Pendulum-v0 (2021-08-24)
  References: Morvan's tutorial and "Keras深度强化学习–DPG与DDPG实现". The larger the critic's output, the better the action chosen by the actor, so the negated critic output can serve as the actor's loss. The actor has two networks: ae takes the current state s, computes the current action a, and executes it; at takes the next state s_, computes the next action a_, and passes it to ct; ae is updated by maximizing q (minimizing -q). The critic has two networks: ce takes the current state s and action a and computes the current value q; ct takes the next state s_ and action a_ and computes the next value q_; ce is updated using q_ * gamma + r against q.
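The two update rules above can be written down directly: the critic regresses toward r + gamma · q_, and the actor minimizes the negated critic output. A pure-Python sketch (names are illustrative):

```python
def critic_target(r, q_next, gamma=0.9):
    """The label ce is regressed toward: ct's value for (s_, a_),
    discounted and added to the reward."""
    return r + gamma * q_next

def actor_loss(q_values):
    """A larger critic output means a better action, so the actor
    minimizes the negative of the critic's mean output."""
    return -sum(q_values) / len(q_values)
```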
- Actor-Critic plays CartPole (2021-08-22)
  Only reaches a reward of around 200. Code is in the git repo.
- Policy gradient plays CartPole (2) (2021-08-21)
  Supports two kinds of loss over multiple model inputs; code is in the git repo. The custom loss starts from neg_log_prob = losses.categorical_crossentropy(y_true, y_pred).
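The loss sketched in the excerpt, neg_log_prob weighted per step, amounts to cross-entropy on the taken action scaled by that step's return. A pure-Python sketch (the averaging and all names are illustrative, not the post's Keras code):

```python
import math

def pg_loss(actions_onehot, probs, advantages):
    """Policy-gradient surrogate loss: the cross-entropy of the action
    actually taken, weighted by that step's (discounted) return."""
    total = 0.0
    for onehot, p, adv in zip(actions_onehot, probs, advantages):
        neg_log_prob = -sum(a * math.log(q) for a, q in zip(onehot, p))
        total += neg_log_prob * adv
    return total / len(probs)
```

Minimizing this pushes up the probability of actions that led to high returns and down the probability of actions that led to low ones.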
- Policy gradient plays CartPole (1) (2021-08-21)
  Code is in the git repo. Supports three kinds of loss over multiple model inputs, but every call has to pass several arguments (unused ones can be dummies). After reaching a certain reward it starts failing repeatedly, so training has to stop early.
- Policy gradient Flappy Bird (does not converge) (2021-08-15)
- Prioritized-replay DQN plays Flappy Bird (2021-08-13)
  The weights in the previous post were wrong; fixed here with add_loss and a custom loss. The model still does not converge and I cannot fix that for now, so this is just a record. See the previous post for the git repo.
- On the keep_dims argument of tf.reduce_mean (2021-08-11)
  A small demo: on the integer array [[1, 2], [3, 4]], tf.reduce_mean(a, axis=1) gives [1, 3] (the mean is computed in the input dtype), while the float version gives [1.5, 3.5].
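The same semantics in plain Python: without keepdims the reduced axis disappears; with it the axis is kept at length 1, so the result still broadcasts against the input. An illustrative sketch for axis=1:

```python
def mean_axis1(matrix, keepdims=False):
    """Row means of a list-of-rows matrix.

    With keepdims the reduced axis is kept as length 1 ([[m], [m], ...]),
    mirroring tf.reduce_mean(a, axis=1, keep_dims=True), so the result
    can still be broadcast against the original matrix.
    """
    means = [sum(row) / len(row) for row in matrix]
    return [[m] for m in means] if keepdims else means
```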
- Dueling DQN plays Flappy Bird (building the network with Model plus explicit ops) (2021-08-11)
- Prioritized-replay DQN plays Flappy Bird (has a bug) (2021-08-10)
  References: a git repo and "让dqn变得更会学习". Based on the earlier "DQN plays Flappy Bird" post, with these changes: two networks (eval and target; not DDQN); the success and fail memories are removed; a SumTree handles the priority bookkeeping. Priority matters in two places: more important memories are more likely to be sampled, and they are updated with a larger step. Morvan's code uses loss = tf.reduce_mean(self.ISWeights * tf.squared_difference(self.q_target, self.q_eval)); there seems to be no ready-made loss function for this.
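The SumTree mentioned above stores priorities in the leaves and partial sums in the internal nodes, so drawing a uniform value in [0, total) and descending the tree picks memory i with probability p_i / total. A minimal pure-Python sketch (capacity is assumed to be a power of two; this is illustrative, not the post's code):

```python
import random

class SumTree:
    """Leaves hold priorities; each parent holds the sum of its children."""

    def __init__(self, capacity):
        # capacity is assumed to be a power of two.
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # tree[capacity:] are the leaves

    def update(self, idx, priority):
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:                        # propagate sums up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def sample(self):
        """Pick a leaf index with probability proportional to its priority."""
        v = random.uniform(0.0, self.total())
        i = 1
        while i < self.capacity:             # descend to a leaf
            left = 2 * i
            if v <= self.tree[left]:
                i = left
            else:
                v -= self.tree[left]
                i = left + 1
        return i - self.capacity
```

Both lookups and updates are O(log capacity), which is why the structure is used instead of rescanning all priorities per draw.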
- DQN plays Flappy Bird (using keras/tensorboard) (2021-08-05)
  Based on https://github.com/yenchenlin/DeepLearningFlappyBird, rewritten in Keras. It learned nothing after tens of thousands of steps, so I changed some parameters. FRAME_PER_ACTION=2: choose_action every two frames and take no action on the others. learn_step=5: learn once every 5 steps. My guess was that it learned nothing because memories with reward=1 were almost nonexistent, so I added a success memory (which may not have been necessary). After roughly 100k learning steps (15 hours) it can pass a few pipes (apparently it takes millions of steps to get good results).
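FRAME_PER_ACTION and learn_step are both simple modulo schedules on the frame counter. An illustrative sketch of how they gate the training loop (names mirror the post's parameters; the dict shape is mine):

```python
def schedule(step, frame_per_action=2, learn_step=5):
    """Which activities happen at this frame of the game loop."""
    return {
        "act": step % frame_per_action == 0,   # choose_action every 2 frames
        "learn": step % learn_step == 0,       # learn once every 5 steps
    }
```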
- DQN solves a simple maze (2021-07-28)
  References: "用keras搭建DQN" and Morvan's git repo.
- Keyboard and mouse automation with win32api: Lunia PvP (2021-07-08)
  Flow: enumerate all windows and find the game windows by title; get every game window's handle and coordinates; sort handles and coordinates by window position, left to right; every window except the first clicks Ready; after the captcha is entered, the first window clicks Start; run for a while, then the latter half of the windows surrender; repeat. Notes: SetForegroundWindow is needed to activate a window; captcha recognition is covered in "简易数字识别:路尼亚pvp验证码".
- Simple digit recognition: Lunia PvP captcha (2021-07-07)
  Prepare binarized digit templates, all of shape 4*9; convert the target image to grayscale and binarize it; apply morphological processing to remove noise (not needed in this project); find outlines with findContours; segment the digits (not needed in this project); resize each outline region to the template size (a convolutional network would also work); iterate with matchTemplate and take the maximum.
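The binarization step of the pipeline is just a per-pixel threshold (the post's code uses threshold_ = 130). An illustrative pure-Python version of what cv2.threshold with THRESH_BINARY does on a list-of-rows grayscale image:

```python
def binarize(gray, threshold=130):
    """Map every pixel above the threshold to 255 and the rest to 0,
    as cv2.threshold(..., threshold, 255, cv2.THRESH_BINARY) would."""
    return [[255 if px > threshold else 0 for px in row] for row in gray]
```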
- tf2.keras MNIST handwritten-digit recognition (BatchNormalization + Dropout) (2021-06-22)
- TF CNN MNIST handwritten-digit recognition: dense, conv2d, batch_normalization (2021-05-25)
  Simplified the referenced article's code and added batch_normalization, which passes data between layers more effectively and speeds up stable convergence. tf.nn.max_pool(value, ksize, strides, padding, name=None) (see the referenced article): value is the input to be pooled; since a pooling layer usually follows a conv layer, it is typically a feature map with shape [batch, height, width, channels]; ksize is the pooling window size.
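The ksize/strides description above is easiest to see on a single channel: a 2x2 window with stride 2 keeps the maximum of each block and halves the height and width. An illustrative pure-Python sketch (real tf.nn.max_pool operates on the full 4-D [batch, height, width, channels] tensor):

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 over one [height, width] feature map.

    Height and width are assumed even; the output is half the size in
    each dimension, each cell holding the max of its 2x2 input block.
    """
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]
```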
- AutoEncoder: MNIST dimensionality reduction (2021-05-24)
- AutoEncoder: MNIST compression (2021-05-24)
- LSTM predicts cos(X) (2021-05-24)
  Predict cos(X) from the sin(X) curve. I only partly understood it; recording it here. Original: https://github.com/MorvanZhou/Tensorflow-Tutorial/tree/master/tutorial-contents
- TF RNN MNIST handwritten-digit recognition (2) (2021-05-23)
  An LSTM with two hidden layers; tutorial link in the original post.
- TF RNN MNIST handwritten-digit recognition (2021-05-23)
  Reference: "Tensorflow——实现递归神经网络RNN".
- TF2 CNN MNIST handwritten-digit recognition; saving and loading models; dropout (2021-05-21)
- TF2 single-layer network MNIST handwritten-digit recognition (2021-05-20)
  In tf2's mnist, x_train and x_test range over 0~255 and must be divided by 255; otherwise y_pre is all nan.
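The fix is a single preprocessing step: scale the 0~255 pixel values into [0, 1] before feeding them to the network, which keeps the initial activations and the loss in a numerically safe range. An illustrative sketch:

```python
def normalize_pixels(images):
    """Scale uint8 pixel values (0-255) into floats in [0.0, 1.0]."""
    return [[px / 255.0 for px in img] for img in images]
```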