最適当承诺 - CSDN Blog

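Data analysis. Before modeling the data, we first need to analyze its basic properties:
the dimension of the input data: 1-D or 2-D (does it need a convolution operation?);
whether the range of the input data in each dimension has been normalized;
for image data, display the image and check that the information is complete (view it with matplotlib.pyplot):
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(image0)
plt.colorbar()
plt.grid(False)
plt.show()
Model building. keras Sequent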

2022-05-16 19:22:55 145
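The checks listed in the excerpt above (input dimension, value range, normalization) can be sketched with NumPy. This is only an illustrative sketch: the helper names `summarize_input` and `min_max_normalize`, the sample array, and the convention that "normalized" means "all values lie in [0, 1]" are assumptions, not taken from the original post.

```python
import numpy as np

def summarize_input(data):
    """Report the basic properties checked before modeling."""
    data = np.asarray(data, dtype=float)
    return {
        "ndim": data.ndim,      # 1 -> vector input, 2 -> image-like input
        "shape": data.shape,
        "min": float(data.min()),
        "max": float(data.max()),
        # Assumed convention: data already inside [0, 1] counts as normalized.
        "normalized": bool(data.min() >= 0.0 and data.max() <= 1.0),
    }

def min_max_normalize(data):
    """Rescale values linearly into [0, 1]; constant input maps to zeros."""
    data = np.asarray(data, dtype=float)
    span = data.max() - data.min()
    if span == 0:
        return np.zeros_like(data)
    return (data - data.min()) / span

# Hypothetical 2x3 grayscale "image" with 8-bit pixel values.
image0 = np.array([[0, 128, 255], [64, 32, 16]])
report = summarize_input(image0)
scaled = min_max_normalize(image0)
```

Here `report["ndim"]` tells whether the input is image-like, and `scaled` could be passed to `plt.imshow` for the visual check the excerpt describes.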

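Original: Model Predictive Control

Model predictive control (MPC) is a control optimization problem: given a model of the environment, find the trajectory that achieves the minimum cost (maximum value). Model description: we already know the environment model $s_{t+1}=f(s_t,a_t)$ and the initial state $s_0$; we know the reward $r(s_t,a_t,s_{t+1})$ for each state and action. Goal: to reach the target state $s_f$, we want to find ...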

2022-03-28 21:52:12 462
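One common way to attack the problem stated in the MPC excerpt above is random shooting with a receding horizon. The sketch below is an assumption-laden illustration, not the method of the original post: the 1-D integrator dynamics `f`, the quadratic reward `r`, the action bounds, and all parameter values are hypothetical choices made only so the example runs.

```python
import random

def f(s, a):
    """Assumed known dynamics: a 1-D integrator, s_{t+1} = s_t + a_t."""
    return s + a

def r(s, a, s_next, s_f):
    """Assumed reward: negative squared distance of the next state to s_f."""
    return -(s_next - s_f) ** 2

def plan_first_action(s0, s_f, horizon, n_samples, rng):
    """Random shooting: sample action sequences, roll them out with f,
    and return the first action of the highest-return sequence."""
    best_return, best_action = float("-inf"), 0.0
    for _ in range(n_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = s0, 0.0
        for a in actions:
            s_next = f(s, a)
            total += r(s, a, s_next, s_f)
            s = s_next
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action

def run_mpc(s0, s_f, steps=20, horizon=5, n_samples=200, seed=0):
    """Receding horizon loop: replan at every step, apply only the first action."""
    rng = random.Random(seed)
    s = s0
    trajectory = [s]
    for _ in range(steps):
        a = plan_first_action(s, s_f, horizon, n_samples, rng)
        s = f(s, a)
        trajectory.append(s)
    return trajectory

trajectory = run_mpc(s0=0.0, s_f=3.0)
```

Replanning at every step keeps the controller usable even when the sampled plan is far from optimal; a gradient-based or dynamic-programming optimizer could replace the sampling step when `f` and `r` are differentiable or the state space is small.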

Original: Reinforcement learning environment setup on Ubuntu: Python, Anaconda, gym, TensorFlow, PyTorch, baselines, MuJoCo + Jupyter, PyCharm

Installing common RL environments

2022-03-08 01:26:24 833

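Original: Neural Network Fundamental (2)

Radial-Basis Function Networks. An RBF neural network computes the Euclidean distance between the input vector and several center points (each with the same dimension as the input; their number is a design choice) and discards all other information. Points at the same distance from a center are treated as identical by that neuron. If the Euclidean distance itself were the output, a smaller distance would give a smaller output and a larger distance a larger output. What we actually want is a larger value the closer the input is to the center. The activation function of the first-layer neurons should therefore be one whose output decreases as its input grows. We choose the Gaussian, which takes its maximum at distance 0. We can see that in an RBF network: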

2022-03-01 12:47:38 404
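The first layer described in the RBF excerpt above can be sketched in a few lines. This is a minimal illustration under assumptions: the function name `rbf_layer`, the width parameter `sigma`, and the example centers are hypothetical and do not come from the original post.

```python
import math

def rbf_layer(x, centers, sigma=1.0):
    """Gaussian RBF activations: one output per center.

    Each neuron sees only the Euclidean distance between x and its center,
    so inputs at equal distance from a center give the same activation.
    """
    outputs = []
    for c in centers:
        d = math.dist(x, c)  # Euclidean distance (Python 3.8+)
        outputs.append(math.exp(-d ** 2 / (2 * sigma ** 2)))
    return outputs

centers = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical centers, same dimension as x
at_center = rbf_layer((0.0, 0.0), centers)
left = rbf_layer((-1.0, 0.0), centers)
right = rbf_layer((1.0, 0.0), centers)
```

The first neuron outputs its maximum for `at_center`, and it cannot tell `left` from `right` because both lie at distance 1 from the first center.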

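Original: How to properly write the files of a Python GitHub repository

Contents: README.md; README.md basics; introduction to Markdown; README.md in detail; installing the program; packaging with Setup; packaging the code with the setuptools tool; file structure; installing the packaged program on your computer. README.md. README.md basics: the README.md file is written in Markdown. Its purpose is to let readers understand as quickly as possible the value of the project (what you did), how to use the project, and the important information in it. A README.md therefore usually covers the following: an overall introduction to the project and the problem it solves;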

2022-03-01 12:46:31 196

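Original: Algorithm study (Python version)

Reference: 算法图解 (Grokking Algorithms), git code. Big O notation. Search algorithms: binary search. Precondition: the array must be sorted. Running time: $O(\log n)$, or $\log N + 1$ steps [Big O notation describes how fast an algorithm's running time grows].
import numpy as np
def binary_search( memory,desired_number):
    index_lower = 0
    index_upper = len(memory)-1
    index_chos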

2022-03-01 12:44:45 75
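The excerpt above cuts off in the middle of the function, so the following is a separate, complete sketch of binary search rather than the author's code. It reuses the names `memory`, `desired_number`, `index_lower`, and `index_upper` from the excerpt; `index_mid`, the `None` return value, and the sample list are assumptions.

```python
def binary_search(memory, desired_number):
    """Return the index of desired_number in the sorted list memory, or None."""
    index_lower = 0
    index_upper = len(memory) - 1
    while index_lower <= index_upper:
        index_mid = (index_lower + index_upper) // 2  # middle of the remaining range
        guess = memory[index_mid]
        if guess == desired_number:
            return index_mid
        if guess < desired_number:
            index_lower = index_mid + 1  # discard the lower half
        else:
            index_upper = index_mid - 1  # discard the upper half
    return None

sorted_numbers = [1, 3, 5, 7, 9, 11]
found = binary_search(sorted_numbers, 7)
missing = binary_search(sorted_numbers, 4)
```

Each iteration halves the remaining range, which is where the logarithmic running time comes from.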

Original: Learning C++

Learning C++ syntax

2022-02-15 17:58:28 621

Original: Neural Network Fundamentals (1)

Neural network basics

2022-02-13 23:38:14 1748

Original: Machine Vision Fundamentals (1)

I will summarize the basic concepts and the related MATLAB code from ME5405 (Machine Vision).

2022-01-27 12:19:45 475

Original: Policy Gradient Methods of Deep Reinforcement Learning (Part Two)

This article discusses the natural gradient in distribution space and then applies the natural gradient to Actor Critic. It also explains the Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms.

2022-01-13 13:11:33 85

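Original: Policy gradient Method of Deep Reinforcement learning (Part One)

This article summarizes the policy-based methods of deep reinforcement learning. From an optimization perspective, it explains REINFORCE (Monte Carlo based policy gradient) and the various Actor Critic methods built on the vanilla policy gradient in parameter space. Because policy-based methods can operate in continuous state and action spaces, they are used frequently in robot control; later sections analyze policy-based reinforcement learning in robot ...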

2022-01-04 21:37:53 618

Original: Deep learning by Pytorch

Summary of PyTorch usage

2021-12-23 12:04:32 861

Original: Mujoco basic tutorial

Mujoco modeling tutorial + building environments with Mujoco_py

2021-12-22 10:27:32 7144

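Original: Installing Ubuntu 16.04 on Win10 with MBR legacy BIOS + a single hard drive

Installing Ubuntu 16.04 on Win10 with MBR legacy BIOS + a single hard drive
1. Ubuntu 16.04 download source: download ubuntu-16.04.7-desktop-amd64.iso from 清华大学开源软件镜像站 | Tsinghua Open Source Mirror
2. Prepare a formatted flash drive with more than 2 GB of storage
3. Download UltraISO (软碟通) to burn the Ubuntu image. Detailed burning steps: "windows10安装ubuntu双系统教程（绝对史上最详细） - 不妨不妨，来日方长 - 博客园"
4. On Windows ..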

2021-09-15 13:53:11 990

Original: Discrete-event system

System. Components that together serve as a functional whole can be seen as one system. A system has an input vector u(t), a state vector x(t), and an output vector y(t). The state x(t) contains the essential information that can guarantee: ..

2021-08-16 21:56:44 317

Original: 4.1 Temporal Differential of one step

Definition. Dynamic Programming uses the equations in the second line and has to know the environment dynamics (the dynamics of the environment produce the chain from this state to the next state, but they are hard to know. Cons) (It uses the rel...

2021-08-15 23:56:40 103

Original: 3.3 Monte Carlo Methods: case study: Blackjack of Policy Improvement of on- & off-policy Evaluation

Background. In 3.1 Monte Carlo Methods & case study: Blackjack of Policy Evaluation, we finished the evaluation of a specific policy (hit unless 20 or 21). In this article, we will summarize the policy improvement for

2021-08-09 11:35:45 137
