
[Original] lec_5 Policy Gradient

Policy Gradient. Here we consider the finite-horizon RL problem: $\theta^* = \arg\max_{\theta} E_{\tau\sim p_\theta(\tau)}[\sum_{t=1}^T r(s_t,a_t)]$. 1. Basic Derivation. Let J = E_{\tau\sim p_\theta(\tau)}[\sum_t ...

2021-11-06 18:54:13 133
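
For the objective in the lec_5 excerpt above, the usual first step of the derivation is the log-derivative identity. The block below is a sketch of that textbook result, not necessarily the steps the truncated post goes on to take. The shorthand $r(\tau)$ and $J(\theta)$, the policy $\pi_\theta$, and the trajectory factorization are assumptions introduced here.

```latex
% Assumed shorthand: r(\tau) is the total reward along trajectory \tau.
J(\theta) = E_{\tau\sim p_\theta(\tau)}\Big[\sum_{t=1}^T r(s_t,a_t)\Big]
          = \int p_\theta(\tau)\, r(\tau)\, d\tau,
\qquad r(\tau) := \sum_{t=1}^T r(s_t,a_t)

% Log-derivative identity: \nabla_\theta p_\theta = p_\theta \nabla_\theta \log p_\theta.
\nabla_\theta J(\theta)
  = \int \nabla_\theta p_\theta(\tau)\, r(\tau)\, d\tau
  = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, r(\tau)\, d\tau
  = E_{\tau\sim p_\theta(\tau)}\big[\nabla_\theta \log p_\theta(\tau)\, r(\tau)\big]

% Assumed factorization: p_\theta(\tau) = p(s_1)\prod_{t=1}^T \pi_\theta(a_t|s_t)\, p(s_{t+1}|s_t,a_t).
% The initial-state and transition terms do not depend on \theta, so
\nabla_\theta \log p_\theta(\tau) = \sum_{t=1}^T \nabla_\theta \log \pi_\theta(a_t|s_t)
```

Under these assumptions the gradient can be estimated from sampled trajectories without knowing the transition dynamics.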

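[Original] Zotero + PDF Reader + Typora: a strategy for managing, reading, and annotating literature

My needs: I read papers mainly on a computer and occasionally on an iPad. While reading, I mark key points and open questions in the paper and write reading notes. Downloading papers: save the downloaded PDFs locally and sync them across devices with iCloud. On the Mac, I use Zotero to manage papers (categories, tags) and Typora to write notes. Workflow: download the PDF to a custom folder; import a link to the file into the chosen Zotero category (not "store copy of file", so the file syncs across devices directly through iCloud); create the reading note in the custom folder; import a link to the note into Zotero, in the same folder as the PDF. When viewing on the Mac, directly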

2020-10-18 06:34:03 3448

[Original] Kinds of RL Algorithms

The following content is based on link. Kinds of RL Algorithms. Taxonomy. Model-Free vs Model-Based RL. The difference is whether the agent has access to (or learns) a model of the environment. A model of the environment means a function that predicts state transitions

2020-08-06 10:11:22 363

[Original] Key Concepts in RL

The following content is from link. Key Concepts in RL. Policies: the rule used by an agent to decide what actions to take. It can be deterministic or stochastic: $a_t=\mu(s_t) \quad \text{or}\quad a_t \sim \pi(\cdot|s_t)$

2020-08-06 09:00:39 160

[Original] Understanding MAML

Line 6: each $\theta_i$ is the one-step updated parameter for its task, and it is not used as the parameter for testing. Instead, it is used to update $\theta$, which is the final initialization that we wish to get. This way, it is actually minimizing.

2020-08-02 04:41:10 536
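
The relationship between $\theta_i$ and $\theta$ described in the MAML excerpt above can be written out as follows. This is a sketch in the standard MAML notation. The step sizes $\alpha$ and $\beta$ and the per-task loss $L_{T_i}$ are symbols assumed here, since the excerpt does not show them.

```latex
% Inner step: one gradient step per task T_i, starting from the shared \theta.
\theta_i = \theta - \alpha \nabla_\theta L_{T_i}(\theta)

% Meta-objective: each task's loss is evaluated at its adapted parameter \theta_i,
% but the minimization is over the shared initialization \theta.
\min_\theta \sum_i L_{T_i}(\theta_i)
  = \min_\theta \sum_i L_{T_i}\big(\theta - \alpha \nabla_\theta L_{T_i}(\theta)\big)

% Outer step: the adapted parameters only enter through this update of \theta.
\theta \leftarrow \theta - \beta \nabla_\theta \sum_i L_{T_i}(\theta_i)
```

Written this way, each $\theta_i$ is an intermediate quantity: it is discarded after the outer step, and only $\theta$ is kept as the learned initialization.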

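[Original] Understanding Limits of Sets

The idea of lim inf is to throw away any peculiarities caused by finitely many initial terms of the sequence of sets and get at what we might call the 'essential' intersection. In other words, the lim inf discards the bad behavior caused by the first finitely many sets of the sequence, so that attention rests on the 'essential' intersection of the sequence. The lim sup works the same way: the union taken in the lim sup strips out whatever the first finitely many sets cause, leaving only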

2020-07-30 10:38:24 4227 5
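
The idea in the excerpt above corresponds to the standard definitions of the limits of a sequence of sets. The block is a sketch: the sequence name $A_n$ and the alternating example are assumptions chosen for illustration.

```latex
% Standard definitions for a sequence of sets A_1, A_2, \dots
\liminf_{n\to\infty} A_n = \bigcup_{n\ge 1} \bigcap_{k\ge n} A_k
\qquad
\limsup_{n\to\infty} A_n = \bigcap_{n\ge 1} \bigcup_{k\ge n} A_k

% Membership form:
% x \in \liminf A_n  iff  x \in A_k for all but finitely many k,
% x \in \limsup A_n  iff  x \in A_k for infinitely many k.
\liminf_{n\to\infty} A_n \subseteq \limsup_{n\to\infty} A_n

% Illustrative example: A_n = \{0\} for even n and A_n = \{1\} for odd n.
% Every tail intersection is empty and every tail union is \{0,1\}, so
\liminf_{n\to\infty} A_n = \emptyset, \qquad \limsup_{n\to\infty} A_n = \{0,1\}
```

Replacing any finite number of the sets $A_1,\dots,A_N$ changes neither limit, because both definitions depend only on the tails $\{A_k : k \ge n\}$ for arbitrarily large $n$.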

[Original] May 4th

Plan: Autoencoder algorithm; Inte Stat home2; think about convergence; Let 5 probably; Analysis chap 1,2

2020-05-04 09:33:55 169 1

[Original] May 3rd

Inte Stat Lec 2: Markov Inequality; Chebyshev Inequality; Chernoff Method; Markov + mgf; Gaussian Tail; Special mgf bound from Gaussian property; Sub-Gaussian; Bounded by normal Gaussian; Hoeffding's bound; Bo...

2020-05-04 09:29:47 405

[Original] Study Plan

Terence Tao, Analysis 1: 5.4-5.20, read through once. Rudin, Principles of Mathematical Analysis: 5.20-6.20, read 2-4, 7-10. Inter Stat: 5.1-7.1

2020-05-04 09:14:02 133

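[Original] Solution to Matlab stuck in initialization on MacOS (rename to _old)

Overview: a fix for MATLAB_R2019b on MacOS when it stays stuck initializing and cannot be used. Keywords: Matlab, stuck in initialization, rename, _old, MacOS. Details: today MATLAB suddenly started misbehaving, with the lower-left corner showing "initializing" indefinitely on startup. I searched online, and all the Chinese-language material I could find blamed a license problem. After several failed attempts, I later found that ...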

2020-02-12 05:41:00 856 2

[Original] Installing MATLAB Engine API for Python on MacOS

Keywords: MacOS, MATLAB Engine, Python 2.7, Python 3.6, Permission Denied, Administrator. Introduction: on Mac OS X, for the Python 3.6 in Anaconda (rather than the Python 2.7 that ships with Mac OS), installing MATLAB En...

2020-02-08 10:58:34 1257 4
