Paper Understanding
Detailed walkthroughs of machine-learning papers
云端FFF
not because they are easy, but because they are hard
Sequence Models (4) — Scaling Laws
This post introduces scaling laws, the important empirical regularities of LLM training: they tell us how to maximize training efficiency, and they let us predict a large model's performance from small-scale experiments. (Original, posted 2024-01-10 04:40:46)
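As a toy illustration of how scaling laws are used, the sketch below fits a Kaplan-style power law L(N) = (N_c / N)^α to hypothetical small-scale runs and extrapolates to a larger model. All numbers are made up for illustration and come from neither the post nor the paper.

```python
import numpy as np

# Hypothetical small-scale runs: model size N (parameters) vs. converged loss L.
# These values are invented for illustration only.
N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
L = np.array([4.2, 3.7, 3.2, 2.8, 2.5])

# A power law L(N) = (Nc / N)^alpha is a straight line in log-log space:
# log L = -alpha * log N + alpha * log Nc, so a least-squares line fit
# recovers both constants.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope
Nc = np.exp(intercept / alpha)

# The point of scaling laws: predict a big model's loss before training it.
N_big = 1e10
L_pred = (Nc / N_big) ** alpha
print(f"alpha = {alpha:.3f}, predicted loss at N = {N_big:.0e}: {L_pred:.2f}")
```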
Paper Quick Review [Offline RL] — [IQL] Offline reinforcement learning with implicit Q-Learning
[Quick Review] Title: Offline reinforcement learning with implicit Q-Learning; Venue: ICLR 2022; Area: offline/batch RL — IL-based. (Original, posted 2023-02-06 15:05:20)
Paper Quick Review [ML4CO] — [Ptr-Net] Pointer Networks
Title: Pointer Networks; Venue: NIPS 2015; Area: sequence-model (seq2seq) improvement / deep learning for combinatorial optimization. (Original, posted 2023-09-25 20:27:46)
Paper Quick Review [Offline RL] — [CQL] Conservative Q-Learning for Offline Reinforcement Learning
Title: Conservative Q-Learning for Offline Reinforcement Learning; Venue: NeurIPS 2020; Area: offline/batch RL — RL-based. (Original, posted 2023-07-08 08:51:13)
Paper Quick Review [Sequence Models / GPT] — [Transformer-XL] Attentive Language Models Beyond a Fixed-Length Context
Title: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context; Venue: ACL 2019; Area: Transformer (decoder) improvement. (Original, posted 2023-06-26 22:07:03)
Paper Understanding [Offline RL] — [One-step] Offline RL Without Off-Policy Evaluation
Title: Offline RL Without Off-Policy Evaluation; Venue: NeurIPS 2021; Area: offline/batch RL — RL-based / one-step. (Original, posted 2023-01-30 03:38:29)
Paper Understanding [Offline RL] — [BooT] Bootstrapped Transformer for Offline Reinforcement Learning
Title: Bootstrapped Transformer for Offline Reinforcement Learning; Venue: NeurIPS 2022; Area: offline/batch RL — Transformer-based / data augmentation. (Original, posted 2023-01-12 18:34:23)
Paper Understanding [Offline RL] — [TT] Offline Reinforcement Learning as One Big Sequence Modeling Problem
Title: Offline Reinforcement Learning as One Big Sequence Modeling Problem; Venue: NeurIPS 2021; Area: offline/batch RL — Transformer-based / model-based. (Original, posted 2023-01-08 00:01:04)
Paper Understanding [Offline RL] — [DT] Decision Transformer: Reinforcement Learning via Sequence Modeling
Title: Decision Transformer: Reinforcement Learning via Sequence Modeling; Venue: NeurIPS 2021; Area: offline/batch RL — Transformer-based / hindsight supervision. (Original, posted 2022-12-23 04:01:24)
Paper Understanding [Offline RL] — [RvS] What is Essential for Offline RL via Supervised Learning?
Title: RvS: What is Essential for Offline RL via Supervised Learning?; Venue: ICLR 2022; Area: offline/batch RL — hindsight supervision. (Original, posted 2022-12-12 13:48:45)
Paper Understanding [Offline RL] — [BCQ] Off-Policy Deep Reinforcement Learning without Exploration
Title: Off-Policy Deep Reinforcement Learning without Exploration; Venue: ICML 2019; Area: offline/batch RL — RL-based, policy constraint. (Original, posted 2022-12-08 17:59:05)
Paper Understanding [Offline RL] — A Dataset Perspective on Offline Reinforcement Learning
Title: A Dataset Perspective on Offline Reinforcement Learning; Venue: NeurIPS 2021 Workshop; Area: offline RL — dataset analysis. (Original, posted 2022-10-18 16:13:13)
Paper Understanding [RL Classics] — [SQL] Reinforcement Learning with Deep Energy-Based Policies
Title: Reinforcement Learning with Deep Energy-Based Policies; Venue: ICML 2017; Area: classic RL (model-free + maximum entropy); this paper is the precursor to SAC. (Original, posted 2022-10-13 17:51:26)
Paper Understanding [RL - Exploration] — [Go-Explore] First return, then explore
Title: First return, then explore; Venue: Nature 2021; Area: RL — exploration. (Original, posted 2022-09-01 11:39:12)
Paper Understanding [RL - MARL] — [CoPO] Learning to Simulate SDP System with Coordinated Policy Optimization
Title: Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization; Venue: NeurIPS 2021; Area: RL — multi-agent. (Original, posted 2022-08-22 22:33:01)
Paper Understanding [RL - Exp Replay] — [ReMERN & ReMERT] Regret Minimization Exp Replay in Off-Policy RL
Title: Regret Minimization Experience Replay in Off-Policy Reinforcement Learning; Venue: NeurIPS 2021; Area: RL — experience replay. (Original, posted 2022-08-19 20:47:12)
Paper Understanding [RL - Exp Replay] — [LFIW] Experience Replay with Likelihood-free Importance Weights
Title: Experience Replay with Likelihood-free Importance Weights; Venue: PMLR 2022; Area: RL — experience replay. (Original, posted 2022-08-01 10:41:49)
Paper Understanding [RL - Exp Replay] — [DisCor] Corrective Feedback in RL via Distribution Correction
Title: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction; Venue: NeurIPS 2020; Area: RL — experience replay. (Original, posted 2022-08-13 04:18:55)
Paper Understanding [RL - Exp Replay] — An Equivalence between Loss Functions and Non-Uniform Sampling in Exp Replay
Title: An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay; Venue: NeurIPS 2020; Area: RL — replay buffer. (Original, posted 2022-05-25 08:51:45)
Paper Understanding [RL Classics] — [DQN] Human-level control through deep reinforcement learning
Title: Human-level control through deep reinforcement learning; Venue: Nature 2015; Area: classic RL (DQN family). (Original, posted 2022-04-13 11:17:08)
Paper Understanding [RL - Exp Replay] — [PER] Prioritized Experience Replay
Title: Prioritized Experience Replay; Venue: ICLR 2016; Area: RL — replay buffer. (Original, posted 2022-03-29 14:41:46)
Paper Understanding [Offline RL] — [COIL] Curriculum Offline Imitating Learning
Title: Curriculum Offline Imitating Learning; Venue: NeurIPS 2021; Area: offline/batch RL — IL-based method. (Original, posted 2022-01-16 04:36:56)
Paper Understanding [Offline RL] — [BAIL] Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Paper understanding — BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning. (Original, posted 2022-01-01 12:37:55)
Paper Understanding [IL - IRL] — Deep Reinforcement Learning from Human Preferences
Deep Reinforcement Learning from Human Preferences; imitation learning — inverse reinforcement learning. (Original, posted 2021-12-08 22:59:49)
Paper Understanding [RL - Episodic Control] — [MFEC] Model Free Episodic Control
Paper understanding — Model Free Episodic Control (RL — episodic control). (Original, posted 2021-11-26 21:25:30)
Paper Understanding [RL - Interpretability] — Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
Paper understanding — Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions. (Original, posted 2021-11-07 11:20:09)
Paper Understanding [IL - BC] — End to End Learning for Self-Driving Cars
Paper understanding — End to End Learning for Self-Driving Cars. (Original, posted 2021-09-26 16:57:27)
Paper Understanding [IL - BC] — An autonomous land vehicle in a neural network
Title: An autonomous land vehicle in a neural network; Venue: Advances in Neural Information Processing Systems, 1989 (NIPS); Area: IL — behavioral cloning. (Original, posted 2021-09-22 20:10:12)
Paper Understanding [IL - Data Augmentation] — Adversarial Imitation Learning with Trajectorial Augmentation and Correction
Title: Adversarial Imitation Learning with Trajectorial Augmentation and Correction; Venue: ICRA 2021; Area: imitation learning — trajectory-level data augmentation. (Original, posted 2021-09-21 20:06:54)
[Group-Meeting Paper Notes] 2021/3/24 (CReST, SELF, SelNLPL, Class-Balanced Loss, solving PDEs with DNNs)
These notes only record the ideas of papers shared by classmates at our group meeting; I have not read most of them carefully, so treat them as a reference only. (Original, posted 2021-03-26 20:37:47)
[Group-Meeting Paper Notes] 2021/3/31 (episodic control RL)
This series records the ideas of papers shared by classmates at group meetings; I have not read most of them carefully, so treat them as a reference only. This week's three papers are "Model-Free Episodic Control", "Episodic Memory Deep Q-Networks", and "Episodic Reinforcement Learning with Associative Memory". All three concern episodic control in reinforcement learning: they use a non-parametric memory to store good experiences for learning, which effectively mitigates the slow propagation of value estimates in RL. … (Original, posted 2021-04-07 18:39:35)
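To make the episodic-control idea concrete, here is a minimal sketch in the spirit of MFEC; the class name, the discount value, and the zero default for unseen pairs are my assumptions, not the papers' specification. A non-parametric table keeps the best Monte-Carlo return ever observed for each (state, action) pair, so the value of one good episode is usable immediately instead of trickling back through bootstrapped updates.

```python
class EpisodicMemory:
    """MFEC-style sketch: a non-parametric Q table keyed by (state, action)
    that stores the best return ever observed for that pair."""

    def __init__(self, gamma=0.99):      # discount factor is an assumption
        self.gamma = gamma
        self.q = {}                       # (state, action) -> best return seen

    def update(self, trajectory):
        """trajectory: list of (state, action, reward) for one finished
        episode. One backward pass computes Monte-Carlo returns, so value
        from a good episode propagates in a single sweep."""
        g = 0.0
        for s, a, r in reversed(trajectory):
            g = r + self.gamma * g
            if g > self.q.get((s, a), float("-inf")):
                self.q[(s, a)] = g        # keep only the best experience

    def act(self, state, actions):
        """Greedy action over stored returns; unseen pairs default to 0,
        standing in for the kNN estimate the real method uses."""
        return max(actions, key=lambda a: self.q.get((state, a), 0.0))


# Tiny usage example with toy states and actions.
mem = EpisodicMemory()
mem.update([("s0", "left", 0.0), ("s1", "right", 1.0)])
print(mem.act("s0", ["left", "right"]))   # -> "left" (return 0.99 stored)
```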