  • Blog (473)
  • Resources (1)
  • Favorites
  • Following

Original · [Arxiv 2024] Self-Rewarding Language Models
2024-08-28 11:42:02 · 1034 views

Original · [NeurIPS 2024] Self-Refine: Iterative Refinement with Self-Feedback
2024-08-25 10:57:32 · 287 views

Original · [ACL 2024] Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
2024-08-23 00:46:47 · 728 views

Original · [ACL 2024] Revisiting Knowledge Distillation for Autoregressive Language Models
2024-08-21 10:55:25 · 763 views

Original · [Arxiv 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
2024-08-05 15:59:28 · 301 views

Original · [Arxiv 2024] EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
2024-08-05 15:09:40 · 618 views

Original · [ICLR 2024] On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
2024-08-04 21:40:55 · 956 views

Original · [ACL 2023] Distilling Step-by-Step! Outperforming LLMs with Less Data and Smaller Model
2024-08-04 11:31:14 · 839 views

Original · [NeurIPS 2022] Chain-of-thought prompting elicits reasoning in large language models
2024-08-04 09:54:14 · 168 views

Original · Multi-Head Latent Attention: Boosting Inference Efficiency
2024-08-01 16:17:48 · 910 views

Original · LLM Preference Alignment (PPO, DPO, SimPO)
2024-08-01 11:18:05 · 717 views

Original · Introduction to Deep Reinforcement Learning (Policy Gradient, Actor-Critic, PPO)
2024-07-30 10:48:51 · 987 views

Original · Hybrid LLM Parallelism
Notes on distributed LLM inference.
2024-07-08 14:03:15 · 620 views

Original · Different decoding methods for LLMs
2024-07-03 23:53:27 · 667 views

Original · Introduction to popular LLM components
2024-06-05 16:51:43 · 599 views

Original · [NeurIPS 2022] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
2024-05-29 15:47:34 · 1037 views

Original · A Small Trick to Easily Boost Quantization Accuracy! IntactKV: Quantizing Large Language Models by Keeping Pivot Tokens Intact
This post introduces our work on large language model quantization, IntactKV, which can be used as a plug-in to effectively improve mainstream quantization methods such as GPTQ, AWQ, and QuaRot. The authors are from Tsinghua University, Huawei Noah's Ark Lab, the Institute of Automation (Chinese Academy of Sciences), and The Chinese University of Hong Kong. The code is open source; everyone is welcome to try it!
2024-05-29 15:07:29 · 1128 views
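As a rough illustration of the pivot-token idea behind IntactKV, the toy sketch below keeps the key/value pairs of the first ("pivot") tokens exact by computing them with a full-precision stand-in, while later tokens go through a lossy quantized stand-in. Everything here (`fp_model`, `quant_model`, the list-of-pairs "cache") is a hypothetical placeholder for illustration only; the actual method operates on transformer KV caches, not on scalar toy models.

```python
def kv_cache(model, tokens):
    """Build a toy KV cache: one (key, value) pair per token."""
    return [model(t) for t in tokens]

def fp_model(token):
    # Full-precision stand-in: exact "activations".
    return (token * 1.0, token * 2.0)

def quant_model(token):
    # Quantized stand-in: activations carry rounding error.
    return (round(token * 1.0), round(token * 2.0))

def intactkv_prefill(tokens, num_pivot=1):
    # Pivot tokens (e.g. the [BOS] token) get lossless KVs from the
    # full-precision model; the remaining tokens use the quantized one.
    pivot, rest = tokens[:num_pivot], tokens[num_pivot:]
    return kv_cache(fp_model, pivot) + kv_cache(quant_model, rest)
```

The point of the sketch is only the split: the quantized model conditions on an intact, error-free cache for the pivot positions instead of re-deriving them with quantization error.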

Original · [SC 2020] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2024-05-10 14:52:27 · 633 views

Original · [Blog 2023] Flash-Decoding for long-context inference
2024-05-07 21:08:31 · 480 views

Original · [ICLR 2024] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
2024-05-07 18:32:36 · 702 views

Original · Transformer Positional Encoding
2024-03-18 17:19:28 · 780 views

Original · [Arxiv 2023] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
2023-11-19 15:47:18 · 36 views

Original · [Arxiv 2019] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2023-09-27 14:35:36 · 509 views · 1 comment

Original · [ICLR 2023] LPT: Long-tailed Prompt Tuning for Image Classification
2023-08-21 16:00:20 · 1869 views

Original · Measuring the Runtime of Code Segments in PyTorch
2023-08-01 15:15:56 · 1378 views

Original · [NeurIPS 2019] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
2023-06-03 19:55:38 · 56 views

Original · [ECCV 2022] VL-LTR: Learning Class-wise Visual-Linguistic Representation for LTR
2023-05-29 20:44:22 · 357 views

Original · [NeurIPS 2022] Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
2023-05-11 16:29:11 · 193 views · 1 comment

Original · [CVPR 2023] HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
2023-05-05 10:52:55 · 333 views

Original · [NeurIPS 2019] Hyperspherical Prototype Networks
2023-03-28 23:29:48 · 225 views

Original · [Arxiv 2022] InstructGPT: Training language models to follow instructions with human feedback
2023-03-25 00:46:33 · 215 views

Original · [Arxiv 2022] HIRL: A General Framework for Hierarchical Image Representation Learning
2023-03-12 16:10:08 · 131 views

Original · [CVPR 2022] HCSC: Hierarchical Contrastive Selective Coding
2023-03-12 14:39:18 · 205 views

Original · [NIPS 2017] Improved Training of Wasserstein GANs (WGAN-GP)
2023-03-11 14:06:06 · 617 views · 4 comments

Original · [ICML 2017] Wasserstein Generative Adversarial Networks (WGAN)
2023-03-11 09:02:45 · 752 views

Original · [ICLR 2016] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (DCGAN)
2023-03-10 20:36:10 · 676 views

Original · [Arxiv 2023] Hyperbolic Contrastive Learning
2023-03-06 09:28:04 · 399 views

Original · [CVPR 2022] Balanced Contrastive Learning for Long-Tailed Visual Recognition
2023-03-04 20:32:49 · 1291 views

Original · [NeurIPS 2020] Supervised Contrastive Learning
2023-03-02 16:42:59 · 838 views

Original · [ACM MM 2021] RAMS-Trans: Recurrent Attention Multi-scale Transformer for FGVC
2023-02-28 09:35:11 · 293 views

Resource · 软件加密解密.rar
Files for my blog post "Assembly Language (10): Software Encryption and Decryption": includes ollydbg, PW32Dasm9b.EXE, CRACKME.EXE, and Task Lock.EXE.
2021-02-07
