  • Blog (473)
  • Resources (1)
  • Favorites
  • Following

Original · [Arxiv 2024] Self-Rewarding Language Models
2024-08-28 11:42:02 · 1034 views

Original · [NeurIPS 2024] Self-Refine: Iterative Refinement with Self-Feedback
2024-08-25 10:57:32 · 287 views

Original · [ACL 2024] Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
2024-08-23 00:46:47 · 728 views

Original · [ACL 2024] Revisiting Knowledge Distillation for Autoregressive Language Models
2024-08-21 10:55:25 · 763 views

Original · [Arxiv 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
2024-08-05 15:59:28 · 301 views

Original · [Arxiv 2024] EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
2024-08-05 15:09:40 · 618 views

Original · [ICLR 2024] On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
2024-08-04 21:40:55 · 956 views

Original · [ACL 2023] Distilling Step-by-Step! Outperforming LLMs with Less Data and Smaller Model
2024-08-04 11:31:14 · 839 views

Original · [NeurIPS 2022] Chain-of-thought prompting elicits reasoning in large language models
2024-08-04 09:54:14 · 168 views

Original · Multi-Head Latent Attention: Boosting Inference Efficiency
2024-08-01 16:17:48 · 910 views

Original · LLM Preference Alignment (PPO, DPO, SimPO)
2024-08-01 11:18:05 · 717 views

Original · Introduction to Deep Reinforcement Learning (Policy Gradient, Actor-Critic, PPO)
2024-07-30 10:48:51 · 987 views

Original · Hybrid LLM Parallelism
Notes on distributed LLM inference.
2024-07-08 14:03:15 · 620 views

Original · Different decoding methods for LLMs
2024-07-03 23:53:27 · 667 views

Original · Introduction to popular LLM components
2024-06-05 16:51:43 · 599 views

Original · [NeurIPS 2022] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
2024-05-29 15:47:34 · 1037 views

Original · A Small Trick to Easily Boost Quantization Accuracy! IntactKV: Quantizing Large Language Models by Keeping Pivot Tokens Intact
This post introduces our work on large language model quantization, IntactKV, which can be used as a plug-in to effectively improve mainstream quantization methods such as GPTQ, AWQ, and QuaRot. The authors are from Tsinghua University, Huawei Noah's Ark Lab, the Institute of Automation (Chinese Academy of Sciences), and The Chinese University of Hong Kong. The code is open source; everyone is welcome to try it!
2024-05-29 15:07:29 · 1128 views
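As a rough illustration of the pivot-token idea behind IntactKV, the toy sketch below keeps the key/value pairs of the first ("pivot") tokens exact by computing them with a full-precision stand-in, while later tokens go through a lossy quantized stand-in. Everything here (`fp_model`, `quant_model`, the list-of-pairs "cache") is a hypothetical placeholder for illustration only; the actual method operates on transformer KV caches, not on scalar toy models.

```python
def kv_cache(model, tokens):
    """Build a toy KV cache: one (key, value) pair per token."""
    return [model(t) for t in tokens]

def fp_model(token):
    # Full-precision stand-in: exact "activations".
    return (token * 1.0, token * 2.0)

def quant_model(token):
    # Quantized stand-in: activations carry rounding error.
    return (round(token * 1.0), round(token * 2.0))

def intactkv_prefill(tokens, num_pivot=1):
    # Pivot tokens (e.g. the [BOS] token) get lossless KVs from the
    # full-precision model; the remaining tokens use the quantized one.
    pivot, rest = tokens[:num_pivot], tokens[num_pivot:]
    return kv_cache(fp_model, pivot) + kv_cache(quant_model, rest)
```

The point of the sketch is only the split: the quantized model conditions on an intact, error-free cache for the pivot positions instead of re-deriving them with quantization error.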

Original · [SC 2020] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
2024-05-10 14:52:27 · 633 views

Original · [Blog 2023] Flash-Decoding for long-context inference
2024-05-07 21:08:31 · 480 views

Original · [ICLR 2024] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
2024-05-07 18:32:36 · 702 views

Original · Transformer Positional Encoding
2024-03-18 17:19:28 · 780 views

Original · [Arxiv 2023] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
2023-11-19 15:47:18 · 36 views

Original · [Arxiv 2019] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
2023-09-27 14:35:36 · 509 views · 1 comment

Original · [ICLR 2023] LPT: Long-tailed Prompt Tuning for Image Classification
2023-08-21 16:00:20 · 1869 views

Original · Measuring the Runtime of Code Segments in PyTorch
2023-08-01 15:15:56 · 1378 views

Original · [NeurIPS 2019] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
2023-06-03 19:55:38 · 56 views

Original · [ECCV 2022] VL-LTR: Learning Class-wise Visual-Linguistic Representation for LTR
2023-05-29 20:44:22 · 357 views

Original · [NeurIPS 2022] Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
2023-05-11 16:29:11 · 193 views · 1 comment

Original · [CVPR 2023] HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
2023-05-05 10:52:55 · 333 views

Original · [NeurIPS 2019] Hyperspherical Prototype Networks
2023-03-28 23:29:48 · 225 views

Original · [Arxiv 2022] InstructGPT: Training language models to follow instructions with human feedback
2023-03-25 00:46:33 · 215 views

Original · [Arxiv 2022] HIRL: A General Framework for Hierarchical Image Representation Learning
2023-03-12 16:10:08 · 131 views

Original · [CVPR 2022] HCSC: Hierarchical Contrastive Selective Coding
2023-03-12 14:39:18 · 205 views

Original · [NIPS 2017] Improved Training of Wasserstein GANs (WGAN-GP)
2023-03-11 14:06:06 · 617 views · 4 comments

Original · [ICML 2017] Wasserstein Generative Adversarial Networks (WGAN)
2023-03-11 09:02:45 · 752 views

Original · [ICLR 2016] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (DCGAN)
2023-03-10 20:36:10 · 676 views

Original · [Arxiv 2023] Hyperbolic Contrastive Learning
2023-03-06 09:28:04 · 399 views

Original · [CVPR 2022] Balanced Contrastive Learning for Long-Tailed Visual Recognition
2023-03-04 20:32:49 · 1291 views

Original · [NeurIPS 2020] Supervised Contrastive Learning
2023-03-02 16:42:59 · 838 views

Original · [ACM MM 2021] RAMS-Trans: Recurrent Attention Multi-scale Transformer for FGVC
2023-02-28 09:35:11 · 293 views

Resource · 软件加密解密.rar
Files for my blog post "Assembly Language (10): Software Encryption and Decryption": includes ollydbg, PW32Dasm9b.EXE, CRACKME.EXE, and Task Lock.EXE.
2021-02-07
