Nearly 500 stars in 14 hours! A must-read series for quickly leveling up in LLM/AI

Shared by Datawhale

Must-read papers: LLM/AI. Editor: 深度学习自然语言处理

Project repository: https://github.com/InterviewReady/ai-engineering-resources

Tokenization

  • Byte-pair Encoding
    https://arxiv.org/pdf/1508.07909

  • Byte Latent Transformer: Patches Scale Better Than Tokens
    https://arxiv.org/pdf/2412.09871
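
A minimal sketch of the byte-pair-encoding merge loop from the Sennrich et al. paper listed above; the toy corpus and the number of merges are illustrative assumptions, not the paper's released code.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs across the space-separated word vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its merged symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy word-frequency corpus; words are split into characters plus an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):                      # number of merges is a hyperparameter
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)     # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print("merged", best)
```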

Vectorization

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    https://arxiv.org/pdf/1810.04805

  • IMAGEBIND: One Embedding Space To Bind Them All
    https://arxiv.org/pdf/2305.05665

  • SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
    https://arxiv.org/pdf/2308.11466

  • FAISS library
    https://arxiv.org/pdf/2401.08281

  • Facebook Large Concept Models
    https://arxiv.org/pdf/2412.08821v2
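
A minimal sketch of turning sentences into vectors with a BERT-style encoder and comparing them with cosine similarity. Mean pooling over token states and the bert-base-uncased checkpoint are common choices assumed here, not something prescribed by the papers above.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")        # assumed checkpoint
model = AutoModel.from_pretrained("bert-base-uncased").eval()

sentences = ["a cat sits on the mat", "a kitten rests on a rug"]
enc = tok(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state                     # (batch, seq, dim)

# Mean-pool token states, ignoring padding positions.
mask = enc["attention_mask"].unsqueeze(-1).float()
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = torch.nn.functional.normalize(emb, dim=-1)
print("cosine similarity:", (emb[0] @ emb[1]).item())
```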

Infrastructure

  • TensorFlow
    https://arxiv.org/pdf/1605.08695

  • DeepSeek 3FS filesystem
    https://github.com/deepseek-ai/3FS/blob/main/docs/design_notes.md

  • Milvus DB
    https://www.cs.purdue.edu/homes/csjgwang/pubs/SIGMOD21_Milvus.pdf

  • Billion-Scale Similarity Search with GPUs (FAISS)
    https://arxiv.org/pdf/1702.08734

  • Ray
    https://arxiv.org/abs/1712.05889
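
A minimal usage sketch of FAISS's exact (flat) index from the library papers above; the dimensionality and random vectors are placeholders, and production setups typically switch to IVF/HNSW/PQ indexes at scale.

```python
import faiss
import numpy as np

d = 128                                                    # vector dimensionality (placeholder)
rng = np.random.default_rng(0)
xb = rng.standard_normal((10_000, d)).astype("float32")    # database vectors
xq = rng.standard_normal((5, d)).astype("float32")         # query vectors

index = faiss.IndexFlatL2(d)        # exact L2 search; swap in an approximate index for scale
index.add(xb)
distances, ids = index.search(xq, 4)                       # 4 nearest neighbours per query
print(ids)
```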

Core Architecture

  • Attention is All You Need
    https://papers.neurips.cc/paper/7181-attention-is-all-you-need.pdf

  • FlashAttention
    https://arxiv.org/pdf/2205.14135

  • Multi Query Attention
    https://arxiv.org/pdf/1911.02150

  • Grouped Query Attention
    https://arxiv.org/pdf/2305.13245

  • Google Titans outperform Transformers
    https://arxiv.org/pdf/2501.00663

  • VideoRoPE: Rotary Position Embedding
    https://arxiv.org/pdf/2502.05173
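
A minimal NumPy sketch of the scaled dot-product attention at the heart of "Attention is All You Need"; a single head with no mask or learned projections, and the shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (single head, no mask)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (seq_q, seq_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64))    # 4 query positions, d_k = 64
K = rng.standard_normal((6, 64))    # 6 key positions
V = rng.standard_normal((6, 64))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 64)
```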

Mixture of Experts

  • Sparsely-Gated Mixture-of-Experts Layer
    https://arxiv.org/pdf/1701.06538

  • GShard
    https://arxiv.org/abs/2006.16668

  • Switch Transformers
    https://arxiv.org/abs/2101.03961
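
A minimal sketch of top-k gated routing in the spirit of the sparsely-gated MoE paper above: a linear router scores experts per token, only the top-k experts run, and their outputs are combined with renormalized gate weights. The tiny linear experts, the dimensions, and the absence of routing noise and load-balancing losses are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, n_experts, top_k, n_tokens = 16, 4, 2, 8

experts = torch.nn.ModuleList([torch.nn.Linear(d_model, d_model) for _ in range(n_experts)])
router = torch.nn.Linear(d_model, n_experts)

x = torch.randn(n_tokens, d_model)

logits = router(x)                                   # (tokens, experts)
top_vals, top_idx = logits.topk(top_k, dim=-1)       # keep the k best experts per token
gates = F.softmax(top_vals, dim=-1)                  # renormalize over the selected experts

out = torch.zeros_like(x)
for e in range(n_experts):
    tokens, slot = (top_idx == e).nonzero(as_tuple=True)   # tokens routed to expert e
    if tokens.numel():
        out[tokens] += gates[tokens, slot].unsqueeze(-1) * experts[e](x[tokens])
print(out.shape)
```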

RLHF (Reinforcement Learning from Human Feedback)

  • Deep Reinforcement Learning from Human Preferences
    https://arxiv.org/pdf/1706.03741

  • Fine-Tuning Language Models with RLHF
    https://arxiv.org/pdf/1909.08593

  • Training language models with RLHF
    https://arxiv.org/pdf/2203.02155
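
A minimal sketch of the per-sample reward used during the RL stage in the RLHF papers above: the learned reward-model score minus a KL penalty that keeps the policy close to the supervised (SFT) model. The numeric inputs and the beta coefficient are illustrative; the surrounding PPO training loop is omitted.

```python
def rlhf_reward(reward_model_score, logp_policy, logp_sft, beta=0.02):
    """R(x, y) = r_theta(x, y) - beta * [log pi_RL(y|x) - log pi_SFT(y|x)].

    reward_model_score: scalar score from the trained reward model
    logp_policy / logp_sft: summed log-probs of the sampled response under each model
    beta: KL penalty coefficient (illustrative value)
    """
    kl_estimate = logp_policy - logp_sft      # per-sample estimate of KL(pi_RL || pi_SFT)
    return reward_model_score - beta * kl_estimate

# Toy numbers: the policy has drifted from the SFT model, so the reward is discounted.
print(rlhf_reward(reward_model_score=1.3, logp_policy=-42.0, logp_sft=-45.5))
# 1.3 - 0.02 * 3.5 = 1.23
```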

Chain of Thought

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    https://arxiv.org/pdf/2201.11903

  • Chain of thought
    https://arxiv.org/pdf/2411.14405v1

  • Demystifying Long Chain-of-Thought Reasoning in LLMs
    https://arxiv.org/pdf/2502.03373
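
A minimal sketch of few-shot chain-of-thought prompting as described in the Wei et al. paper above: the exemplar shows intermediate reasoning before the final answer, nudging the model to reason step by step. The `call_llm` helper is hypothetical and left unimplemented.

```python
# Hypothetical helper that sends a prompt to some chat/completions endpoint.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your model API of choice")

# One worked exemplar with explicit intermediate reasoning (the "chain of thought").
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def cot_prompt(question: str) -> str:
    return COT_EXEMPLAR + f"Q: {question}\nA:"

print(cot_prompt("A cafeteria had 23 apples. It used 20 and bought 6 more. How many now?"))
```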

Reasoning

  • Transformer Reasoning Capabilities
    https://arxiv.org/pdf/2405.18512

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
    https://arxiv.org/pdf/2407.21787

  • Scaling test-time compute is better than scaling parameters
    https://arxiv.org/pdf/2408.03314

  • Training Large Language Models to Reason in a Continuous Latent Space
    https://arxiv.org/pdf/2412.06769

  • DeepSeek R1
    https://arxiv.org/pdf/2501.12948v1

  • A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
    https://arxiv.org/pdf/2502.01618

  • Latent Reasoning: A Recurrent Depth Approach
    https://arxiv.org/pdf/2502.05171

  • Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
    https://arxiv.org/pdf/2504.13139
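
A minimal sketch of the repeated-sampling / majority-vote idea behind "Large Language Monkeys" and the test-time-scaling papers above. The `sample_answer` function is a hypothetical stand-in for one stochastic model call, simulated here with a biased random draw.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one stochastic LLM sample; here a noisy oracle."""
    return random.choices(["42", "41", "7"], weights=[0.5, 0.3, 0.2])[0]

def best_of_n(question: str, n: int = 32) -> str:
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]     # majority-voted final answer

random.seed(0)
print(best_of_n("What is 6 * 7?"))        # more samples -> a more reliable vote
```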

Optimizations

  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
    https://arxiv.org/pdf/2402.17764

  • FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
    https://arxiv.org/pdf/2407.08608

  • ByteDance 1.58
    https://arxiv.org/pdf/2412.18653v1

  • Transformer Square
    https://arxiv.org/pdf/2501.06252

  • Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
    https://arxiv.org/pdf/2501.09732

  • 1B outperforms 405B with test-time scaling
    https://arxiv.org/pdf/2502.06703

  • Speculative Decoding
    https://arxiv.org/pdf/2211.17192
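
A minimal sketch of the accept/reject rule from the speculative decoding paper above, on toy categorical distributions: keep the draft model's token with probability min(1, p/q), otherwise resample from the residual distribution max(0, p - q). Real systems run this per position over a block of drafted tokens; the toy vocabulary and distributions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft):
    """One token of speculative sampling; the result is distributed exactly as p_target."""
    x = rng.choice(len(q_draft), p=q_draft)            # draft model proposes token x ~ q
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x                                       # accepted: a cheap draft token kept
    residual = np.maximum(p_target - q_draft, 0.0)     # rejected: resample from the residual
    residual /= residual.sum()
    return rng.choice(len(residual), p=residual)

p = np.array([0.6, 0.3, 0.1])   # target model's next-token distribution (toy)
q = np.array([0.3, 0.5, 0.2])   # cheaper draft model's distribution (toy)
samples = [speculative_step(p, q) for _ in range(10_000)]
print(np.bincount(samples) / len(samples))             # empirically close to p
```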

Distillation

  • Distilling the Knowledge in a Neural Network
    https://arxiv.org/pdf/1503.02531

  • BYOL (Bootstrap Your Own Latent) - distilled architecture
    https://arxiv.org/pdf/2006.07733

  • DINO
    https://arxiv.org/pdf/2104.14294
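
A minimal sketch of the temperature-softened distillation loss from Hinton et al. above: KL divergence between teacher and student soft targets (scaled by T^2), mixed with the usual cross-entropy on hard labels. The logits, temperature, and mixing weight are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_soft || student_soft)."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * ce + (1 - alpha) * kd

torch.manual_seed(0)
student = torch.randn(8, 10)              # (batch, classes) logits from the small model
teacher = torch.randn(8, 10)              # logits from the large, frozen model
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels).item())
```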

SSMs (State Space Models)

  • RWKV: Reinventing RNNs for the Transformer Era
    https://arxiv.org/pdf/2305.13048

  • Mamba
    https://arxiv.org/pdf/2312.00752

  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    https://arxiv.org/pdf/2405.21060

  • Distilling Transformers to SSMs
    https://arxiv.org/pdf/2408.10189

  • LoLCATs: On Low-Rank Linearizing of Large Language Models
    https://arxiv.org/pdf/2410.10254

  • Think Slow, Fast
    https://arxiv.org/pdf/2502.20339
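
A minimal sketch of the discrete linear state-space recurrence that Mamba-style models build on, h_t = A h_{t-1} + B x_t and y_t = C h_t, computed as a plain sequential scan. The random (roughly stable) matrices are placeholders; real SSM layers parameterize and discretize A, B, C carefully and, in Mamba, make them input-dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in, seq_len = 8, 4, 16

A = 0.9 * np.eye(d_state) + 0.01 * rng.standard_normal((d_state, d_state))  # stable-ish transition
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_in, d_state))

def ssm_scan(x):
    """y_t = C h_t with h_t = A h_{t-1} + B x_t; a plain sequential scan over time."""
    h = np.zeros(d_state)
    ys = []
    for x_t in x:                     # x: (seq_len, d_in)
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)               # (seq_len, d_in)

x = rng.standard_normal((seq_len, d_in))
print(ssm_scan(x).shape)
```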

Competition Models

  • Google Math Olympiad 2 (AlphaGeometry2)
    https://arxiv.org/pdf/2502.03544

  • Competitive Programming with Large Reasoning Models
    https://arxiv.org/pdf/2502.06807

  • Google Math Olympiad 1 (AlphaGeometry)
    https://www.nature.com/articles/s41586-023-06747-5

Hype Makers

  • Can AI be made to think critically
    https://arxiv.org/pdf/2501.04682

  • Evolving Deeper LLM Thinking
    https://arxiv.org/pdf/2501.09891

  • LLMs Can Easily Learn to Reason from Demonstrations Structure
    https://arxiv.org/pdf/2502.07374

Hype Breakers

  • Separating communication from intelligence
    https://arxiv.org/pdf/2301.06627

  • Language is not intelligence
    https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf

Image Transformers

  • An Image is Worth 16x16 Words
    https://arxiv.org/pdf/2010.11929

  • CLIP
    https://arxiv.org/pdf/2103.00020

  • DeepSeek image generation
    https://arxiv.org/pdf/2501.17811
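
A minimal sketch of the patch-embedding step behind "An Image is Worth 16x16 Words": cut the image into non-overlapping 16x16 patches, flatten each one, and project it linearly to a token embedding. The image size, embedding width, and random projection are illustrative; the [CLS] token and position embeddings are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 224
patch, channels, d_model = 16, 3, 768

image = rng.standard_normal((H, W, channels))
proj = rng.standard_normal((patch * patch * channels, d_model)) * 0.02   # learned in practice

# Cut into non-overlapping 16x16 patches and flatten each one.
patches = image.reshape(H // patch, patch, W // patch, patch, channels)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * channels)

tokens = patches @ proj                # (196, 768): the "words" fed to the Transformer
print(patches.shape, tokens.shape)
```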

Video Transformers

  • ViViT: A Video Vision Transformer
    https://arxiv.org/pdf/2103.15691

  • Joint Embedding abstractions with self-supervised video masks
    https://arxiv.org/pdf/2404.08471

  • Facebook VideoJAM AI video generation
    https://arxiv.org/pdf/2502.02492

Case Studies

  • Automated Unit Test Improvement using Large Language Models at Meta
    https://arxiv.org/pdf/2402.09171

  • Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
    https://arxiv.org/pdf/2404.17723v1

  • OpenAI o1 System Card
    https://arxiv.org/pdf/2412.16720

  • LLM-powered bug catchers
    https://arxiv.org/pdf/2501.12862

  • Chain-of-Retrieval Augmented Generation
    https://arxiv.org/pdf/2501.14342

  • Swiggy Search
    https://bytes.swiggy.com/improving-search-relevance-in-hyperlocal-food-delivery-using-small-language-models-ecda2acc24e6

  • Swarm by OpenAI
    https://github.com/openai/swarm

  • Netflix Foundation Models
    https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39

  • Model Context Protocol
    https://www.anthropic.com/news/model-context-protocol

  • Uber QueryGPT
    https://www.uber.com/en-IN/blog/query-gpt/
