LLM
Average article quality score: 93
Knowledge and technical notes on large language models. https://github.com/shizhengLi/DeepLearning.AI
阿正的梦工坊
Time spares no one, and I have never spared time either.
In-Depth Analysis of Loss Reduction Modes: The Difference Between mean and sum and Their Use in Large Language Models (bilingual Chinese/English)
Use sum for tasks that prioritize long-sequence performance, such as chat models or long-text generation. Original post · 2024-12-03 16:06:48 · 755 reads · 0 comments
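As a minimal, hypothetical illustration of the two reduction modes (not code from the post), PyTorch's cross_entropy makes the difference explicit: mean averages the per-token loss, while sum lets longer sequences contribute proportionally more gradient.

```python
import torch
import torch.nn.functional as F

# Toy example: 6 flattened token positions over a vocabulary of 5.
logits = torch.randn(6, 5)
targets = torch.tensor([1, 2, 3, 0, 4, 2])

# "mean" averages over tokens, so sequence length does not change the loss scale.
loss_mean = F.cross_entropy(logits, targets, reduction="mean")

# "sum" adds per-token losses, so longer sequences produce larger updates.
loss_sum = F.cross_entropy(logits, targets, reduction="sum")

print(loss_mean.item(), loss_sum.item(), loss_sum.item() / targets.numel())
```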
Using the open-instruct Framework: When Resuming Training from a Checkpoint, Does the Earlier Data Have to Be Trained from Scratch Again?
The finetune.py script supports resuming training from a checkpoint, and when it loads the model weights it also restores the training step and epoch counters. Original post · 2024-12-03 15:46:49 · 536 reads · 0 comments
How to Avoid Computing the Loss on Padding Tokens During Model Training
According to the transformers documentation, if label_pad_token_id is not specified explicitly it usually defaults to -100, because that is the default ignore_index of CrossEntropyLoss. Original post · 2024-12-03 14:06:12 · 840 reads · 0 comments
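A short sketch of that mechanism with assumed toy tensors (not the post's code): label positions set to -100 are skipped by the cross-entropy loss, so padding contributes nothing to the loss or the gradients.

```python
import torch
import torch.nn.functional as F

vocab_size = 8
# Two sequences of length 4; the second one is padded in its last two positions.
logits = torch.randn(2, 4, vocab_size)
labels = torch.tensor([
    [3, 5, 1, 2],
    [4, 6, -100, -100],  # -100 marks padding positions
])

# Positions whose label equals ignore_index (default -100) are excluded from the loss.
loss = F.cross_entropy(
    logits.view(-1, vocab_size),  # (batch * seq_len, vocab_size)
    labels.view(-1),              # (batch * seq_len,)
    ignore_index=-100,
)
print(loss.item())
```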
How to Compute the Length Distribution of Untruncated Tokenization in Python: Based on the tulu3 Dataset and the gemma-2-2b Tokenizer
Count the number of samples whose tokenized length exceeds 1024. Original post · 2024-12-02 16:05:32 · 195 reads · 0 comments
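A rough sketch of how such a count could be done with datasets and transformers; the dataset id, tokenizer id, and the "messages" field are assumptions, so adjust them to whatever the post actually used.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder ids; substitute the exact dataset/tokenizer from the post.
dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

max_len = 1024
over_limit = 0

for example in dataset:
    # Assumes a chat-style schema with a "messages" list; adapt to your data.
    text = " ".join(turn["content"] for turn in example["messages"])
    n_tokens = len(tokenizer(text, truncation=False)["input_ids"])
    if n_tokens > max_len:
        over_limit += 1

print(f"samples longer than {max_len} tokens: {over_limit}")
```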
Tokenization Timeouts in the open-instruct Framework: Set the timeout Argument Rather than NCCL_TIMEOUT
Useful if the tokenization process is long; the default is 1800 seconds (30 minutes). Original post · 2024-12-02 12:15:00 · 429 reads · 0 comments
GPU Memory Requirements for Mixed-Precision Training of the Meta-Llama-3-8B-Instruct Model: The AdamW Optimizer (bilingual Chinese/English)
In-Depth Analysis of Memory Requirements for Mixed Precision Training of the Meta-Llama-3-8B-Instruct Model. Original post · 2024-12-01 14:47:50 · 1338 reads · 0 comments
Why Are Optimizer States Still Stored in FP32 in Mixed-Precision Training? GPU Memory Requirements of the LLaMA 2 7B Model Under Mixed Precision
Mixed-precision training sharply reduces memory use through the BF16 format, but the critical optimizer states (the FP32 master copy of the weights, the first moment, and the second moment) are still kept in FP32 to preserve numerical stability and training accuracy. Original post · 2024-12-01 14:34:07 · 829 reads · 0 comments
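For intuition, a back-of-the-envelope estimate of why the FP32 optimizer states dominate; the byte counts follow the standard mixed-precision AdamW accounting and are not figures quoted from the post.

```python
params = 7e9  # LLaMA-2 7B parameter count (approximate)

bf16_weights = params * 2  # 2 bytes per BF16 value
bf16_grads   = params * 2
fp32_master  = params * 4  # FP32 copy of the weights used for the update
fp32_moment1 = params * 4  # Adam/AdamW first moment (m)
fp32_moment2 = params * 4  # Adam/AdamW second moment (v)

total = bf16_weights + bf16_grads + fp32_master + fp32_moment1 + fp32_moment2
print(f"~{total / 1024**3:.0f} GiB before activations and framework overhead")
```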
How to Calculate the Number of Training Steps: A Detailed Walkthrough Based on a Real DeepSpeed Training Configuration
In deep learning model training, a "step" refers to a single update of the model's parameters after processing a batch of training samples. Original post · 2024-12-01 12:36:19 · 716 reads · 0 comments
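The usual arithmetic behind that statement, sketched with made-up numbers (the actual values come from your DeepSpeed config and dataset size):

```python
import math

# Hypothetical values; replace with those from your own configuration.
num_samples           = 100_000  # training set size
per_device_batch_size = 4        # micro-batch per GPU
gradient_accumulation = 8
num_gpus              = 8
num_epochs            = 2

# Samples consumed per optimizer step (the effective global batch size).
global_batch_size = per_device_batch_size * gradient_accumulation * num_gpus

steps_per_epoch = math.ceil(num_samples / global_batch_size)
total_steps = steps_per_epoch * num_epochs
print(global_batch_size, steps_per_epoch, total_steps)
```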
SFT (Supervised Fine-Tuning) of LLaMA for Text-Generation Tasks
Supervised fine-tuning (SFT) means taking an already pretrained large language model and training it further on labeled data so that it performs better on a specific task. Original post · 2024-12-01 11:31:52 · 661 reads · 0 comments
An Introduction to Supervised Fine-Tuning (SFT)
In SFT we typically fine-tune the model on labeled training data. Original post · 2024-12-01 11:31:25 · 913 reads · 0 comments
What Are Parametric Models and Non-parametric Models?
A parametric model is defined by a finite, fixed number of parameters, whereas a non-parametric model describes the problem through the data itself or with an unbounded number of parameters. Original post · 2024-11-30 15:25:40 · 756 reads · 0 comments
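A classic textbook illustration of that distinction (my example, not the post's): linear regression has a fixed parameter count, while k-nearest-neighbors effectively keeps the training data itself as its "parameters".

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Parametric: two numbers (slope, intercept), regardless of how much data we have.
linear = LinearRegression().fit(X, y)

# Non-parametric: predictions are computed from the stored training points.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(linear.coef_, linear.intercept_)
print(knn.predict([[3.0]]))
```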
Calculating GPU Memory for Large-Model Inference: Why Can Activation Memory Be Ignored? (bilingual Chinese/English)
Inference in large language models (LLMs), such as LLaMA 2 7B, involves two primary components of GPU memory consumption: model parameter memory and activation memory. Original post · 2024-11-30 09:40:31 · 532 reads · 0 comments
Differences Between the SGD, RMSProp, and Adam Optimizers and Their Training Memory Consumption: LLaMA-2 7B as an Example (bilingual Chinese/English)
A Detailed Analysis of SGD, RMSProp, and Adam Optimizers, and Their Memory Consumption. Original post · 2024-11-30 09:06:31 · 622 reads · 0 comments
The Difference Between the Adam and RMSProp Optimizers and Their Training Memory Consumption (LLaMA-2 7B as an Example): bilingual Chinese/English
Both aim to improve training efficiency by adapting learning rates for each parameter, but they do so in different ways. Original post · 2024-11-30 09:05:38 · 809 reads · 0 comments
The Adam and AdamW Optimizers Explained, with an Analysis of Their Training Memory Requirements: LLaMA-2 7B as an Example (bilingual Chinese/English)
Detailed Analysis of Adam and AdamW Optimizers and Their Memory Consumption with float32 and bfloat16 Precision. Original post · 2024-11-30 09:03:46 · 968 reads · 0 comments
Understanding the Adam Optimizer's Memory Demands: The LLaMA-2 7B Model as an Example (bilingual Chinese/English)
Understanding the Additional Memory Requirements of the Adam Optimizer: Memory Consumption Breakdown for Model Parameters and Optimizer States. Original post · 2024-11-30 09:02:14 · 1212 reads · 0 comments
What Is the Difference Between bfloat16 (BF16) and float16 (FP16)? A Bilingual Chinese/English Explanation
BF16 offers a larger numerical range and is specifically optimized for deep learning tasks that require handling large gradients and weights. Original post · 2024-11-29 16:46:17 · 959 reads · 0 comments
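The difference is easy to inspect directly in PyTorch (a quick check, independent of the post): BF16 keeps FP32's 8-bit exponent and therefore its range, while FP16 keeps more mantissa bits but tops out around 65504.

```python
import torch

for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # FP16: max ~65504 with finer precision; BF16: max ~3.4e38 with coarser precision.
    print(dtype, "max:", info.max, "smallest normal:", info.tiny, "eps:", info.eps)
```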
Data Parallelism, Model Parallelism, and Tensor Parallelism: Parallel Computing Strategies in Deep Learning (bilingual Chinese/English)
Data Parallelism, Model Parallelism, and Tensor Parallelism: Parallel Computing Strategies in Deep Learning. Original post · 2024-11-29 15:33:59 · 902 reads · 0 comments
Understanding DeepSpeed's nebula_config Parameter: A Bilingual Chinese/English Introduction
This parameter allows users to manage and optimize the storage and version control of training states, facilitating efficient data storage and recovery during model training. Original post · 2024-11-29 15:03:08 · 1123 reads · 0 comments
Dissecting a DeepSpeed Framework Configuration: A Detailed Log Analysis
These configuration options cover memory optimization, auto-tuning, mixed precision, and distributed training, as well as other details of model training such as compression, gradient handling, optimizer configuration, data efficiency, and pipeline parallelism. Original post · 2024-11-29 14:17:28 · 813 reads · 0 comments
The reduce_bucket_size Parameter in the DeepSpeed Configuration File Explained: bilingual Chinese/English
reduce_bucket_size is an essential parameter in DeepSpeed's ZeRO Stage 2 optimization, controlling the size of the buckets during gradient reduction. Original post · 2024-11-29 13:09:00 · 690 reads · 0 comments
What Is Gradient Reduction? A Bilingual Chinese/English Explanation
By understanding the mechanics of gradient reduction and the impact of contiguous memory, we can optimize distributed training setups and improve model training efficiency across multiple devices. Original post · 2024-11-29 12:51:04 · 585 reads · 0 comments
How to Randomly Sample Data from a Hugging Face Dataset and Save It as a New Arrow File
Remember to update the dataset_info.json file. Original post · 2024-11-29 12:36:20 · 937 reads · 0 comments
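A minimal sketch of that workflow (dataset id and sample size are placeholders): shuffle with a fixed seed, select a subset, and write it back out, which produces new Arrow files along with the dataset_info.json the post says to update.

```python
from datasets import load_dataset

# Placeholder dataset and sample size; substitute your own.
dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train")

# Shuffle deterministically, then keep the first N rows as the random sample.
sampled = dataset.shuffle(seed=42).select(range(10_000))

# save_to_disk writes Arrow shards plus dataset_info.json / state.json.
sampled.save_to_disk("./tulu3_10k_sample")
```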
Understanding the max_seq_length Parameter, Based on the open-instruct Framework: Explained in Chinese and English
It determines the maximum input sequence length (in tokens) that the model can process in a single forward pass after tokenization. Original post · 2024-11-28 14:42:32 · 749 reads · 0 comments
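A small assumed example of what that means in practice: with truncation enabled, nothing longer than max_seq_length tokens ever reaches the model (the gpt2 tokenizer here is only a stand-in to show the mechanics).

```python
from transformers import AutoTokenizer

# Stand-in tokenizer; any causal-LM tokenizer truncates the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
max_seq_length = 1024

text = "An instruction followed by a very long response. " * 500
encoded = tokenizer(text, truncation=True, max_length=max_seq_length)
print(len(encoded["input_ids"]))  # never exceeds max_seq_length
```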
The contiguous_gradients Parameter in the DeepSpeed Configuration File Explained: Chinese and English
It controls whether the gradients are stored in a contiguous memory block. Original post · 2024-11-28 14:15:50 · 816 reads · 0 comments
DeepSpeed Configuration Files Explained: Chinese and English
DeepSpeed's configuration is highly flexible, but tuning requires balancing memory efficiency and computational speed. Original post · 2024-11-27 22:08:58 · 1276 reads · 0 comments
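To tie the last few entries together, here is a minimal, illustrative ZeRO Stage 2 config dictionary showing where reduce_bucket_size and contiguous_gradients live; the values are made up and are not the author's actual settings.

```python
# Illustrative values only; tune for your own hardware and model.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "reduce_bucket_size": 5e8,     # number of elements per gradient-reduction bucket
        "contiguous_gradients": True,  # pack gradients into one contiguous buffer
        "overlap_comm": True,          # overlap reduction with the backward pass
    },
}

# Typical usage (sketch):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```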
What Is NCCL, NVIDIA's GPU Communication Library? A Bilingual Chinese/English Introduction
NCCL (NVIDIA Collective Communications Library) is a high-performance communication library developed by NVIDIA. Original post · 2024-11-27 21:40:53 · 1064 reads · 0 comments
A Bilingual Chinese/English Introduction to DeepSpeed's ZeRO Optimization
DeepSpeed introduces the ZeRO (Zero Redundancy Optimizer) optimization technique, a groundbreaking solution to reduce memory usage and improve efficiency during training. Original post · 2024-11-27 21:17:14 · 925 reads · 0 comments
Notes on Using the open-instruct Framework: How to Modify the dataset_info.json File When Training on Only a Small Portion of a Hugging Face Dataset
Hands-on experience from training models. Original post · 2024-11-27 12:17:15 · 1231 reads · 0 comments
DeepLearning.AI Course: Understanding LLM Pretraining at the Code Level (Pretraining LLMs)
Pretraining involves teaching an LLM to predict the next token using vast text datasets, resulting in a base model, and this base model requires further fine-tuning for optimal performance and safety. Original post · 2024-08-15 17:46:09 · 1234 reads · 0 comments
Using Knowledge Graphs for Retrieval-Augmented Generation with Neo4j: Knowledge Graphs for RAG
Write advanced Cypher queries to retrieve relevant information from the graph and format it for inclusion in your prompt to an LLM. Original post · 2024-07-24 18:26:43 · 1752 reads · 0 comments
LLM Quantization Basics Taught by the Hugging Face Team: Quantization Fundamentals with Hugging Face
Quantization techniques. Original post · 2024-06-08 15:06:09 · 1420 reads · 0 comments
Multi-Agent Collaboration with the AutoGen Framework: AI Agentic Design Patterns with AutoGen
Build and customize multi-agent systems. Original post · 2024-06-07 19:14:35 · 978 reads · 0 comments
Using Amazon Bedrock: Serverless LLM Apps with Amazon Bedrock
The Amazon Titan model. Original post · 2024-06-06 15:29:16 · 1416 reads · 0 comments
Building Multimodal Search and Multimodal Retrieval-Augmented Generation (RAG) with Weaviate: Building Multimodal Search and RAG
Multiple data modalities. Original post · 2024-06-02 21:00:49 · 1068 reads · 0 comments
The Mistral Large Language Models: Getting Started With Mistral
Mistral. Original post · 2024-06-02 14:43:32 · 1062 reads · 0 comments
Building LLM Applications with Gradio: Building Generative AI Applications with Gradio
The Gradio framework. Original post · 2024-06-01 19:58:41 · 1029 reads · 0 comments
Learning On-Device AI with Qualcomm's AI Hub Models: Introduction to On-Device AI
Qualcomm's AI Hub Models. Original post · 2024-05-22 15:26:50 · 1281 reads · 0 comments
Building Multi-Agent Systems with crewAI: Multi AI Agent Systems with crewAI
crewAI for multi-agent systems. Original post · 2024-05-20 21:54:41 · 1341 reads · 1 comment
Functions, Tools and Agents with LangChain
A DeepLearning.AI short course. Original post · 2024-03-20 17:57:40 · 871 reads · 0 comments