
阿正的梦工坊

时间的朋友 (Friend of Time)

  • Blog posts (1394)
  • Resources (6)
  • Favorites
  • Following

[Original] In Depth: The Relationship Between the Chi-Squared Distribution and the Gamma Distribution

The gamma distribution is a general-purpose distribution widely used to describe sums of random variables; the chi-squared distribution is a special case of the gamma distribution, specialized to sums of squares of normal variables.

2024-11-30 11:53:38 888
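
The special-case relationship in the teaser above can be stated precisely (standard definitions, shape–scale parameterization):

```latex
% Gamma density with shape k and scale \theta:
f(x;\, k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad x > 0.

% If Z_1, \dots, Z_k \sim \mathcal{N}(0,1) are i.i.d., then
\sum_{i=1}^{k} Z_i^2 \;\sim\; \chi^2_k \;=\; \mathrm{Gamma}\!\left(\tfrac{k}{2},\; \theta = 2\right).
```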

[Original] The Wishart Distribution and the Gamma Distribution: Relationship and Use Cases

The gamma distribution describes the distribution of a one-dimensional random variable, while the Wishart distribution describes the distribution of a covariance matrix. Their connection can be seen as a generalization from scalars to matrices.

2024-11-30 11:44:51 760

[Original] The English Term for 架构师: Architect

This role typically requires deep technical expertise and the ability to think systematically, focusing on designing and optimizing software systems at a high level.

2024-11-30 10:37:45 598

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 43

Most Common American Idioms: Part 43

2024-11-30 10:32:59 834

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 42

Most Common American Idioms: Part 42

2024-11-30 09:44:27 502

[Original] Estimating GPU Memory for Large-Model Inference: Why Activation Memory Is Negligible (Bilingual)

Inference in large language models (LLMs), such as LLaMA 2 7B, involves two primary components of GPU memory consumption: model parameter memory and activation memory

2024-11-30 09:40:31 532
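
As a minimal sketch of the accounting the teaser describes: the assumed numbers below (7e9 parameters, fp16, hidden size 4096, 32 layers, a 2048-token context) are illustrative, not taken from the post.

```python
# Back-of-the-envelope estimate of inference GPU memory for a
# LLaMA-2-7B-sized model. All sizes are illustrative assumptions.
def inference_memory_gb(n_params=7e9, bytes_per_param=2,
                        batch=1, seq_len=2048, hidden=4096,
                        n_layers=32, bytes_per_act=2):
    param_mem = n_params * bytes_per_param
    # Roughly one hidden vector per layer per token is kept at a time;
    # tiny next to the weights.
    act_mem = batch * seq_len * hidden * n_layers * bytes_per_act
    return param_mem / 1e9, act_mem / 1e9

params_gb, acts_gb = inference_memory_gb()
# Weights dominate: ~14 GB of parameters vs. ~0.5 GB of activations.
```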

[Original] Differences Between the SGD, RMSProp, and Adam Optimizers, with a Training-Memory Analysis: LLaMA-2 7B as an Example (Bilingual)

A Detailed Analysis of SGD, RMSProp, and Adam Optimizers, and Their Memory Consumption

2024-11-30 09:06:31 622

[Original] Differences Between the Adam and RMSProp Optimizers, with Training Memory Consumption (LLaMA-2 7B as an Example) (Bilingual)

Both aim to improve training efficiency by adapting learning rates for each parameter, but they do so in different ways.

2024-11-30 09:05:38 809

[Original] The Adam and AdamW Optimizers Explained, with a Training-Memory Analysis: LLaMA-2 7B as an Example (Bilingual)

Detailed Analysis of Adam and AdamW Optimizers and Their Memory Consumption with float32 and bfloat16 Precision

2024-11-30 09:03:46 968

[Original] Understanding the Adam Optimizer's Memory Requirements: The LLaMA-2 7B Model as an Example (Bilingual)

Understanding the Additional Memory Requirements of Adam Optimizer: Memory Consumption Breakdown for Model Parameters and Optimizer States

2024-11-30 09:02:14 1212
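
The breakdown mentioned in the teaser can be sketched with simple arithmetic (assuming full fp32 training state, 4 bytes per value, activations excluded):

```python
# Adam keeps two extra state tensors (first and second moments) the
# same size as the parameters, on top of weights and gradients.
def adam_state_memory_gb(n_params=7e9):
    weights  = 4 * n_params   # fp32 parameters
    grads    = 4 * n_params   # fp32 gradients
    momentum = 4 * n_params   # Adam first moment (m)
    variance = 4 * n_params   # Adam second moment (v)
    return (weights + grads + momentum + variance) / 1e9

total_gb = adam_state_memory_gb()   # 16 bytes/param -> 112 GB for 7B params
```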

[Original] What Is the Difference Between bfloat16 (BF16) and float16 (FP16)? (Bilingual)

BF16 offers a larger numerical range and is specifically optimized for deep learning tasks that require handling large gradients and weights.

2024-11-29 16:46:17 959
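
The range difference follows directly from the bit layouts: FP16 uses 5 exponent / 10 mantissa bits, BF16 uses 8 exponent / 7 mantissa bits (the same exponent width as fp32). A small calculation makes the gap concrete:

```python
# Largest finite value of an IEEE-style float with the given exponent
# and mantissa widths: (2 - 2**-m) * 2**bias, where bias = 2**(e-1) - 1.
def max_finite(exp_bits: int, mant_bits: int) -> float:
    bias = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mant_bits) * 2.0 ** bias

fp16_max = max_finite(exp_bits=5, mant_bits=10)   # 65504.0
bf16_max = max_finite(exp_bits=8, mant_bits=7)    # ~3.39e38, fp32-like range
```

BF16 trades mantissa precision for exponent range, which is why it handles large gradients without overflow where FP16 needs loss scaling.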

[Original] Data Parallelism, Model Parallelism, and Tensor Parallelism: Parallel Computing Strategies in Deep Learning (Bilingual)

Data Parallelism, Model Parallelism, and Tensor Parallelism: Parallel Computing Strategies in Deep Learning

2024-11-29 15:33:59 902

[Original] The DeepSpeed hybrid_engine Parameter Explained (Bilingual)

By enabling and configuring the hybrid computation engine, DeepSpeed can intelligently manage memory and computation across multiple devices, improving efficiency and reducing training time.

2024-11-29 15:12:17 969

[Original] Understanding DeepSpeed's nebula_config Parameter (Bilingual)

This parameter allows users to manage and optimize the storage and version control of training states, facilitating efficient data storage and recovery during model training.

2024-11-29 15:03:08 1123

[Original] Curriculum Learning and Its Use in the DeepSpeed Framework (Bilingual)

Curriculum learning aims to improve the learning efficiency of models by gradually increasing the difficulty of the training tasks.

2024-11-29 14:44:19 1053
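
The idea of "gradually increasing task difficulty" can be sketched in a few lines; the difficulty proxy (sequence length) and all names here are illustrative assumptions, not DeepSpeed's API:

```python
# Minimal curriculum-learning sketch: order samples by a difficulty
# score and feed the model progressively harder batches.
def curriculum_batches(samples, difficulty, batch_size):
    ordered = sorted(samples, key=difficulty)        # easiest first
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

texts = ["a b", "a", "a b c d", "a b c"]
batches = list(curriculum_batches(texts, lambda s: len(s.split()), 2))
# First batch holds the shortest (easiest) samples.
```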

[Original] Parsing DeepSpeed Framework Configuration: A Detailed Log Analysis

These configuration items cover memory optimization, autotuning, mixed precision, distributed training, and other details of model training, including compression, gradient handling, optimizer configuration, data efficiency, and pipeline parallelism.

2024-11-29 14:17:28 813

[Original] How to Enable Gradient Checkpointing in DeepSpeed (Bilingual)

Gradient checkpointing in DeepSpeed is a technique designed to reduce memory usage when training large models by storing only a subset of intermediate activations during the forward pass.

2024-11-29 13:48:50 804

[Original] Gradient Checkpointing Explained in Detail (Bilingual)

By discarding intermediate activations and recomputing them when needed, gradient checkpointing reduces memory usage, making it feasible to train large models on memory-limited hardware.

2024-11-29 13:42:06 921
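
A toy memory model shows why "store only a subset of activations" pays off; the accounting below is a simplified assumption (one stored activation per segment boundary, plus one segment recomputed at a time), not a measurement:

```python
import math

# Activations resident in memory when checkpointing every `segment`
# layers: one per segment boundary, plus the segment being recomputed.
def activations_in_memory(n_layers: int, segment: int) -> int:
    return math.ceil(n_layers / segment) + segment

n_layers = 64
no_ckpt = n_layers                                   # store everything: 64
best_seg = min(range(1, n_layers + 1),
               key=lambda s: activations_in_memory(n_layers, s))
with_ckpt = activations_in_memory(n_layers, best_seg)
# Checkpointing every ~sqrt(n) layers cuts 64 stored activations to 16,
# at the cost of roughly one extra forward pass.
```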

[Original] What Happens in Large-Model Training When Both train_micro_batch_size_per_gpu and gradient_accumulation_steps Are Set Small?

To avoid running out of GPU memory, you can optimize memory usage by reducing the micro-batch size and gradient accumulation steps, using mixed-precision training, or applying gradient checkpointing.

2024-11-29 13:30:15 874
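
The interaction between these two knobs comes down to one product, sketched here (the sample values are illustrative):

```python
# Effective (global) batch size when combining a per-GPU micro-batch,
# gradient accumulation, and data-parallel replicas.
def effective_batch_size(micro_batch_per_gpu, grad_accum_steps, n_gpus):
    return micro_batch_per_gpu * grad_accum_steps * n_gpus

# Small micro-batch AND small accumulation -> tiny effective batch,
# which saves memory but yields noisy gradient estimates.
tiny  = effective_batch_size(1, 1, 8)    # 8
usual = effective_batch_size(4, 16, 8)   # 512
```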

[Original] What Is Distributed Gradient Accumulation? How Should gradient_accumulation_steps Be Set?

Distributed gradient accumulation helps overcome memory limitations by allowing us to simulate larger batch sizes while using smaller mini-batches.

2024-11-29 13:18:01 693

[Original] The reduce_bucket_size Parameter in DeepSpeed Configuration Files Explained (Bilingual)

reduce_bucket_size is an essential parameter in DeepSpeed's ZeRO Stage 2 optimization, controlling the size of the buckets during gradient reduction.

2024-11-29 13:09:00 690
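
As a sketch, a ZeRO Stage 2 section of a DeepSpeed config sets the bucket size like this (the 5e8 value is a common illustrative choice, not a recommendation from the post):

```json
{
  "zero_optimization": {
    "stage": 2,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
```

Larger buckets mean fewer, bigger reduction calls (better bandwidth utilization, more transient memory); smaller buckets trade throughput for a lower memory spike.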

[Original] What Is Gradient Reduction? (Bilingual)

By understanding the mechanics of gradient reduction and the impact of contiguous memory, we can optimize distributed training setups and improve model training efficiency across multiple devices.

2024-11-29 12:51:04 585
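
At its core, gradient reduction is an all-reduce that averages per-device gradients; a minimal single-process simulation of that semantics:

```python
# Simulate all-reduce averaging: sum gradients element-wise across
# devices, then divide, so every replica ends with identical gradients.
def all_reduce_mean(per_device_grads):
    n = len(per_device_grads)
    summed = [sum(vals) for vals in zip(*per_device_grads)]
    return [s / n for s in summed]

# Two "GPUs", each holding a 3-element gradient vector:
avg = all_reduce_mean([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])
# avg == [2.0, 3.0, 4.0] on every replica
```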

[Original] How to Randomly Sample Data from a Hugging Face Dataset and Save It as a New Arrow File

Remember to update the dataset_info.json file accordingly.

2024-11-29 12:36:20 937

[Original] Understanding Parquet Files and the Arrow Format: From the Perspective of Hugging Face Datasets

Understanding Parquet Files and Arrow Format: A Guide with Hugging Face Datasets

2024-11-29 12:12:54 978

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 41

Most Common American Idioms: Part 41

2024-11-29 11:55:39 794

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 40

Most Common American Idioms: Part 40

2024-11-28 21:49:15 885

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 39

Most Common American Idioms: Part 39

2024-11-28 21:24:24 898

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 38

Most Common American Idioms: Part 38

2024-11-28 20:34:52 883

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 37

Most Common American Idioms: Part 37

2024-11-28 19:54:44 707

[Original] LeBron James's Catchphrases, as ChatGPT Sees Them

LeBron James of the NBA.

2024-11-28 18:17:53 763

[Original] Bayesian Statistics: Deriving the Posterior Distribution of the Gaussian Mean μ

In Bayesian statistics, the posterior distribution represents the updated belief about a parameter after observing data.

2024-11-28 15:56:18 1029
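
For the known-variance case, the derivation the title refers to lands at a standard conjugate result:

```latex
% Known variance \sigma^2, prior \mu \sim \mathcal{N}(\mu_0, \sigma_0^2),
% data x_1, \dots, x_n \sim \mathcal{N}(\mu, \sigma^2) i.i.d.
% The posterior is again Gaussian:
\mu \mid x_{1:n} \;\sim\; \mathcal{N}(\mu_n, \sigma_n^2),
\qquad
\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2},
\qquad
\mu_n = \sigma_n^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{n \bar{x}}{\sigma^2} \right).
```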

[Original] What Is the Difference Between a Likelihood Distribution and a Likelihood Function? (Bilingual)

Both involve the probability of data given a set of parameters, but they are used in different contexts and have distinct meanings.

2024-11-28 15:52:56 799

[Original] Core Ideas and Fundamentals of Bayesian Statistics (Bilingual)

Bayesian statistics is a framework that uses Bayes' theorem to update our beliefs about parameters or models based on observed data.

2024-11-28 15:28:48 894

[Original] Understanding the max_seq_length Parameter, Based on the open-instruct Framework (Bilingual)

It determines the maximum input sequence length (in tokens) that the model can process in a single forward pass after tokenization.

2024-11-28 14:42:32 749
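
The parameter's effect boils down to truncation after tokenization; a stripped-down sketch (tokenizer details and special tokens omitted, function name is illustrative):

```python
# Token-id sequences longer than max_seq_length are cut off before
# the forward pass; shorter sequences pass through unchanged.
def apply_max_seq_length(token_ids, max_seq_length):
    return token_ids[:max_seq_length]

tokens = list(range(10))
truncated = apply_max_seq_length(tokens, 4)     # [0, 1, 2, 3]
unchanged = apply_max_seq_length(tokens, 16)    # shorter input passes through
```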

[Original] The contiguous_gradients Parameter in DeepSpeed Configuration Files Explained (Bilingual)

It controls whether the gradients are stored in a contiguous memory block.

2024-11-28 14:15:50 816

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 36

Most Common American Idioms: Part 36

2024-11-28 13:09:18 764

[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 35

Most Common American Idioms: Part 35

2024-11-28 12:51:45 844

[Original] DeepSpeed Configuration Files Explained (Bilingual)

DeepSpeed’s configuration is highly flexible, but tuning requires balancing memory efficiency and computational speed.

2024-11-27 22:08:58 1276

[Original] What Is NCCL, the Library NVIDIA GPUs Use for Communication? (Bilingual)

NCCL (NVIDIA Collective Communications Library) is a high-performance communication library developed by NVIDIA.

2024-11-27 21:40:53 1064

[Original] An Introduction to DeepSpeed's ZeRO Optimization (Bilingual)

DeepSpeed introduces the ZeRO (Zero Redundancy Optimizer) optimization technique, a groundbreaking solution to reduce memory usage and improve efficiency during training.

2024-11-27 21:17:14 925
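
ZeRO's memory savings follow the model-state accounting from the ZeRO paper (mixed precision: 2 bytes fp16 params + 2 bytes fp16 grads + 12 bytes fp32 optimizer states per parameter), which each stage partitions across GPUs:

```python
# Per-GPU memory for model states under ZeRO stages 0-3.
def zero_memory_gb(n_params, n_gpus, stage):
    p, g, o = 2 * n_params, 2 * n_params, 12 * n_params
    if stage >= 1:
        o /= n_gpus          # Stage 1: partition optimizer states
    if stage >= 2:
        g /= n_gpus          # Stage 2: also partition gradients
    if stage >= 3:
        p /= n_gpus          # Stage 3: also partition parameters
    return (p + g + o) / 1e9

# 7B parameters on 8 GPUs (illustrative sizes):
baseline = zero_memory_gb(7e9, 8, stage=0)   # 112 GB per GPU
stage3   = zero_memory_gb(7e9, 8, stage=3)   # 14 GB per GPU
```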

李永乐线代强化笔记2020年.rar (Li Yongle's 2020 Linear Algebra Intensive-Course Notes)

Teacher Li knows the exam formats and key topics inside out, and his problem-solving approach is extremely flexible; his tutoring is highly targeted and demonstrably effective, earning wide praise from students. These are the author's own notes, compiled into a PDF for everyone's revision.

2020-10-27

李永乐线代基础班笔记.zip (Li Yongle's Linear Algebra Foundation-Course Notes)

Li Yongle's 2020 linear algebra foundation-course notes. Everyone who has used them speaks well of them, especially for how they extend ideas and problem types (from one example to many, without exaggeration).

2020-09-13

