[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 54
Most Common American Idioms: Part 54
2024-12-08 21:55:54 576
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 53
Most Common American Idioms: Part 53
2024-12-05 15:14:10 513
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 52
Most Common American Idioms: Part 52
2024-12-04 21:41:40 472
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 51
Most Common American Idioms: Part 51
2024-12-04 16:31:15 667
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 50
Most Common American Idioms: Part 50
2024-12-04 15:24:12 661
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 49
Most Common American Idioms: Part 49
2024-12-03 16:40:46 600
[Original] An In-Depth Look at Loss Reduction Modes: the Difference Between mean and sum and Their Use in Large Language Models (Bilingual)
Use sum for tasks that prioritize long-sequence performance, such as chat models or long-text generation.
2024-12-03 16:06:48 1144
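A minimal pure-Python sketch of the `mean` vs `sum` trade-off the post above describes; the per-token losses are made-up numbers standing in for real cross-entropy values:

```python
# Illustrates how `mean` vs `sum` loss reduction weighs sequences of
# different lengths. In a real LLM the per-token losses would come from
# CrossEntropyLoss over the vocabulary; here they are made-up numbers.

def reduce_loss(token_losses, mode):
    """Combine per-token losses the way reduction='mean'/'sum' would."""
    total = sum(token_losses)
    if mode == "mean":
        return total / len(token_losses)  # normalizes away sequence length
    if mode == "sum":
        return total                      # long sequences contribute more
    raise ValueError(mode)

short_seq = [2.0, 2.0]     # 2 tokens, per-token loss 2.0
long_seq = [2.0] * 10      # 10 tokens, same per-token loss

# With `mean`, both sequences yield the same loss, so gradient signal from
# long sequences is diluted; with `sum`, the long sequence dominates.
print(reduce_loss(short_seq, "mean"), reduce_loss(long_seq, "mean"))  # 2.0 2.0
print(reduce_loss(short_seq, "sum"), reduce_loss(long_seq, "sum"))    # 4.0 20.0
```

This is why `sum` tends to favor long-sequence tasks: each extra token adds to the loss instead of being averaged away.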
[Original] Using the open-instruct Framework: When Resuming Training from a Checkpoint, Does the Earlier Dataset Have to Be Trained from Scratch Again?
The finetune.py script supports resuming training from a checkpoint, and when loading the model weights it also restores the training step and epoch counters.
2024-12-03 15:46:49 548
[Original] How to Avoid Computing the Loss on Padding Tokens During Model Training
According to the transformers documentation, if label_pad_token_id is not specified explicitly, it usually defaults to -100, since that is the default ignore_index of CrossEntropyLoss.
2024-12-03 14:06:12 848
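A minimal pure-Python sketch of the masking behavior described above, with made-up per-token losses standing in for the real cross-entropy values:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def masked_mean_loss(token_losses, labels, ignore_index=IGNORE_INDEX):
    """Average per-token losses, skipping positions labeled ignore_index."""
    kept = [loss for loss, y in zip(token_losses, labels) if y != ignore_index]
    return sum(kept) / len(kept)

# Two real tokens followed by two padding positions labeled -100.
losses = [1.0, 3.0, 9.9, 9.9]
labels = [42, 7, -100, -100]

# Padding losses are excluded: (1.0 + 3.0) / 2 = 2.0, not the mean of all 4.
print(masked_mean_loss(losses, labels))  # 2.0
```

In practice you never compute this by hand: setting the padding positions of `labels` to -100 makes `CrossEntropyLoss` perform exactly this exclusion.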
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 48
Most Common American Idioms: Part 48
2024-12-02 21:44:01 642
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 47
Most Common American Idioms: Part 47
2024-12-02 20:18:25 826
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 46
Most Common American Idioms: Part 46
2024-12-02 18:52:56 699
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 45
Most Common American Idioms: Part 45
2024-12-02 18:31:53 851
[Original] Learning American Idioms with Li Xiaolai (Most Common American Idioms): Part 44
Most Common American Idioms: Part 44
2024-12-02 17:41:55 831
[Original] How to Compute the Length Distribution of Untruncated Tokenization in Python: Based on the tulu3 Dataset and the gemma-2-2b Tokenizer
Counting the number of samples whose tokenized length exceeds 1024.
2024-12-02 16:05:32 201
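A sketch of the counting step described above. A whitespace split stands in for the real tokenizer here so the snippet is self-contained; with transformers you would instead load `AutoTokenizer.from_pretrained("google/gemma-2-2b")` and take `len(tok(text, truncation=False)["input_ids"])`:

```python
from collections import Counter

def length_stats(texts, max_len=1024, tokenize=str.split):
    """Return (count of samples longer than max_len, length histogram).

    `tokenize` is a stand-in: pass a real tokenizer's encode function to
    measure actual token counts rather than whitespace words.
    """
    lengths = [len(tokenize(t)) for t in texts]
    over = sum(1 for n in lengths if n > max_len)
    return over, Counter(lengths)

# Toy corpus: one 3-token sample and one 2000-token sample.
texts = ["a b c", "x " * 2000]
over, dist = length_stats(texts)
print(over)  # 1 sample exceeds 1024 tokens
```

Swapping in the gemma-2-2b tokenizer changes only the `tokenize` argument; the counting logic is identical.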
[Original] The Wishart Distribution and Bayesian Inference of the Gaussian Covariance Matrix (Wishart Distribution and Gaussian Covariance Matrix in Bayesian Inference)
This example demonstrates how to infer the covariance matrix of a Gaussian distribution in a Bayesian framework, highlighting the role of the Wishart distribution as a conjugate prior.
2024-12-02 15:24:25 353
[Original] Inferring the Mean Vector of a Multivariate Gaussian Distribution: a Classic Application of Conjugate Distributions (Bilingual)
This article demonstrated the Bayesian inference process for the mean vector of a multivariate Gaussian distribution and highlighted the importance of conjugate priors.
2024-12-02 15:03:17 620
[Original] Explaining Conjugate Distributions Through the Dirichlet Distribution and the Multinomial Distribution (Bilingual)
The conjugate relationship between the Dirichlet distribution and multinomial distribution provides a simple and efficient Bayesian inference framework for modeling categorical probabilities.
2024-12-02 14:39:28 590
[Original] The Conjugacy Between the Beta Distribution and the Binomial Distribution Explained in Detail
In Bayesian statistics, the Beta distribution and Binomial distribution form one of the most classic conjugate prior-posterior pairs.
2024-12-02 14:09:55 510
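The Beta-Binomial conjugacy mentioned above reduces Bayesian updating to simple arithmetic on the prior's parameters; a minimal sketch:

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior of a Beta(alpha, beta) prior after a Binomial observation.

    Conjugacy gives the closed form:
    Beta(alpha + successes, beta + trials - successes).
    """
    return alpha + successes, beta + (trials - successes)

# Uniform prior Beta(1, 1), then observe 7 heads in 10 coin flips.
a, b = beta_binomial_update(1, 1, successes=7, trials=10)
print(a, b)         # 8 4
print(a / (a + b))  # posterior mean ~ 0.667
```

Because the posterior is again a Beta distribution, the update can be applied repeatedly as new batches of trials arrive, with no numerical integration.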
[Original] Conjugate Distributions and Conjugate Priors: a Powerful Tool for Simplifying Bayesian Inference (Bilingual)
A conjugate distribution is a type of prior distribution that, when combined with a specific likelihood function, results in a posterior distribution that has the same form as the prior.
2024-12-02 13:44:39 1676
[Original] Frequentism and Bayesianism: a Comparison of Two Views of Probability and Their Applications in Machine Learning (Bilingual)
The Bayesian approach provides a flexible and systematic way to incorporate prior knowledge and beliefs, which can be incredibly useful when data is scarce or uncertain.
2024-12-02 13:15:38 772
[Original] For Tokenization Timeouts in the open-instruct Framework, Don't Set NCCL_TIMEOUT; Set the timeout Parameter Instead
Useful if the tokenization process is long. The default is 1800 seconds (30 minutes).
2024-12-02 12:15:00 436
[Original] Exponential Family Distributions Explained in Detail, with Applications (Bilingual)
This article introduces the background, properties, and real-world applications of exponential family distributions, illustrated through examples such as univariate and multivariate Gaussian distributions, Dirichlet distribution, and Wishart distribution.
2024-12-01 22:40:22 845
[Original] Maximum Likelihood Estimation: Solving for the Exponential-Family Parameter (η) Has a Closed-Form Solution (Bilingual)
In exponential family distributions, due to the linear relationship between the log-likelihood function and the sufficient statistic, Maximum Likelihood Estimation (MLE) often has a closed-form solution.
2024-12-01 22:25:57 771
[Original] An Analysis of Support Vector Machines (SVM) and Their Applications: from Closed-Form Solutions to Changing Times (Bilingual)
Thanks to the closed-form expression of the kernel matrix, the gradient of the objective function in the dual form of SVM can be computed efficiently, avoiding the need for complicated numerical optimization procedures.
2024-12-01 22:03:27 902
[Original] A Comparison of Closed-Form Solutions and Complex Numerical Optimization (Bilingual)
Closed-form solutions play a vital role in simplifying and solving problems in mathematics, statistics, and machine learning. They provide precise results with minimal computational cost, making them indispensable for problems where they are applicable.
2024-12-01 21:37:30 496
[Original] The Concept and Applications of the Sufficient Statistic (Bilingual)
A sufficient statistic is a function of a dataset that captures all the information about a parameter of interest contained within the data.
2024-12-01 20:32:25 1072
[Original] Two Forms of the Exponential Family of Distributions and the Differences Between Them
In statistics, the Exponential Family of Distributions is a widely used and mathematically convenient class of distributions that includes many common ones, such as the normal, binomial, and Poisson distributions.
2024-12-01 20:20:29 887
[Original] Understanding Student's t-Distribution in Depth, and the Relationships Among the Cauchy, Gaussian, and Student's t Distributions
The t-distribution was first proposed by the British statistician William Sealy Gosset to address statistical inference with small samples. Its heavy tails make the distribution more robust to outliers, and it has become an important tool for handling small samples, anomalies, and uncertainty.
2024-12-01 19:46:37 941
[Original] A Detailed Guide to the Gaussian-Wishart Distribution: Built on the Multivariate Gaussian Distribution (Bilingual)
A Detailed Guide to the Gaussian-Wishart Distribution: Background, Application, and Mathematical Formulation
2024-12-01 19:19:26 800
[Original] Memory Requirements for Mixed-Precision Training of the Meta-Llama-3-8B-Instruct Model with the AdamW Optimizer (Bilingual)
In-Depth Analysis of Memory Requirements for Mixed Precision Training of Meta-Llama-3-8B-Instruct Model
2024-12-01 14:47:50 1380
[Original] Why Are Optimizer Parameters Still Stored in FP32 in Mixed-Precision Training? Memory Requirements of the LLaMA 2 7B Model Under Mixed Precision
Mixed-precision training greatly reduces memory use via the BF16 format, but the critical optimizer state (the weight-update master copy, first moment, and second moment) is still stored in FP32 to ensure numerical stability and training accuracy.
2024-12-01 14:34:07 887
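A back-of-the-envelope sketch of the accounting behind the entry above. The byte counts are the usual ones for BF16 training with FP32 AdamW state, and the 7e9 parameter count is an approximation for LLaMA 2 7B; activations, buffers, and framework overhead are deliberately excluded:

```python
def mixed_precision_gb(n_params):
    """Rough per-component memory (GB) for BF16 training with FP32 AdamW state."""
    bytes_per_param = {
        "bf16 weights":     2,
        "bf16 gradients":   2,
        "fp32 master copy": 4,  # weight-update copy kept in FP32
        "fp32 adam m":      4,  # first moment
        "fp32 adam v":      4,  # second moment
    }
    return {k: n_params * b / 1e9 for k, b in bytes_per_param.items()}

usage = mixed_precision_gb(7e9)  # ~LLaMA 2 7B
print(sum(usage.values()))       # ~112 GB total, before activations
```

The FP32 optimizer state accounts for 12 of the 16 bytes per parameter, which is exactly why techniques like ZeRO shard it across GPUs first.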
[Original] How to Compute the Number of Training Steps: a Detailed Walkthrough Based on an Actual DeepSpeed Training Configuration
In deep learning model training, a "step" refers to a single update of the model's parameters after processing a batch of training samples.
2024-12-01 12:36:19 724
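The step count described above follows from the effective global batch size; a minimal sketch using the formula typically applied in DeepSpeed/Trainer setups (the config numbers below are hypothetical):

```python
import math

def training_steps(num_samples, per_device_batch, grad_accum, num_gpus, epochs):
    """Optimizer steps for a run: one step per effective global batch."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs

# Hypothetical config: 100k samples, batch 4 per GPU, 8 GPUs,
# gradient accumulation 4 -> effective batch 128.
print(training_steps(100_000, per_device_batch=4, grad_accum=4, num_gpus=8, epochs=2))
```

Note that with gradient accumulation, a "step" here is one parameter update, not one forward/backward pass.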
[Original] SFT (Supervised Fine-Tuning) Training for Text Generation with LLaMA
Supervised Fine-Tuning (SFT) means further training a pretrained large-scale language model on labeled data so that it performs better on a specific task.
2024-12-01 11:31:52 664
[Original] The Gaussian-Gamma Distribution: Bayesian Inference When Both the Mean and the Variance Are Unknown (Bilingual)
Gaussian-Gamma Distribution: Bayesian Inference with Unknown Mean and Variance
2024-12-01 11:31:00 596
[Original] The Bias-Variance Tradeoff: Understanding a Core Problem in Supervised Learning
A model's error does not come from a single source; it can be decomposed into three parts: irreducible error, bias, and variance.
2024-11-30 15:43:50 1162
[Original] What Are Parametric Models and Non-parametric Models?
A parametric model is defined by a finite number of parameters, while a non-parametric model describes the problem through the data itself or through a number of parameters that is not fixed in advance.
2024-11-30 15:25:40 756
[Original] The Multivariate Gaussian Distribution and the Covariance Matrix: Analysis and Applications
The values of the covariance matrix determine the scale and directional sensitivity of the Mahalanobis distance; by adjusting the covariance matrix, one can control the stretching and rotation of the distribution to fit the data more precisely.
2024-11-30 15:11:17 1128
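A minimal sketch of how the covariance matrix sets the scale of the Mahalanobis distance, restricted to a 2D *diagonal* covariance so the matrix inverse is trivial (elementwise 1/variance); all numbers are illustrative:

```python
import math

def mahalanobis_2d_diag(x, mean, var):
    """Mahalanobis distance in 2D for a diagonal covariance diag(var).

    sqrt((x - mean)^T Sigma^{-1} (x - mean)); for a diagonal Sigma the
    inverse is just 1/var per coordinate.
    """
    d = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.sqrt(d)

# With identity covariance, Mahalanobis reduces to Euclidean distance.
print(mahalanobis_2d_diag((3.0, 4.0), (0.0, 0.0), (1.0, 1.0)))  # 5.0

# Larger variance along x shrinks distance in that direction:
# sqrt(9/9 + 16/1) = sqrt(17) ~ 4.123
print(mahalanobis_2d_diag((3.0, 4.0), (0.0, 0.0), (9.0, 1.0)))
```

A full (non-diagonal) covariance additionally rotates these axes, which is the "directional sensitivity" the entry above refers to.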
[Original] Topic Models in NLP: LDA (Latent Dirichlet Allocation)
A topic model is a probabilistic generative model for discovering the latent topics in a collection of documents. LDA (Latent Dirichlet Allocation) is one of the best-known topic models. In LDA, the Dirichlet distribution plays the central role, modeling the document-topic and topic-word distributions.
2024-11-30 13:37:26 1243
李永乐线代强化笔记2020年.rar (Li Yongle's 2020 linear algebra intensive-review notes)
2020-10-27