Machine Learning Dataset Splitting
Dataset Splitting
Splitting data into Training, Cross Validation, and Test sets is a common best practice. It allows you to tune the parameters of an algorithm without making judgements that conform specifically to the training data.
Motivation
Dataset splitting is necessary to reduce bias toward the training data in ML algorithms. Tuning the parameters of an ML algorithm to best fit the training data commonly produces an overfit model that performs poorly on actual test data. For this reason, we split the dataset into multiple discrete subsets and tune different parameters on each.
The Training Set
The Training set is used to fit the actual model your algorithm will use when exposed to new data. This subset typically contains 60%-80% of your entire available data (depending on whether or not you use a Cross Validation set).
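As a minimal sketch of the split described above, the following uses scikit-learn's `train_test_split` to carve a toy dataset into 60% training, 20% cross validation, and 20% test sets; the exact ratio and `random_state` are illustrative choices, not requirements.

```python
# Sketch: a 60/20/20 train / cross-validation / test split with scikit-learn.
# The 60/20/20 ratio and random_state below are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # toy feature matrix: 50 examples, 2 features
y = np.arange(50)                  # toy labels

# First carve off 20% of the data as the held-out test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Then split the remaining 80% again, taking 25% of it for cross validation.
# 25% of 80% is 20% of the original data, leaving 60% for training.
X_train, X_cv, y_train, y_cv = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_cv), len(X_test))  # → 30 10 10
```

Because `train_test_split` only produces two partitions per call, the three-way split is done in two passes; shuffling (the default) helps ensure each subset is representative of the whole dataset.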