[AI Data] Three Ways to Split a Dataset

guaguastd

Tags: artificial intelligence, computer vision

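Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.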

Article link: https://blog.csdn.net/guaguastd/article/details/107605545

1. The np.random.permutation method
import numpy as np

def split_train_test(data, test_ratio):
    # Shuffle the row positions, then take the first test_ratio share as the test set.
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

# housing is assumed to be a pandas DataFrame that has already been loaded.
train_set, test_set = split_train_test(housing, 0.2)
print('len_train_set:', len(train_set))
print('len_test_set:', len(test_set))

# Output
len_train_set: 16512
len_test_set: 4128

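Drawbacks:
a. The split is different on every run, because the permutation is random.
b. Calling np.random.seed(42) before the permutation makes the split reproducible, but once the dataset itself changes (for example, rows are added), the split changes as well.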
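
The two drawbacks can be checked with a small experiment. The sketch below is only an illustration: `split_indices` is a hypothetical helper that repeats the logic of `split_train_test` but returns index arrays, and the row counts (1000 and 1100) are made up for the demonstration.

```python
import numpy as np


def split_indices(n_rows, test_ratio, seed):
    # Hypothetical helper: same shuffling logic as split_train_test,
    # but it seeds the generator first and returns index arrays.
    np.random.seed(seed)
    shuffled = np.random.permutation(n_rows)
    test_size = int(n_rows * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


# Same seed and same number of rows: the split is reproducible.
_, test_a = split_indices(1000, 0.2, seed=42)
_, test_b = split_indices(1000, 0.2, seed=42)
print(np.array_equal(test_a, test_b))  # True

# After 100 rows are appended, the permutation is drawn over 1100 indices,
# so rows that used to be test rows can end up in the training set.
_, test_c = split_indices(1100, 0.2, seed=42)
stayed = np.intersect1d(test_a, test_c).size
print(f"{stayed} of {test_a.size} original test rows are still test rows")
```

If the second count is well below the size of the original test set, a model retrained on the enlarged dataset would see rows that were previously held out for testing.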

2. The crc32 method
from zlib import crc32

def test_set_check(identifier, test_ratio):
    return crc32(np.int64(identifier)) & 0xffffffff < test_ratio * 2**32
def split_train_test_by_id(data, test_ratio,
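
The listing above breaks off partway through split_train_test_by_id. The following is a minimal sketch of the idea behind an ID-based split, and it is not the author's code: it works on a plain list of dicts instead of a DataFrame, and the names `is_in_test_set`, `split_by_id` and `id_key` are hypothetical. It also uses `int.to_bytes` as a stand-in for `np.int64` so that it needs only the standard library.

```python
from zlib import crc32


def is_in_test_set(identifier, test_ratio):
    # Hash the ID to a 32-bit value; the row counts as a test row when the
    # hash falls in the lowest test_ratio fraction of the 32-bit range.
    # Assumption: int.to_bytes replaces np.int64 to avoid a NumPy dependency.
    id_bytes = int(identifier).to_bytes(8, "little", signed=True)
    return (crc32(id_bytes) & 0xffffffff) < test_ratio * 2**32


def split_by_id(rows, test_ratio, id_key):
    # Hypothetical helper: rows is a list of dicts, id_key names the ID field.
    train, test = [], []
    for row in rows:
        target = test if is_in_test_set(row[id_key], test_ratio) else train
        target.append(row)
    return train, test


rows = [{"id": i, "value": i * 10} for i in range(1000)]
train_set, test_set = split_by_id(rows, 0.2, "id")

# Appending new rows never moves an existing row to the other side,
# because each row's assignment depends only on its own ID.
bigger = rows + [{"id": i, "value": i * 10} for i in range(1000, 1500)]
_, bigger_test = split_by_id(bigger, 0.2, "id")
old_test_ids = {r["id"] for r in test_set}
still_test_ids = {r["id"] for r in bigger_test if r["id"] < 1000}
print(old_test_ids == still_test_ids)  # True
```

Because the decision is a pure function of the ID, the split stays stable across runs and across dataset updates, provided the IDs themselves never change. The test share is only approximately test_ratio, since it depends on how the hashes happen to fall.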
