AI-Related Dataset Roundup (Part 1)

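NLP Corpus Datasets

1. Xinwen Lianbo (CCTV evening news) corpus, 2016-2019 (11.3MB)

2. Chinese couplets dataset (28.2MB)

3. 1998 People's Daily annotated corpus (PFR) (10.2MB)

5. People's Daily articles dataset (1979-2010) (811.9MB)

6. People's Daily articles dataset (1949-1978) (559.4MB)

7. Chinese news dataset (70.3MB)

8. Yale text-to-SQL challenge dataset (95.1MB)

9. National University of Singapore SMS corpus (23.4MB)

10. Chinese classical texts corpus

11. Informal Chinese dataset (214.5MB)

12. Chinese Wikipedia corpus (518.7MB)

13. Dataset of the 9,933 most frequently used Chinese characters (1.0MB)

14. Chat corpus dataset (210.7MB)

15. Short-text classification dataset (13.1MB)

16. Idiom (chengyu) reading comprehension dataset (195.8MB)

17. Automated essay scoring dataset (78.8MB)

18. Translation corpus (595.9MB)

19. Chinese scientific literature abstracts dataset (92.9MB)

20. English Wikipedia corpus (89.0MB)

21. Lord of the Rings data (223.9KB)

22. Span-extraction dataset for Chinese machine reading comprehension (CMRC 2018)

23. 36Kr news dataset (42.5MB)

24. 10,000 Amazon musical instrument reviews (13MB)

25. Dataset of 10,000 internet column articles (75.7MB)

26. Dataset of 20,000 Chinese financial news articles (66.6MB)

27. Chinese book classification dataset (49.8MB)

28. English song lyrics dataset (69.1MB)

29. Statements and briefings issued by the Trump administration (63.6MB)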

Web-Scraped Datasets

1. 6,000 posts from the Jay Chou Weibo Super Topic (1.1MB)
https://www.heywhale.com/mw/dataset/5d3551bdcf76a60036f605aa

2. 190,000 bullet comments (danmaku) from "Chinese Restaurant 3" (12.8MB)
https://www.heywhale.com/mw/dataset/5d7b69798499bc002c0d3ec5

3. bilibili popular anime review data (2.3MB)
https://www.heywhale.com/mw/dataset/5d3a76dfcf76a600360e19c9

4. Electric fan reviews from a Taobao store (273.9KB)
https://www.heywhale.com/mw/dataset/5d442caec143cf002bdb687c

5. 7,000 Mafengwo travelogues on popular attractions in China (140+MB)
https://www.heywhale.com/mw/dataset/5e7c55db98d4a8002d2db5a2

6. IMDB movie review data (32.0MB)
https://www.heywhale.com/mw/dataset/5d143d41708b90002c5f7021

7. Trending topics from Weiming BBS (3.6MB)
https://www.heywhale.com/mw/dataset/5dad84b375df5c002b20d79f

8. All articles from Mimeng's WeChat official account (3.9MB)
https://www.heywhale.com/mw/dataset/5c8723441e7104002b3831d3

10. McDonald's negative dining reviews dataset (891.1KB)
https://www.heywhale.com/mw/dataset/5dab2c0b1035d8002c36372b

Question-Answering Datasets

1. Financial industry Q&A dataset (245.5MB)
https://www.heywhale.com/mw/dataset/5e9588f8e7ec38002d0331b1

2. Community Q&A dataset (1.7GB)
https://www.heywhale.com/mw/dataset/5de601f3ca27f8002c4cac47

3. Chinese medical Q&A dataset (85MB)
https://www.heywhale.com/mw/dataset/5d313070cf76a60036e4b023

4. Dataset of 120,000 question-answer pairs from CNN news articles (17.3MB)
https://www.heywhale.com/mw/dataset/5eef1408caa99b002d6e37cc

Sentiment Analysis Datasets

1. Stanford Sentiment Treebank: a standard dataset with sentiment annotations (6.1MB)
https://www.heywhale.com/mw/dataset/5daa748c1035d8002c35cdee

2. Sentiment analysis dataset of tweets about US airlines (2.6MB)
https://www.heywhale.com/mw/dataset/5dab23781035d8002c3634c9

3. Chinese dialogue emotion corpus (1.1MB)
https://www.heywhale.com/mw/dataset/5d00c390e727f8002c4599ad

4. Multi-domain sentiment dataset (51.2MB)
https://www.heywhale.com/mw/dataset/5de5ce0aca27f8002c4c9ee8

5. sentiment140 sentiment analysis dataset (72.6KB)
https://www.heywhale.com/mw/dataset/5ca46f1a8408c1002b498cca

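Entity Recognition Datasets

1. Annotated corpus for named entity recognition (26.4MB)
https://www.heywhale.com/mw/dataset/5de9be34953ca8002c95c35f

2. Chinese NER data used with LatticeLSTM (191.5KB)
https://www.heywhale.com/mw/dataset/5d564ea0c143cf002b235181

3. Medical named entity recognition dataset (5.1MB)
https://www.heywhale.com/mw/dataset/5dedef59953ca8002c96667a

4. Chinese entity relation extraction dataset (8.1MB)
https://www.heywhale.com/mw/dataset/5dde487dca27f8002c4a8352

5. Competition dataset for detecting negative financial information and identifying the entities involved (17MB)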
https://www.heywhale.com/mw/dataset/5e09a9eb2823a10036b126c0

Computer Vision (CV) Datasets

1. Pronto bike-share dataset (70.8MB)
https://www.heywhale.com/mw/dataset/58a515c48460306efcce2e96

1. Fashion-MNIST image dataset (200.4MB)
https://www.heywhale.com/mw/dataset/5a0cfcf860680b295c28a753

2. CIFAR100 dataset (161.3MB)
https://www.heywhale.com/mw/dataset/5e96da12e7ec38002d03bf51

3. Vehicle dataset (vehicle recognition and classification) (62.5MB)
https://www.heywhale.com/mw/dataset/5bc316173631bc00109d2abf

4. Garbage classification dataset
https://www.heywhale.com/mw/dataset/5d133d11708b90002c570588

5. Another garbage classification dataset (40.9MB)
https://www.heywhale.com/mw/dataset/5d1578e4708b90002c6a3238

6. CIFAR10 dataset (148MB)
https://www.heywhale.com/mw/dataset/5ab3403bfdf6b86c23f259e3

7. GTSRB: German traffic sign recognition image data (253.3MB)
https://www.heywhale.com/mw/dataset/5d8db5ca037db3002d3a5ba0

8. Hand gesture recognition database (1.1GB)
https://www.heywhale.com/mw/dataset/5d8da999037db3002d3a523

9. Facial expressions of emotion (170MB+)
https://www.heywhale.com/mw/dataset/5d773bd68499bc002c0c4a6f

10. Gun object detection (2.4MB)
https://www.heywhale.com/mw/dataset/5d576984c143cf002b238528

11. Face image data (294.1MB)
https://www.heywhale.com/mw/dataset/5da6d4cac83fb4004206edf0

12. RMFD masked face dataset (610.3MB)
https://www.heywhale.com/mw/dataset/5e81a24b246a590036b884d5

13. Chinese traffic police gesture dataset (1.8GB)
https://www.heywhale.com/mw/dataset/5de75df5ca27f8002c4cf1bb

14. Scene classification dataset (105.9MB)
https://www.heywhale.com/mw/dataset/5ddb4adef41512002cebc776

15. Image data for 87 types of gemstones (50.9MB)
https://www.heywhale.com/mw/dataset/5ddccb4aca27f8002c4a26db

16. CAPTCHA dataset (13.5MB)
https://www.heywhale.com/mw/dataset/5e143ef32823a10036b34846/file

17. Coin image dataset (326.7MB)
https://www.heywhale.com/mw/dataset/5e7c7b3e98d4a8002d2dd15c

18. LabelMe image semantic segmentation dataset (102.6MB)
https://www.heywhale.com/mw/dataset/5e7b770698d4a8002d2d7925

19. License plate recognition dataset (62.8MB)
https://www.heywhale.com/mw/dataset/5e96cf8fe7ec38002d03b8ed

20. Biwi head pose database (449.7MB)
https://www.heywhale.com/mw/dataset/5ea16e79105d91002d4f4a25

Animals:
21. Butterfly-200 fine-grained image classification dataset (828MB)
https://www.heywhale.com/mw/dataset/5e85d97095b029002ca7e904

22. Pet image dataset (783.5MB)
https://www.heywhale.com/mw/dataset/5d76060b8499bc002c0bdd66

23. Dog breed image dataset (919.5MB)
https://www.heywhale.com/mw/dataset/5def46de2823a10036aab2f5

24. Chimpanzee image dataset (604.4MB)
https://www.heywhale.com/mw/dataset/5e7b62da98d4a8002d2d6deb

Plants:
25. Rice leaf disease image set (36.7MB)
https://www.heywhale.com/mw/dataset/5d522069c143cf002b21edbb

26. Plant seedling image dataset
https://www.heywhale.com/mw/dataset/5d4cd153c143cf002b0a8d4f

27. Flower recognition dataset (224.9MB)
https://www.heywhale.com/mw/dataset/5cc127b98c90d7002c8375d7

28. Flower image classification
https://www.heywhale.com/mw/dataset/5d3320afcf76a60036ec5fd1

29. Edible wild plants dataset
https://www.heywhale.com/mw/dataset/5d4a5e88c143cf002bfbf5d0

30. Leaf counting image dataset (882.3MB)
https://www.heywhale.com/mw/dataset/5e841900246a590036b954bd

Weather:
31. Satellite image dataset of hurricane damage (63MB)
https://www.heywhale.com/mw/dataset/5da6e8a6c83fb40042073e4f

32. Understanding clouds from satellite images dataset (42MB)
https://www.heywhale.com/mw/dataset/5dafcd9275df5c002b2189c6

Character recognition:
33. TibetanMNIST Tibetan handwritten digit dataset (53.2MB)
https://www.heywhale.com/mw/dataset/5bfe734a954d6e0010683839

34. MNIST handwriting recognition dataset (9.5MB)
https://www.heywhale.com/mw/dataset/58a7c84c803d1a0d2e26441a

35. Chars74K character recognition dataset (188.3MB)
https://www.heywhale.com/mw/dataset/5d89c020e3ffb2002c44fb0a

36. Credit card face images with annotation data (42.9MB)
https://www.heywhale.com/mw/dataset/5954cf1372ead054a5e25870

37. Handwritten mathematical expression recognition (29MB)
https://www.heywhale.com/mw/dataset/5daff66f75df5c002b219fb0

38. Image-word matching dataset (31.1MB)
https://www.heywhale.com/mw/dataset/5dab27631035d8002c3635eb

39. Dense irregular text line dataset (353MB)
https://www.heywhale.com/mw/dataset/5da95febc83fb400420f3631

40. Visual text recognition dataset
https://www.heywhale.com/mw/dataset/5df058762823a10036aae665

41. HASY handwritten symbol image dataset (127.2MB)
https://www.heywhale.com/mw/dataset/5dee0b25953ca8002c9671ea

42. Mahjong tile image dataset (7.5MB)
https://www.heywhale.com/mw/dataset/5de76474ca27f8002c4cf44c

Medical:
43. Canine coccidiosis parasite image set (18.1MB)
https://www.heywhale.com/mw/dataset/5d4cd9c7c143cf002b0abead

44. Head CT image data (24.4MB)
https://www.heywhale.com/mw/dataset/5d7213eb8499bc002c0af1e8

45. Lung CT image data (529.0MB)
https://www.heywhale.com/mw/dataset/5d71de448499bc002c0ae1fc

46. Cardiovascular disease prediction (2.7MB)
https://www.heywhale.com/mw/dataset/5db00b9175df5c002b21af83/file

47. Shenzhen Hospital chest X-ray mask image dataset (19.8MB)
https://www.heywhale.com/mw/dataset/5daff18575df5c002b219c89

49. Tuberculosis image dataset (456.8MB)
https://www.heywhale.com/mw/dataset/5efc4de063975d002c9792de

Pedestrian recognition:
50. ETHZ pedestrian detection dataset (146MB)
https://www.heywhale.com/mw/dataset/5db2680f75df5c002b23b755

51. Market-1501 person re-identification dataset (145.7MB)
https://www.heywhale.com/mw/dataset/5db148b375df5c002b2295dd

52. RAiD person re-identification dataset (140.1MB)
https://www.heywhale.com/mw/dataset/5db11bf775df5c002b226914

53. prid_2011 person re-identification dataset (1015.3MB)
https://www.heywhale.com/mw/dataset/5db10d1a75df5c002b2259dd

54. Pedestrian dataset from a car rear-view camera perspective (799.7MB)
https://www.heywhale.com/mw/dataset/5dafc75175df5c002b2186a0

Speech Datasets

1. Mozilla speech dataset, Chinese (358.2MB)
https://www.heywhale.com/mw/dataset/5d6f91678499bc002c0a722b

2. 2,000 recordings of spoken English digits (8.9MB)
https://www.heywhale.com/mw/dataset/5ddde933ca27f8002c4a6013
