Goodreads-books (a dataset about good books)

Original description:

Goodreads-books

A comprehensive list of all books listed on Goodreads

The primary reason for creating this dataset is the need for a good, clean dataset of books. Being a bookie myself (see what I did there?), I had searched for datasets on books on Kaggle itself, and I found that while most of the datasets had a good number of books listed, there were either a) major columns missing or b) grossly unclean data. I mean, you can't determine how good a book is just from a few text reviews, come on! What I needed were numbers, solid integers and floats that say how many people liked the book or hated it, how much they liked it, and stuff like that. Even though the one good dataset I found was well cleaned, it had a number of interlinked files, which increased the hassle. This prompted me to use the Goodreads API to build a well-cleaned dataset with only the promising features (minus the redundant ones), and the result is the dataset you're at now.

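You can download the dataset from the official site; I have also shared a copy on Baidu Netdisk. Follow my official account (公众号) and reply "2020122901" to get the download link.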

Author: 不务正业的猿
