Goodreads-books (a book dataset)

Author: 不务正业的猿

Categories: Downloads, Datasets. Tags: Goodreads-books, good books, book datasets

Article link: https://blog.csdn.net/ispeasant/article/details/111934099

Original description:

Goodreads-books

A comprehensive list of all books listed on Goodreads

The primary reason for creating this dataset is the need for a good, clean dataset of books. Being a bookie myself (see what I did there?), I had searched Kaggle itself for datasets on books, and I found that while most of them listed a good number of books, they were either a) missing major columns or b) grossly unclean. I mean, you can't determine how good a book is just from a few text reviews, come on! What I needed were numbers: solid integers and floats that say how many people liked or hated a book, how much they liked it, and stuff like that. Even the one good dataset I found, though well cleaned, was split across a number of interlinked files, which added to the hassle. This prompted me to use the Goodreads API to build a well-cleaned dataset with only the promising features (minus the redundant ones), and the result is the dataset you're looking at now.

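You can download the dataset from the official website. I have also shared a copy on Baidu Netdisk (百度网盘). To get the download link, follow my WeChat official account and reply "2020122901".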