![2b7d5cf14e7cc4f3d0c48ff6a0a8415b.png](https://img-blog.csdnimg.cn/img_convert/2b7d5cf14e7cc4f3d0c48ff6a0a8415b.png)
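It's a real pity to watch some people take detours in their research and waste their best years. They first settle on a research direction, then read a large pile of literature and learn the methodology and tools. Everything is in place except the one crucial ingredient: the data. Only right before they start do they go looking for it, and for one reason or another they can't get it. Then they either give up or change direction...
Today I'm sharing this year's newest open-source datasets from around the world. Get the data first, then do everything else! Without further ado, here's the good stuff: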
Yelp (roughly the US counterpart of Dianping)
Yelp Dataset: www.yelp.com
![de9ffe00c7a534b71062103ce943d9fb.png](https://img-blog.csdnimg.cn/img_convert/de9ffe00c7a534b71062103ce943d9fb.png)
The Yelp dataset is a subset of our businesses, reviews, and user data for use in personal, educational, and academic purposes. Available as JSON files, use it to teach students about databases, to learn NLP, or for sample production data while you learn how to make mobile apps.
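Since the data ships as JSON files, a first step is usually to stream the records and aggregate them. The sketch below assumes the review file is in JSON Lines format (one JSON object per line) and that each review carries `business_id` and `stars` fields; the function names are hypothetical, so check the field names against the files you actually download.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def iter_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (JSON Lines format, assumed)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def average_stars_by_business(lines: Iterable[str]) -> Dict[str, float]:
    """Average the assumed `stars` field of each review, grouped by `business_id`."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for review in iter_json_lines(lines):
        business_id = review["business_id"]
        totals[business_id] += review["stars"]
        counts[business_id] += 1
    return {bid: totals[bid] / counts[bid] for bid in totals}
```

Because both functions accept any iterable of lines, you can pass an open file handle directly (for example `with open(path, encoding="utf-8") as fp: average_stars_by_business(fp)`, where `path` is whichever review file you downloaded), and the file is read one line at a time instead of being loaded into memory all at once.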
SQuAD 2.0 (roughly a Stanford version of Zhihu)
The Stanford Question Answering Dataset: rajpurkar.github.io
![4c3a6095568cfba9686263dc98d62e7b.png](https://img-blog.csdnimg.cn/img_convert/4c3a6095568cfba9686263dc98d62e7b.png)
The dataset also has a leaderboard. As of 2020-04-05, the top entry used the following combination: ALBERT + DAAF + Verifier (ensemble), scoring EM 90.386 and F1 92.777.
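EM (exact match) and F1 are the two leaderboard metrics, each averaged over all questions and reported as a percentage. The sketch below approximates how they are computed. It is a simplified re-implementation modeled on the answer normalization used by the official SQuAD evaluation script (lower-casing, stripping punctuation and the articles a/an/the, collapsing whitespace), not the official script itself. The function names are hypothetical, and treating a missing prediction as an empty answer is an assumption of this sketch.

```python
import collections
import re
import string
from typing import Callable, Dict, List, Tuple

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lower-case, drop punctuation, drop the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # Unanswerable questions (SQuAD 2.0): credit only if both answers are empty.
        return float(pred_toks == gold_toks)
    common = collections.Counter(pred_toks) & collections.Counter(gold_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric: Callable[[str, str], float],
                    prediction: str, golds: List[str]) -> float:
    # A question may have several reference answers; keep the best score.
    # An empty reference list marks an unanswerable question (gold = "").
    candidates = golds if golds else [""]
    return max(metric(prediction, g) for g in candidates)


def evaluate(predictions: Dict[str, str],
             references: Dict[str, List[str]]) -> Tuple[float, float]:
    """Return (EM, F1) as percentages averaged over all reference questions."""
    em_total = f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")  # assumption: missing prediction = empty answer
        em_total += best_over_golds(exact_match, pred, golds)
        f1_total += best_over_golds(f1_score, pred, golds)
    n = len(references)
    return 100.0 * em_total / n, 100.0 * f1_total / n
```

F1 gives partial credit for token overlap: the prediction "the cat sat" against the reference "cat sat on mat" normalizes to 2 shared tokens, giving precision 1.0, recall 0.5, and F1 = 2/3. This is why leaderboard F1 scores sit slightly above EM.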
IMDb (roughly the US counterpart of Maoyan Movies)
IMDb: www.imdb.com
![3793c23191692219380e4e70ff8593ac.png](https://img-blog.csdnimg.cn/img_convert/3793c23191692219380e4e70ff8593ac.png)
MovieLens (roughly the US counterpart of Douban)
MovieLens: grouplens.org
![e239d88296b986cc43d73fb32e6e6cb7.png](https://img-blog.csdnimg.cn/img_convert/e239d88296b986cc43d73fb32e6e6cb7.png)
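MovieLens is a movie-rating dataset, so a natural first experiment is a popularity baseline for recommendation. The sketch below assumes a ratings CSV whose header is `userId,movieId,rating,timestamp`; verify this against the release you download, since column layouts can differ between versions. The function names and the `min_count` threshold are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def movie_rating_stats(lines: Iterable[str]) -> Dict[int, Tuple[float, int]]:
    """Return {movieId: (mean_rating, num_ratings)} from ratings CSV lines."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for row in csv.DictReader(lines):  # header row supplies the column names
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {m: (totals[m] / counts[m], counts[m]) for m in totals}


def top_movies(stats: Dict[int, Tuple[float, int]],
               min_count: int = 1, n: int = 10) -> List[int]:
    """Highest mean rating first, ignoring movies with fewer than min_count ratings."""
    eligible = [(mean, m) for m, (mean, cnt) in stats.items() if cnt >= min_count]
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by movieId
    return [m for _, m in eligible[:n]]
```

The `min_count` filter matters in practice: without it, a movie with a single 5-star rating outranks widely rated favorites.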
![d1ba7c79d76b2b7ac8fdbed95a727c2d.png](https://img-blog.csdnimg.cn/img_convert/d1ba7c79d76b2b7ac8fdbed95a727c2d.png)
That's all.