EachMovie Data Set Description

From: http://www.research.digital.com/SRC/eachmovie/
EachMovie collaborative filtering data set

Contents

     Introduction
     Terms of usage
     Schema
     Obtaining the data set

Introduction

The DEC Systems Research Center ran the EachMovie recommendation service for 18 months to experiment with a collaborative filtering algorithm. During that time, some 72916 users entered a total of 2811983 numeric ratings for 1628 different movies (films and videos). We are making this preference data set available, with all user identification removed, so that other collaborative filtering researchers can use it to test their algorithms.

If you are interested in the design of our system, you can read the Each to Each Programmer's Reference Manual written by Paul McJones and John DeTreville.

Terms of usage

Copyright © Digital Equipment Corporation 1997.

The preference data set was compiled by Digital Equipment Corporation using our collaborative filtering technology. Digital is making the data set available for use under the terms that apply to this Digital web site (see Legal) including the following terms:

1. All information is provided "AS IS". Digital makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise. DIGITAL DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. In no event shall Digital be liable for damages, and in particular Digital shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits, loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute, in equity, at law or otherwise.

3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications resulting from use of the information shall credit Digital Equipment Corporation as the provider of the data. The user agrees to provide Digital with a copy of any such publication using any of the contact names provided at this web site. The user may make copies of the data set as needed for internal use only for the preceding purposes. All such copies shall duplicate Digital's copyright notice and this notice.

Schema

The data set is available as eachmoviedata.tar.gz (zipped tab-separated-value text files, 17632000 bytes compressed). There are three tables, one per file:

Person (person.txt) provides optional, unaudited demographic data supplied by each person:

     ID: Number -- primary key
     Age: Number
     Gender: Text -- one of "M", "F"
     Zip_Code: Text

Movie (movie.txt) provides descriptive information about each movie:

     ID: Number -- primary key
     Name: Text
     PR_URL: Text -- URL of studio PR site
     IMDb_URL: Text -- URL of Internet Movie Database entry
     Theater_Status: Text -- either "old" or "current"
     Theater_Release: Date/Time
     Video_Status: Text -- either "old" or "current"
     Video_Release: Date/Time
     Action, Animation, Art_Foreign, Classic, Comedy, Drama, Family, Horror, Romance, Thriller: Yes/No

IMDb URLs are provided by courtesy of Internet Movie Database.

The theater and video status and release dates were (approximately) correct in the San Francisco bay area as of September 15, 1997, when EachMovie was terminated.

Vote (vote.txt) is the actual rating data:

     Person_ID: Number
     Movie_ID: Number
     Score: Number -- 0 <= Score <= 1
     Weight: Number -- 0 < Weight <= 1
     Modified: Date/Time

Score is the rating provided by this person for this movie. The zero-to-five star rating used externally on EachMovie is mapped linearly to the interval [0,1]. Here's a histogram of the Score values:

     Score Count
     0 347191
     0.2 150495
     0.4 339718
     0.6 701236
     0.8 761676
     1.0 511667
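
As a quick consistency check, the histogram counts sum to the 2811983 ratings quoted in the Introduction. The sketch below verifies that arithmetic and converts Score values back to stars. It assumes the linear mapping means stars = 5 x Score; the names `SCORE_COUNTS` and `score_to_stars` are hypothetical helpers and not part of the data set.

```python
# Histogram of Score values, copied from the table above.
SCORE_COUNTS = {
    0.0: 347191,
    0.2: 150495,
    0.4: 339718,
    0.6: 701236,
    0.8: 761676,
    1.0: 511667,
}


def score_to_stars(score: float) -> int:
    """Map a Score in [0, 1] back to the zero-to-five star scale.

    Assumes the linear mapping stars = 5 * Score; round() absorbs
    floating-point noise in values such as 0.6.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Score out of range: {score}")
    return round(score * 5)


# Total number of votes across all Score buckets.
total_votes = sum(SCORE_COUNTS.values())

# The same histogram, keyed by star rating instead of Score.
stars_histogram = {score_to_stars(s): n for s, n in SCORE_COUNTS.items()}
```

Note that the Score 0 bucket mixes genuine zero-star ratings with "sounds awful" declarations, so the zero-star count in `stars_histogram` is an upper bound on true zero-star ratings.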
     
Weight is only relevant in the case of a Score of zero, in which case it distinguishes whether the person rated a movie as zero stars (weight = 1) or "sounds awful" (weight < 1). (Most "sounds awful" weights are 0.2, but for historical reasons about 10% are 0.5.) The idea behind "sounds awful" was to let a user indicate he never planned to see a movie (hence we would omit it from future lists of predictions). Our collaborative filtering algorithm treated such a declaration as less authoritative than a regular rating of zero stars.
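
The Score/Weight rule can be sketched as a small classifier. This is only an illustration of the rule as described in this section, not the code EachMovie ran; the function name and the three labels are hypothetical.

```python
def classify_vote(score: float, weight: float) -> str:
    """Label a Vote row using the Score/Weight rule described above.

    The labels "rating", "zero_stars" and "sounds_awful" are made up
    for this sketch; the data set itself stores only the numbers.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Score out of range: {score}")
    if not 0.0 < weight <= 1.0:
        raise ValueError(f"Weight out of range: {weight}")
    if score == 0.0:
        # Weight only matters for a Score of zero.
        return "zero_stars" if weight == 1.0 else "sounds_awful"
    return "rating"
```

A researcher who wants only movies the person actually rated might drop rows labelled "sounds_awful" before training; whether to do so is a modelling choice that this description leaves open.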

Given our site design, there is no way to know whether the person had seen the movie in a theater or on video.
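
A minimal sketch of reading Vote rows from the tab-separated file follows. It rests on assumptions this description does not confirm: that vote.txt has no header row, and that its five columns appear in the order listed in the schema. The Modified column is kept as raw text because the date format is not documented here. The names `Vote` and `read_votes` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Vote(NamedTuple):
    person_id: int
    movie_id: int
    score: float
    weight: float
    modified: str  # raw text; the Date/Time format is not documented here


def read_votes(lines: Iterable[str]) -> Iterator[Vote]:
    """Parse tab-separated Vote rows.

    Assumes no header row and the column order given in the schema.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        person_id, movie_id, score, weight, modified = row[:5]
        yield Vote(int(person_id), int(movie_id),
                   float(score), float(weight), modified)


# Made-up sample rows for illustration only; not taken from the real data.
SAMPLE = (
    "1\t10\t0.8\t1\t1996-01-01 00:00:00\n"
    "2\t10\t0\t0.2\t1996-01-02 00:00:00\n"
)
votes = list(read_votes(io.StringIO(SAMPLE)))
```

With the real file, the same function could be fed an open file handle, for example `open("vote.txt", newline="")`, so that the rows are streamed rather than loaded at once.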

Obtaining the data set

If you have read the terms above, and agree to them, contact

Steve Glassman
<steveg@pa.dec.com>
1 650 853-2166
Compaq Systems Research Center
130 Lytton Avenue
Palo Alto, CA 94301
by telephone or email. He will give you a password for downloading the data. You may also send copies of your publications involving this data (see term 3 above) to Steve.

Legal

Developed by Digital Equipment Corporation.
Copyright © Digital Equipment Corporation, 1997.
The DIGITAL logo is a trademark of Digital Equipment Corporation.

All other trademarks are the property of their respective owners.

kumpf last updated Jul 30, 1999


Reposted from: http://www.douban.com/note/502794377/

