2020 MCM Problem C: A Wealth of Data

In the online marketplace it created, Amazon provides customers with an opportunity to rate and review purchases. Individual ratings – called “star ratings” – allow purchasers to express their level of satisfaction with a product using a scale of 1 (low rated, low satisfaction) to 5 (highly rated, high satisfaction). Additionally, customers can submit text-based messages – called “reviews” – that express further opinions and information about the product. Other customers can submit ratings on these reviews as being helpful or not – called a “helpfulness rating” – towards assisting their own product purchasing decision. Companies use these data to gain insights into the markets in which they participate, the timing of that participation, and the potential success of product design feature choices.

Sunshine Company is planning to introduce and sell three new products in the online marketplace: a microwave oven, a baby pacifier, and a hair dryer. They have hired your team as consultants to identify key patterns, relationships, measures, and parameters in past customer-supplied ratings and reviews associated with other competing products to 1) inform their online sales strategy and 2) identify potentially important design features that would enhance product desirability. Sunshine Company has used data to inform sales strategies in the past, but they have not previously used this particular combination and type of data. Of particular interest to Sunshine Company are time-based patterns in these data, and whether they interact in ways that will help the company craft successful products.

To assist you, Sunshine’s data center has provided you with three data files for this project: hair_dryer.tsv, microwave.tsv, and pacifier.tsv. These data represent customer-supplied ratings and reviews for microwave ovens, baby pacifiers, and hair dryers sold in the Amazon marketplace over the time period(s) indicated in the data. A glossary of data label definitions is provided as well. THE DATA FILES PROVIDED CONTAIN THE ONLY DATA YOU SHOULD USE FOR THIS PROBLEM.


1.      Analyze the three product data sets provided to identify, describe, and support with mathematical evidence, meaningful quantitative and/or qualitative patterns, relationships, measures, and parameters within and between star ratings, reviews, and helpfulness ratings that will help Sunshine Company succeed in their three new online marketplace product offerings.

2.      Use your analysis to address the following specific questions and requests from the Sunshine Company Marketing Director:

·        Identify data measures based on ratings and reviews that are most informative for Sunshine Company to track, once their three products are placed on sale in the online marketplace.

·        Identify and discuss time-based measures and patterns within each data set that might suggest that a product’s reputation is increasing or decreasing in the online marketplace.

·        Determine combinations of text-based measure(s) and ratings-based measures that best indicate a potentially successful or failing product.

·        Do specific star ratings incite more reviews? For example, are customers more likely to write some type of review after seeing a series of low star ratings?

·        Are specific quality descriptors of text-based reviews such as ‘enthusiastic’, ‘disappointed’, and others, strongly associated with rating levels?

3.      Write a one- to two-page letter to the Marketing Director of Sunshine Company summarizing your team’s analysis and results. Include specific justification(s) for the result that your team most confidently recommends to the Marketing Director.
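For the time-based measures requested in item 2, one simple candidate is the monthly mean star rating together with the slope of a least-squares line through it. The Python sketch below is only an illustration under stated assumptions: the observations are invented, the function names `monthly_mean_rating` and `trend_slope` are hypothetical, and the slope treats the months as consecutive and evenly spaced.

```python
from collections import defaultdict
from datetime import date

# Invented (review date, star rating) pairs for one hypothetical product.
observations = [
    (date(2015, 1, 5), 5), (date(2015, 1, 20), 4),
    (date(2015, 2, 3), 4), (date(2015, 2, 17), 3),
    (date(2015, 3, 9), 3), (date(2015, 3, 28), 2),
]


def monthly_mean_rating(obs):
    """Return ((year, month), mean rating) pairs sorted by month."""
    buckets = defaultdict(list)
    for day, stars in obs:
        buckets[(day.year, day.month)].append(stars)
    return [(key, sum(v) / len(v)) for key, v in sorted(buckets.items())]


def trend_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ...

    Assumption: the values are consecutive periods with no missing months.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


series = monthly_mean_rating(observations)
slope = trend_slope([mean for _, mean in series])
print(series, slope)
```

A negative slope would be read as a declining average rating over the window. Whether that reflects a change in reputation still has to be argued from the data, for example by checking how many reviews each month contains.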

Your submission should consist of:

·        One-page Summary Sheet

·        Table of Contents

·        One- to Two-page Letter

·        Your solution of no more than 20 pages, for a maximum of 24 pages with your summary sheet, table of contents, and two-page letter.

Note: Reference List and any appendices do not count toward the page limit and should appear after your completed solution. You should not make use of unauthorized images and materials whose use is restricted by copyright laws. Ensure you cite the sources for your ideas and the materials used in your report.


Glossary

·        Helpfulness Rating: an indication of how valuable a particular product review is when making a decision whether or not to purchase that product.

·        Pacifier: a rubber or plastic soothing device, often nipple shaped, given to a baby to suck or bite on.

·        Review: a written evaluation of a product.

·        Star Rating: a score given in a system that allows people to rate a product with a number of stars.

The Problem Datasets


The three data sets provided contain product user ratings and reviews extracted from the Amazon Customer Reviews Dataset through Amazon Simple Storage Service (Amazon S3).


Data Set Definitions: Each row represents data partitioned into the following columns.

·        marketplace (string): 2 letter country code of the marketplace where the review was written.

·        customer_id (string): Random identifier that can be used to aggregate reviews written by a single author.

·        review_id (string): The unique ID of the review.

·        product_id (string): The unique Product ID the review pertains to.

·        product_parent (string): Random identifier that can be used to aggregate reviews for the same product.

·        product_title (string): Title of the product.

·        product_category (string): The major consumer category for the product.

·        star_rating (int): The 1-5 star rating of the review.

·        helpful_votes (int): Number of helpful votes.

·        total_votes (int): Number of total votes the review received.

·        vine (string): Customers are invited to become Amazon Vine Voices based on the trust that they have earned in the Amazon community for writing accurate and insightful reviews. Amazon provides Amazon Vine members with free copies of products that have been submitted to the program by vendors. Amazon doesn't influence the opinions of Amazon Vine members, nor do they modify or edit reviews.

·        verified_purchase (string): A “Y” indicates Amazon verified that the person writing the review purchased the product at Amazon and didn't receive the product at a deep discount.

·        review_headline (string): The title of the review.

·        review_body (string): The review text.

·        review_date (bigint): The date the review was written.
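The column definitions above suggest how a row might be read and how a helpfulness ratio could be derived from helpful_votes and total_votes. The following Python sketch is illustrative only: it parses a small invented in-memory sample rather than the real files, shows only a subset of the columns, and assumes that quote characters in the text are literal. The helper names `load_reviews` and `helpfulness_ratio` are hypothetical, and review_date is left out because its stored format is not stated here.

```python
import csv
import io

# Tiny in-memory stand-in for one of the .tsv files. The rows are invented
# and only a subset of the documented columns is included.
SAMPLE_TSV = (
    "review_id\tproduct_parent\tstar_rating\thelpful_votes\ttotal_votes"
    "\tvine\tverified_purchase\n"
    "R1\t111\t5\t8\t10\tN\tY\n"
    "R2\t111\t1\t0\t0\tN\tN\n"
    "R3\t222\t4\t3\t4\tY\tY\n"
)

# Columns documented as (int) in the data set definitions.
INT_COLUMNS = ("star_rating", "helpful_votes", "total_votes")


def load_reviews(handle):
    """Read tab-separated review rows and convert the integer columns."""
    # QUOTE_NONE is an assumption: quotes inside review text are kept as-is.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        rows.append(row)
    return rows


def helpfulness_ratio(row):
    """Share of votes marked helpful; None when the review has no votes."""
    if row["total_votes"] == 0:
        return None
    return row["helpful_votes"] / row["total_votes"]


reviews = load_reviews(io.StringIO(SAMPLE_TSV))
ratios = [helpfulness_ratio(r) for r in reviews]
print(ratios)
```

Returning None for reviews without votes keeps "no one voted" distinct from "everyone voted unhelpful", which a ratio of 0 would otherwise conflate. To read an actual file, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`, where the encoding is again an assumption.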













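  1. The measures in the ratings and reviews that are most informative for sales: these could be mutual information, the goodness of fit of a single variable, or the correlation coefficient;

  2. Build samples along the time dimension to analyze how ratings and reviews affect product reputation. This is really still association analysis; the problem does not ask for any forecasting over time;

  3. Use two factors, the text and the rating values, to estimate success or failure. Anyone who knows neural networks, now is your time to shine!

  4. The relationship between ratings and the number of reviews: not much different from question 1;

  5. The relationship between keywords in the reviews and ratings: this should be a keyword problem plus association analysis.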
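The notes above mention the correlation coefficient and mutual information as candidate measures, and keyword-to-rating association as the approach for the last question. A minimal Python sketch of both follows, using four invented reviews. The keyword "disappointed" is taken from the problem's own example, the threshold of 2 stars for a "low" rating is an arbitrary assumption, and the function names `pearson` and `mutual_information` are hypothetical helpers rather than a prescribed method.

```python
import math
from collections import Counter

# Invented (review text, star rating) pairs for illustration only.
samples = [
    ("disappointed, broke in a week", 1),
    ("very disappointed", 2),
    ("love it", 5),
    ("works great, love it", 5),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences,
    estimated from empirical frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# 1 if the keyword occurs in the review text, else 0.
has_keyword = [int("disappointed" in text) for text, _ in samples]
stars = [s for _, s in samples]
# Assumed threshold: 1 or 2 stars counts as a low rating.
is_low = [int(s <= 2) for s in stars]

r = pearson(has_keyword, stars)
mi = mutual_information(has_keyword, is_low)
print(r, mi)
```

On real data the frequency-based estimate of mutual information is biased upward for rare keywords, so a minimum occurrence count per keyword is a sensible precaution before ranking descriptors.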










