Kaggle exercises, pandas exercises (Exercise: Summary Functions and Maps)

Exercise: Summary Functions and Maps


1. What is the median of the points column in the reviews DataFrame?

median_points = reviews.points.median()

2. What countries are represented in the dataset? (Your answer should not include any duplicates.)

countries = reviews.country.unique()

3. How often does each country appear in the dataset? Create a Series reviews_per_country mapping countries to the count of reviews of wines from that country.

reviews_per_country = reviews['country'].value_counts()
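
The three summary calls used so far can be tried on a small made-up frame. This is only a sketch: toy and its rows are hypothetical stand-ins for the real reviews data, which is assumed to behave the same way.

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews data (made-up rows).
toy = pd.DataFrame({
    "country": ["Italy", "US", "US", "France", "Italy", "US"],
    "points": [87, 90, 85, 92, 88, 95],
})

toy_median = toy.points.median()            # middle value of the sorted scores
toy_countries = toy.country.unique()        # distinct values, in order of first appearance
toy_counts = toy["country"].value_counts()  # rows per country, most frequent first

print(toy_median)
print(toy_countries)
print(toy_counts)
```

Note that unique() returns a NumPy array, while value_counts() returns a Series indexed by country.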

4. Create a variable centered_price containing a version of the price column with the mean price subtracted. (Note: this ‘centering’ transformation is a common preprocessing step before applying various machine learning algorithms.)

centered_price = reviews.price - reviews.price.mean()
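
Subtracting the mean shifts the column so that its own mean becomes zero, and missing prices stay missing. A minimal sketch with made-up prices (toy_price is hypothetical, not part of the exercise):

```python
import pandas as pd

# Hypothetical prices; None stands in for a missing price and becomes NaN.
toy_price = pd.Series([10.0, 20.0, None, 30.0], name="price")

# mean() skips NaN by default, so the mean of the three known prices is used.
toy_centered = toy_price - toy_price.mean()

print(toy_centered.tolist())
print(toy_centered.mean())
```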

5. I’m an economical wine buyer. Which wine is the “best bargain”? Create a variable bargain_wine with the title of the wine with the highest points-to-price ratio in the dataset. (Two approaches are shown here.)

# Method 1: sort by the ratio and take the first row
# Missing prices are NaN (not the string 'null'), so fill the resulting NaN ratios with 0
ratio = (reviews['points'] / reviews['price']).fillna(0)
data = reviews.assign(ratio=ratio).sort_values(by='ratio', ascending=False, ignore_index=True)
print(data['ratio'].head(5))
print(data['ratio'].tail(5))
bargain_wine = data.loc[0, 'title']

# Method 2: find the index label of the largest ratio directly
bargain_idx = (reviews.points/reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx,'title']
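
The idxmax approach can be checked on a tiny made-up frame. The rows and the names toy, toy_ratio, best_idx and toy_bargain are hypothetical; the sketch only illustrates that a missing price yields a NaN ratio, which idxmax skips by default.

```python
import pandas as pd

# Hypothetical wines; "Wine C" has no price.
toy = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
    "points": [90, 86, 88, 95],
    "price": [30.0, 4.0, None, 50.0],
})

toy_ratio = toy.points / toy.price        # NaN where the price is missing
best_idx = toy_ratio.idxmax()             # index label of the largest non-NaN ratio
toy_bargain = toy.loc[best_idx, "title"]
print(toy_bargain)
```

When several rows share the top ratio, idxmax returns the label of the first one, so a sort-based answer may pick a different row among the ties.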

6. There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be “tropical” or “fruity”? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset. (For simplicity, let’s ignore the capitalized versions of these words.) (Two approaches are shown here.)

# Method 1: explicit loop over the descriptions
import pandas as pd

n_tropical, n_fruity = 0, 0
for desc in reviews['description']:
    # Count the review if its description contains "tropical"
    if 'tropical' in desc:
        n_tropical += 1
    # Count the review if its description contains "fruity"
    if 'fruity' in desc:
        n_fruity += 1
descriptor_counts = pd.Series([n_tropical, n_fruity], index=['tropical', 'fruity'])
print(descriptor_counts)

# Method 2: map a membership test over the column and sum the booleans
n_tro = reviews.description.map(lambda desc:"tropical" in desc).sum()
n_fru = reviews.description.map(lambda desc:"fruity" in desc).sum()
descriptor_counts = pd.Series([n_tro,n_fru],index = ["tropical","fruity"])
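
A third option, not used in the solutions above, is the vectorized Series.str.contains with regex=False, which does a plain substring test per row. The sketch below uses made-up descriptions (toy_desc and toy_counts are hypothetical names) and, like the exercise, ignores capitalized forms.

```python
import pandas as pd

# Hypothetical descriptions standing in for reviews.description.
toy_desc = pd.Series([
    "Ripe tropical fruit with a fruity finish",
    "Tropical notes of mango",   # capitalized, so the lowercase search skips it
    "Dry and fruity",
    "Oak and vanilla",
])

# regex=False treats each word as a literal substring, like Python's `in`.
toy_counts = pd.Series({
    word: int(toy_desc.str.contains(word, regex=False).sum())
    for word in ["tropical", "fruity"]
})
print(toy_counts)
```

As with the other two methods, this counts the number of descriptions that contain each word, not the total number of occurrences.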

7. We’d like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we’d like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star. Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points. Create a series star_ratings with the number of stars corresponding to each review in the dataset.

def star(row):
    if row.country == 'Canada':
        return 3
    elif row.points>=95:
        return 3
    elif row.points>=85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(star,axis = 'columns')
print(star_ratings)
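
Row-wise apply is easy to read but calls the Python function once per row. As a possible alternative (a sketch, not the method used above), numpy.select can express the same rules on whole columns. The frame toy and the names conditions and toy_stars are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical reviews covering each branch of the rating rules.
toy = pd.DataFrame({
    "country": ["Canada", "US", "Italy", "France", "Canada"],
    "points": [82, 95, 85, 84, 96],
})

# Conditions are checked in order and the first match wins, like an if/elif chain.
conditions = [
    (toy.country == "Canada") | (toy.points >= 95),  # 3 stars
    toy.points >= 85,                                # 2 stars
]
toy_stars = pd.Series(np.select(conditions, [3, 2], default=1), index=toy.index)
print(toy_stars.tolist())
```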
