Kaggle exercises, pandas exercises (Exercise: Summary Functions and Maps)

Latest recommended article published 2023-09-04 16:46:23

xxY0_0Yxx




Article link: https://blog.csdn.net/weixin_46395175/article/details/132666438


Exercise: Summary Functions and Maps


1. What is the median of the points column in the reviews DataFrame?

median_points = reviews.points.median()
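The sketch below shows how `median()` handles missing values. It runs on a tiny DataFrame that stands in for `reviews`, and the scores are made up for illustration. By default pandas skips NaN. Passing `skipna=False` makes a single missing value turn the whole result into NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the Kaggle `reviews` DataFrame (invented scores)
reviews = pd.DataFrame({"points": [87, 90, 85, np.nan, 92]})

# Default skipna=True: the NaN is ignored, median of 85, 87, 90, 92
median_points = reviews.points.median()
# skipna=False: any NaN makes the result NaN
strict_median = reviews.points.median(skipna=False)

print(median_points)
print(strict_median)
```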

2. What countries are represented in the dataset? (Your answer should not include any duplicates.)

countries = reviews.country.unique()
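This sketch contrasts `unique()` with `nunique()` on a hypothetical toy `country` column. `unique()` keeps the order of first appearance and treats NaN as a value. `nunique()` drops NaN by default, so the two can disagree on how many countries there are.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one review has no country (NaN)
reviews = pd.DataFrame({"country": ["Italy", "Portugal", "US", "Italy", np.nan, "US"]})

# Array of distinct values in order of first appearance, NaN included
countries = reviews.country.unique()
# Number of distinct non-missing values (dropna=True by default)
n_countries = reviews.country.nunique()

print(countries)
print(n_countries)
```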

3. How often does each country appear in the dataset? Create a Series reviews_per_country mapping countries to the count of reviews of wines from that country.

reviews_per_country = reviews['country'].value_counts()
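`value_counts()` returns the counts sorted in descending order, with the countries as the index. The sketch below uses made-up data and checks the result against `groupby(...).size()`. That approach gives the same numbers, but ordered by country name instead of by count.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
reviews = pd.DataFrame({"country": ["US", "Italy", "US", "France", "US", "Italy"]})

# Counts per country, most frequent first
reviews_per_country = reviews["country"].value_counts()
# Same counts via groupby; size() orders the result by the group key
by_group = reviews.groupby("country").size()

print(reviews_per_country)
print(by_group)
```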

4. Create a variable centered_price containing a version of the price column with the mean price subtracted. (Note: this ‘centering’ transformation is a common preprocessing step before applying various machine learning algorithms.)

mean_price = reviews.price.mean()  # mean() skips missing prices by default
centered_price = reviews.price.map(lambda p: p - mean_price)
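Arithmetic on a Series is already element-wise. That means the `map()` call can be replaced by a plain subtraction that broadcasts the scalar mean. Below is a minimal sketch on an invented `price` column, assuming missing prices are NaN; those entries stay NaN after centering.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one price is missing
reviews = pd.DataFrame({"price": [15.0, 20.0, np.nan, 25.0]})

mean_price = reviews.price.mean()  # NaN skipped: (15 + 20 + 25) / 3
# Element-wise via map(), as in the exercise answer
centered_map = reviews.price.map(lambda p: p - mean_price)
# Vectorized: the scalar is subtracted from every element
centered_price = reviews.price - mean_price

print(centered_price.tolist())
```

The vectorized form avoids a Python-level function call per element, which matters on the full dataset.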

5. I’m an economical wine buyer. Which wine is the “best bargain”? Create a variable bargain_wine with the title of the wine with the highest points-to-price ratio in the dataset. (Two approaches are used here.)

# Method 1: build a ratio column row by row, then sort it in descending order
# A missing price is NaN (not the string 'null'), so test it with pd.notna()
reviews['ratio'] = [reviews['points'][i] / reviews['price'][i] if pd.notna(reviews['price'][i]) else 0
                    for i in range(len(reviews))]
data = reviews.sort_values(by=['ratio'], ascending=False, ignore_index=True)
print(data['ratio'].head(5))
print(data['ratio'].tail(5))
bargain_wine = data.loc[0, 'title']

# Method 2: idxmax() returns the index label of the largest ratio
bargain_idx = (reviews.points/reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx,'title']
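Both methods depend on how pandas treats missing prices. A missing price is NaN, not a string such as 'null', so dividing by it gives NaN for that row. `idxmax()` skips NaN by default. `sort_values()` places NaN last even when sorting in descending order. The toy DataFrame below is hypothetical and chosen so that one wine has no price.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Wine B" has no price
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
    "points": [87, 90, 85, 92],
    "price": [20.0, np.nan, 10.0, 46.0],
})

ratio = reviews.points / reviews.price  # NaN where the price is missing
bargain_idx = ratio.idxmax()            # NaN ratios are skipped
bargain_wine = reviews.loc[bargain_idx, "title"]

# Sorting route: NaN goes to the end (na_position="last" is the default)
ranked = reviews.assign(ratio=ratio).sort_values("ratio", ascending=False, ignore_index=True)

print(bargain_wine)
print(ranked[["title", "ratio"]])
```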

6. There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be “tropical” or “fruity”? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset. (For simplicity, let’s ignore the capitalized versions of these words.) (Two approaches are used here.)

# Method 1: loop over the rows and count the descriptions containing each word
a, b = 0, 0
for i in range(len(reviews)):
    # if reviews['description'][i] contains "tropical"
    if 'tropical' in reviews['description'][i]:
        a += 1
    # if reviews['description'][i] contains "fruity"
    if 'fruity' in reviews['description'][i]:
        b += 1
descriptor_counts = pd.Series([a, b], index=['tropical', 'fruity'])
print(descriptor_counts)

# Method 2: map() each description to True/False, then sum() counts the True values
n_tro = reviews.description.map(lambda desc:"tropical" in desc).sum()
n_fru = reviews.description.map(lambda desc:"fruity" in desc).sum()
descriptor_counts = pd.Series([n_tro,n_fru],index = ["tropical","fruity"])
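A vectorized variant of method 2 uses the `.str` accessor instead of `map()`. Like both methods above, it counts reviews that mention a word at least once rather than total occurrences. `str.contains` is case-sensitive by default, which matches the exercise's choice to ignore capitalized forms. The descriptions here are invented for illustration.

```python
import pandas as pd

# Hypothetical toy descriptions
reviews = pd.DataFrame({"description": [
    "Tropical notes with a fruity finish",  # capital T: not counted as "tropical"
    "Bright and fruity",
    "Hints of tropical fruit",
    "Earthy and dry",
]})

# regex=False matches the word as a plain substring; case-sensitive by default
n_tro = reviews.description.str.contains("tropical", regex=False).sum()
n_fru = reviews.description.str.contains("fruity", regex=False).sum()
descriptor_counts = pd.Series([n_tro, n_fru], index=["tropical", "fruity"])

print(descriptor_counts)
```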

7. We’d like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand; we’d like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, and a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star. Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points. Create a Series star_ratings with the number of stars corresponding to each review in the dataset.

def star(row):
    if row.country == 'Canada':
        return 3
    elif row.points>=95:
        return 3
    elif row.points>=85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(star,axis = 'columns')
print(star_ratings)
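The row-wise `apply()` is easy to read, but it calls a Python function once per row. A vectorized alternative uses `numpy.select`, sketched here on a hypothetical five-row DataFrame. It evaluates the conditions in order and the first match wins, mirroring the if/elif chain. Because the Canada check comes first, it overrides the points. The `star()` function is repeated so the block runs on its own and the two results can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
reviews = pd.DataFrame({
    "country": ["Canada", "US", "France", "Italy", "Chile"],
    "points": [80, 96, 88, 84, 95],
})

def star(row):
    # Same rules as the exercise answer above
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

via_apply = reviews.apply(star, axis="columns")

# Conditions are checked in order; rows matching none of them get the default
conditions = [
    reviews.country == "Canada",
    reviews.points >= 95,
    reviews.points >= 85,
]
choices = [3, 3, 2]
star_ratings = pd.Series(np.select(conditions, choices, default=1), index=reviews.index)

print(star_ratings.tolist())
```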
