Original exercise: https://www.kaggle.com/code/residentmario/summary-functions-and-maps
Practice with aggregate (summary) functions and user-defined functions (UDFs).
1
median_points = reviews.points.median()
2
countries = reviews.country.unique()
3
reviews_per_country = reviews.country.value_counts()
4
The two statements below can be merged into one.
avg_price = reviews.price.mean()
centered_price = reviews.price - avg_price
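As a sketch of the merged form mentioned above: the mean can be computed inline, since pandas broadcasts the scalar across the column. The small `reviews` frame here is a hypothetical stand-in for the exercise data, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's `reviews` DataFrame.
reviews = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# Subtract the column mean in a single expression; the scalar mean
# is broadcast against every element of the price column.
centered_price = reviews.price - reviews.price.mean()
```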
5
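After finishing, I checked the reference solution and found that I had written more than necessary.
def get_points_to_price(row):
    # Overwrite points with the points-to-price ratio for this row.
    row.points = row.points / row.price
    return row
bargain_wine_idx = reviews.apply(get_points_to_price, axis='columns').points.idxmax()
# idxmax returns an index label, so look it up with loc rather than iloc.
bargain_wine = reviews.loc[bargain_wine_idx, 'title']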
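A more compact version of the row-wise code above is possible because dividing two columns is already vectorized. This is only a sketch, not necessarily the reference solution, and the sample `reviews` frame is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's `reviews` DataFrame.
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [90, 88, 95],
    "price": [30.0, 11.0, 95.0],
})

# Element-wise division yields the points-to-price ratio for every row.
ratio = reviews.points / reviews.price

# idxmax returns the index label of the largest ratio; loc looks it up by label.
bargain_idx = ratio.idxmax()
bargain_wine = reviews.loc[bargain_idx, "title"]
```

This avoids calling a Python function once per row, which tends to matter on a dataset with many reviews.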
6
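After reading the hint, I could not think of a way to search for both words in one pass, so I search for each word separately and then combine the results.
description: pd.Series = reviews.description
search_keys = ['tropical', 'fruity']
# One entry per keyword: the number of descriptions containing it.
descriptor_counts: pd.Series = pd.Series({
    key: description.str.contains(key).sum()
    for key in search_keys
})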
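To check the per-keyword counting above on something small, here is a sketch that runs the same pattern on a few made-up descriptions. The sample strings are hypothetical, and `regex=False` is an added assumption so that each keyword is matched as a literal substring.

```python
import pandas as pd

# Hypothetical descriptions standing in for reviews.description.
description = pd.Series([
    "tropical fruit aromas",
    "fruity and bright",
    "tropical, fruity finish",
    "dry and oaky",
    "soft, fruity palate",
])
search_keys = ["tropical", "fruity"]

# For each keyword, build a boolean mask of matching descriptions and sum it.
descriptor_counts = pd.Series({
    key: description.str.contains(key, regex=False).sum()
    for key in search_keys
})
```

The resulting Series is indexed by keyword, so `descriptor_counts["fruity"]` gives the count for that word.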
7
def get_star(row):
    return 3 if row.points >= 95 or row.country == 'Canada' else \
        2 if row.points >= 85 else \
        1
star_ratings = reviews.apply(get_star, axis='columns')
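The same rating rules can also be expressed without a row-wise `apply`. The sketch below uses `numpy.select`, which picks the value of the first matching condition. It is an alternative technique rather than the approach used above, and the sample `reviews` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's `reviews` DataFrame.
reviews = pd.DataFrame({
    "country": ["Canada", "Italy", "France", "US"],
    "points": [80, 96, 90, 82],
})

# Conditions are checked in order; the first one that holds wins.
three_stars = (reviews.points >= 95) | (reviews.country == "Canada")
two_stars = reviews.points >= 85

# Rows matching neither condition fall back to the default of 1 star.
star_ratings = pd.Series(
    np.select([three_stars, two_stars], [3, 2], default=1),
    index=reviews.index,
)
```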