1. Exercise 1: World Food Facts
2. Exercise 2: Chipotle
The data looks like this (screenshot: image.png).
2.1 Questions and answers
2.1.1 What is the number of observations in the dataset?
chipo.shape
chipo.info()
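Note that shape is an attribute, not a method, so it takes no parentheses or arguments. It returns (rows, columns), so shape[0] is the number of observations.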
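A minimal sketch of the difference between the shape attribute and the info() method. The three-row toy table below is a hypothetical stand-in for chipo, not the real data.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Chipotle table.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_name": ["Izze", "Chicken Bowl", "Chicken Bowl"],
})

# shape is an attribute: no parentheses, returns (rows, columns).
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# len() of a DataFrame is also its number of rows.
print(len(toy))

# info() is a method: it needs parentheses and prints a summary.
toy.info()
```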
2.1.2 What is the number of columns in the dataset?
chipo.shape[1]
2.1.3 Print the name of all the columns.
chipo.columns
columns is also an attribute, so it takes no parentheses or arguments either.
2.1.4 How is the dataset indexed?
chipo.index
2.1.5 Which was the most-ordered item?
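c = chipo.groupby('item_name')['quantity'].sum()
c = c.sort_values(ascending=False)
c.head(1)
Worth reviewing: groupby combined with sum. groupby('item_name') collects the rows for each item, sum() totals the quantity within each group, and sort_values(ascending=False) puts the largest total first. Selecting the quantity column before sum() avoids aggregating the text columns.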
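The groupby, sum, sort_values chain can be sketched on a small table. The toy rows are made up for illustration; only the column names item_name and quantity come from the exercise.

```python
import pandas as pd

# Hypothetical rows; only the column names match the exercise.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Chips"],
    "quantity": [2, 1, 1, 2],
})

# One total per item: the group key (item_name) becomes the index.
totals = toy.groupby("item_name")["quantity"].sum()

# Largest total first.
ranked = totals.sort_values(ascending=False)

top_item = ranked.index[0]      # name of the most-ordered item
top_quantity = ranked.iloc[0]   # how many of it were ordered
print(top_item, top_quantity)
```

If only the name is needed, totals.idxmax() returns it without sorting.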
2.1.6 For the most-ordered item, how many items were ordered?
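c = chipo.groupby('item_name')['quantity'].sum()
c = c.sort_values(ascending=False)
c.head(1)
Group by item_name, so that item_name becomes the key (the index of the result), then sort the grouped totals by quantity. The value in the first row is the number of items ordered for the most-ordered item.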
2.1.7 What was the most ordered item in the choice_description column?
c = chipo.groupby('choice_description')['quantity'].sum()
c = c.sort_values(ascending=False)
c.head(1)
Group by choice_description, then sort the grouped totals by quantity in descending order.
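One thing to keep in mind with this grouping: by default groupby drops rows whose key is missing. A minimal sketch, assuming some rows have no choice_description (the toy values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; two of them have no choice_description.
toy = pd.DataFrame({
    "choice_description": ["[Diet Coke]", np.nan, "[Diet Coke]", "[Sprite]", np.nan],
    "quantity": [1, 5, 2, 1, 5],
})

# Default behaviour: rows with a missing key are left out of the groups.
by_choice = toy.groupby("choice_description")["quantity"].sum()
print(by_choice.sort_values(ascending=False).head(1))

# dropna=False keeps the missing key as its own group.
with_missing = toy.groupby("choice_description", dropna=False)["quantity"].sum()
print(with_missing)
```

So the answer to this question only ranks rows that actually have a choice_description.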
2.1.8 How many items were ordered in total?
d = chipo.quantity.sum()
d
The total number of items ordered is the sum of the quantity column.
2.1.9 Turn the item price into a float
chipo.item_price.dtype
dollarizer = lambda x: float(x[1:-1])
chipo.item_price = chipo.item_price.apply(dollarizer)
chipo.item_price.dtype
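First check the dtype of the Series, then build a function with lambda and apply it to every value. The slice x[1:-1] drops the first and last characters, which assumes every price has a leading '$' and exactly one trailing character to discard.
.astype(float) can also convert the dtype, but it raises an error as long as the strings still contain the '$' sign, so the strings have to be cleaned first.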
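A minimal sketch of the astype route, as an alternative to the lambda. The sample strings are an assumption about what the raw prices look like (a leading '$' and a trailing space).

```python
import pandas as pd

# Assumed raw format: leading "$" and a trailing space.
prices = pd.Series(["$2.39 ", "$3.39 ", "$16.98 "])

# Calling prices.astype(float) directly is expected to fail,
# because "$2.39 " is not a valid float literal.

# Remove the "$" literally (regex=False), trim whitespace, then convert.
cleaned = (
    prices.str.replace("$", "", regex=False)
          .str.strip()
          .astype(float)
)
print(cleaned.dtype)
print(cleaned.tolist())
```

Unlike the x[1:-1] slice, this version does not depend on the exact position of the characters being removed.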
2.1.10 How much was the revenue for the period in the dataset?
revenue = chipo.quantity * chipo.item_price
revenue = revenue.sum()
revenue
2.1.11 How many orders were made in the period?
orders = chipo.order_id.value_counts().count()
orders
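value_counts() returns one entry per distinct value (like SQL DISTINCT) together with how often it occurs.
count() then counts the non-null entries, which here gives the number of distinct order_id values.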
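A small sketch of what value_counts().count() computes, next to nunique(), which returns the same number in one step. The order ids are made up.

```python
import pandas as pd

# Hypothetical order ids: three distinct orders across six rows.
order_ids = pd.Series([1, 1, 2, 3, 3, 3])

# One entry per distinct id, with the number of rows for each.
counts = order_ids.value_counts()
print(counts)

# Counting those entries gives the number of distinct ids.
n_orders = counts.count()

# nunique() gives the same number directly.
n_orders_direct = order_ids.nunique()
print(n_orders, n_orders_direct)
```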
2.1.12 What is the average revenue amount per order?
chipo['revenue'] = chipo['quantity'] * chipo['item_price']
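order_grouped = chipo.groupby(by=['order_id'])['revenue'].sum()
order_grouped.mean()
Selecting the revenue column before sum() keeps the text columns out of the aggregation, which recent pandas versions would otherwise reject when taking the mean. The same computation as a single expression:
chipo.groupby(by=['order_id'])['revenue'].sum().mean()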
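The per-order average can be checked by hand on a tiny table. The numbers below are invented so that the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical data: order 1 has two lines, order 2 has one.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 2, 1],
    "item_price": [2.0, 3.0, 4.0],
})

# Revenue per line: 1*2.0, 2*3.0, 1*4.0
toy["revenue"] = toy["quantity"] * toy["item_price"]

# Revenue per order: order 1 -> 8.0, order 2 -> 4.0
per_order = toy.groupby("order_id")["revenue"].sum()

# Average over orders: (8.0 + 4.0) / 2
average = per_order.mean()
print(average)
```

Note that this is the mean over orders, not over rows: averaging the revenue column directly would give (2 + 6 + 4) / 3 instead.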
2.1.13 How many different items are sold?
chipo.item_name.value_counts().count()