Pandas Exercise 1

From https://github.com/guipsamora/pandas_exercises

Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd
import numpy as np

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called chipo.

url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url,sep='\t')
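The `sep='\t'` argument is what lets `read_csv` parse a tab-separated file. The sketch below runs offline on a hypothetical three-row sample in the same column layout as `chipotle.tsv`. The rows are modelled on the output in Step 4 and are not read from the real file. It compares `sep='\t'` with the default comma separator.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample in the same column layout as chipotle.tsv.
tsv_text = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot)]\t$16.98\n"
)

sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # split on tabs
wrong = pd.read_csv(io.StringIO(tsv_text))             # default sep="," finds no commas

print(sample.shape)  # (3, 5)
print(wrong.shape)   # (3, 1)
print(sample["choice_description"].isna().sum())  # 1: "NULL" is read as NaN by default
```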

Step 4. See the first 10 entries

# Solution 1

chipo[:10]
   order_id  quantity  item_name                              choice_description                                 item_price
0         1         1  Chips and Fresh Tomato Salsa           NaN                                                $2.39
1         1         1  Izze                                   [Clementine]                                       $3.39
2         1         1  Nantucket Nectar                       [Apple]                                            $3.39
3         1         1  Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
4         2         2  Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
5         3         1  Chicken Bowl                           [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...  $10.98
6         3         1  Side of Chips                          NaN                                                $1.69
7         4         1  Steak Burrito                          [Tomatillo Red Chili Salsa, [Fajita Vegetables...  $11.75
8         4         1  Steak Soft Tacos                       [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...  $9.25
9         5         1  Steak Burrito                          [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...  $9.25
# Solution 2

chipo.head(10)
   order_id  quantity  item_name                              choice_description                                 item_price
0         1         1  Chips and Fresh Tomato Salsa           NaN                                                $2.39
1         1         1  Izze                                   [Clementine]                                       $3.39
2         1         1  Nantucket Nectar                       [Apple]                                            $3.39
3         1         1  Chips and Tomatillo-Green Chili Salsa  NaN                                                $2.39
4         2         2  Chicken Bowl                           [Tomatillo-Red Chili Salsa (Hot), [Black Beans...  $16.98
5         3         1  Chicken Bowl                           [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...  $10.98
6         3         1  Side of Chips                          NaN                                                $1.69
7         4         1  Steak Burrito                          [Tomatillo Red Chili Salsa, [Fajita Vegetables...  $11.75
8         4         1  Steak Soft Tacos                       [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...  $9.25
9         5         1  Steak Burrito                          [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...  $9.25
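For a frame with the default `RangeIndex`, the slice `chipo[:10]` and `chipo.head(10)` select the same rows by position. Here is a minimal check on a hypothetical 12-row frame, with `iloc` added as the explicit positional form:

```python
import pandas as pd

# Hypothetical 12-row frame standing in for chipo (default RangeIndex).
df = pd.DataFrame({"order_id": range(1, 13), "quantity": [1] * 12})

first_slice = df[:10]      # Solution 1: positional row slice
first_head = df.head(10)   # Solution 2
first_iloc = df.iloc[:10]  # explicit positional indexing

print(first_slice.equals(first_head), first_head.equals(first_iloc))  # True True
print(len(df.head()))      # 5: head() returns five rows when n is omitted
```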

Step 5. What is the number of observations in the dataset?

type(chipo)
pandas.core.frame.DataFrame
# Solution 1

len(chipo.index)
4622
# Solution 2

chipo.shape[0]
4622
# Solution 3

chipo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
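Solutions 1 and 2 return the row count directly. `info()` also prints the number of non-null values per column, which is why `choice_description` reports 3376 rather than 4622: 1246 rows have no description. The sketch below uses a hypothetical three-row frame and `count()`, which returns the same per-column non-null numbers that `info()` prints.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one missing description, like the NaN rows in chipo.
df = pd.DataFrame({
    "item_name": ["Izze", "Side of Chips", "Chicken Bowl"],
    "choice_description": ["[Clementine]", np.nan, "[Rice]"],
})

n_rows = len(df.index)                   # Solution 1
assert n_rows == df.shape[0] == len(df)  # Solution 2 and plain len() agree

non_null = df.count()                    # per-column non-null counts, as in info()
print(non_null["item_name"], non_null["choice_description"])  # 3 2
```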

Step 6. What is the number of columns in the dataset?

# Solution 1

len(chipo.columns)
5
# Solution 2

chipo.shape[1]
5

Step 7. Print the name of all the columns.

list(chipo.columns)
['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']

Step 8. How is the dataset indexed?

chipo.index
RangeIndex(start=0, stop=4622, step=1)
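Steps 6 to 8 read three attributes of the same frame: `shape`, `columns` and `index`. `read_csv` attaches a default `RangeIndex` when no index column is specified. The small sketch below works on a hypothetical frame. It also shows how `set_index`, which the exercise does not use, would replace that default with an existing column.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex.
df = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

print(df.shape)          # (3, 2): rows, columns
print(list(df.columns))  # ['order_id', 'quantity']
print(df.index)          # RangeIndex(start=0, stop=3, step=1)

# set_index swaps the default index for an existing column (labels may repeat).
by_order = df.set_index("order_id")
print(by_order.index.name)  # order_id
```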

Step 9. Which was the most-ordered item?

# Total quantity per item, aggregating only the quantity column
c = chipo.groupby('item_name')['quantity'].sum()
c = c.sort_values(ascending=False)
c.head(1)
item_name
Chicken Bowl    761
Name: quantity, dtype: int64

Step 10. For the most-ordered item, how many items were ordered?

# Same per-item totals; nlargest(1) keeps only the top row
c = chipo.groupby('item_name')['quantity'].sum()
c.nlargest(1)
item_name
Chicken Bowl    761
Name: quantity, dtype: int64
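Steps 9 and 10 answer both questions from one per-item total. The sketch below uses hypothetical order lines. `idxmax()` returns the label of the largest total (the first one, if there is a tie), and `max()` returns the total itself. `nlargest(1)` gives the same one-row Series as sorting and taking `head(1)`.

```python
import pandas as pd

# Hypothetical order lines.
df = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Chicken Bowl", "Side of Chips", "Izze"],
    "quantity": [2, 1, 1, 1, 1],
})

# Total quantity per item, aggregating only the column we need.
totals = df.groupby("item_name")["quantity"].sum()

print(totals.idxmax())     # Chicken Bowl  (Step 9: which item)
print(totals.max())        # 3             (Step 10: how many)
print(totals.nlargest(1))  # same one-row Series as sort_values(ascending=False).head(1)
```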

Step 11. What was the most ordered item in the choice_description column?

# Total quantity per choice_description, largest first
c = chipo.groupby('choice_description')['quantity'].sum()
c = c.sort_values(ascending=False)
c.head(1)
choice_description
[Diet Coke]    159
Name: quantity, dtype: int64
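By default, `groupby` leaves out rows whose key is missing. As a result, the 1246 rows with no `choice_description` (4622 rows minus the 3376 non-null values in the `info()` output) do not take part in Step 11. The sketch below shows this on hypothetical data. It then uses `dropna=False` (available since pandas 1.1) to keep the missing rows as their own group; in this toy data that missing group would come out on top.

```python
import numpy as np
import pandas as pd

# Hypothetical choices; two rows have no description.
df = pd.DataFrame({
    "choice_description": ["[Diet Coke]", np.nan, "[Coke]", "[Diet Coke]", np.nan],
    "quantity": [1, 5, 2, 2, 1],
})

# Default: NaN keys are dropped, so only the two described choices remain.
per_choice = df.groupby("choice_description")["quantity"].sum()
print(per_choice.sort_values(ascending=False).head(1))  # top choice by total quantity

# dropna=False keeps the missing descriptions as a group of their own.
with_missing = df.groupby("choice_description", dropna=False)["quantity"].sum()
print(len(per_choice), len(with_missing))  # 2 3
```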

Step 12. How many items were ordered in total?

chipo['quantity'].sum()
4972

Step 13. Turn the item price into a float

Step 13.a. Check the item price type
chipo['item_price'].dtypes
dtype('O')
Step 13.b. Create a lambda function and change the type of item price
chipo['item_price'] = chipo['item_price'].apply(lambda x:x.replace('$','')).astype(np.float64);
# Alternative: strip surrounding whitespace, drop the leading '$', then convert
# dollarizer = lambda x: float(x.strip()[1:])
# chipo.item_price = chipo.item_price.apply(dollarizer)
Step 13.c. Check the item price type
chipo['item_price'].dtypes
dtype('float64')
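Step 13.b applies a Python lambda to each value. A vectorized alternative goes through the `.str` accessor, where `regex=False` makes `'$'` a literal character rather than the regex end-of-string anchor. The values below are hypothetical. The trailing space on one of them only illustrates why stripping whitespace first is a cheap safeguard.

```python
import pandas as pd

# Hypothetical price strings; the trailing space on the second is deliberate.
prices = pd.Series(["$2.39", "$16.98 ", "$1.69"])

# regex=False treats '$' literally instead of as the end-of-string anchor.
as_float = prices.str.strip().str.replace("$", "", regex=False).astype(float)

# pd.to_numeric is another route; errors="coerce" would turn bad values into NaN.
as_numeric = pd.to_numeric(prices.str.strip().str.lstrip("$"))

print(as_float.tolist())  # [2.39, 16.98, 1.69]
print(as_float.dtype)     # float64
```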

Step 14. How much was the revenue for the period in the dataset?

(chipo['quantity']*chipo['item_price']).sum()
39237.02
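Step 14 multiplies the two columns element-wise and sums the result, which treats `item_price` as a unit price. In the Step 4 output, however, the line with `quantity` 2 has `item_price` $16.98. That suggests the price may already be a line total, in which case the revenue would be `chipo['item_price'].sum()` instead. The sketch below computes both readings on hypothetical rows so the difference is visible.

```python
import pandas as pd

# Hypothetical rows; the 2 x 16.98 line mirrors the Chicken Bowl row in Step 4.
df = pd.DataFrame({"quantity": [1, 2, 1], "item_price": [2.39, 16.98, 1.69]})

# Reading used in Step 14: item_price is a unit price.
revenue_unit_price = (df["quantity"] * df["item_price"]).sum()

# Alternative reading: item_price already covers the whole line.
revenue_line_total = df["item_price"].sum()

print(round(revenue_unit_price, 2), round(revenue_line_total, 2))  # 38.04 21.06
```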

Step 15. How many orders were made in the period?

# Solution 1

g = chipo.groupby(['order_id'])
g.ngroups
1834
# Solution 2

orders = chipo.order_id.value_counts().count()
orders
1834
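Both solutions count distinct `order_id` values, and `nunique()` is a more direct spelling of the same count. Here is a check on hypothetical ids:

```python
import pandas as pd

# Hypothetical order ids: three distinct orders spread over six lines.
df = pd.DataFrame({"order_id": [1, 1, 1, 2, 3, 3]})

n_groups = df.groupby("order_id").ngroups         # Solution 1
n_counts = df["order_id"].value_counts().count()  # Solution 2
n_unique = df["order_id"].nunique()               # direct distinct count

print(n_groups, n_counts, n_unique)  # 3 3 3
```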

Step 16. What is the average revenue amount per order?

# Solution 1

chipo['revenue'] = chipo['quantity']*chipo['item_price']
# Sum revenue per order, then average over orders
order_grouped = chipo.groupby(by=['order_id'])['revenue'].sum()
order_grouped.mean()
21.394231188658654
# Solution 2

chipo.groupby('order_id')['revenue'].sum().mean()
21.394231188658654
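An order can span several lines, so the average per order has to be taken over order totals rather than over individual lines. The sketch below uses hypothetical numbers small enough to check by hand:

```python
import pandas as pd

# Hypothetical lines: order 1 has two lines, order 2 has one.
df = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_price": [2.0, 3.0, 5.0],
})
df["revenue"] = df["quantity"] * df["item_price"]    # 2.0, 3.0, 10.0

per_order = df.groupby("order_id")["revenue"].sum()  # order 1 -> 5.0, order 2 -> 10.0
avg_per_order = per_order.mean()                     # (5 + 10) / 2 = 7.5
avg_per_line = df["revenue"].mean()                  # (2 + 3 + 10) / 3 = 5.0

print(avg_per_order, avg_per_line)  # 7.5 5.0
```

Because every line belongs to exactly one order, `avg_per_order` also equals total revenue divided by the number of orders. This matches Step 14 divided by Step 15 (39237.02 / 1834 ≈ 21.394).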

Step 17. How many different items are sold?

chipo.item_name.value_counts().count()
50
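`value_counts().count()` and `nunique()` both skip missing values by default. According to `info()`, `item_name` has no missing values, so the choice does not matter here; for a column such as `choice_description` it would. The sketch below uses a hypothetical Series with one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical item names with one missing value.
items = pd.Series(["Izze", "Chicken Bowl", "Izze", np.nan, "Side of Chips"])

print(items.value_counts().count())     # 3: value_counts skips NaN by default
print(items.nunique())                  # 3: same rule, one call
print(items.nunique(dropna=False))      # 4: NaN counted as a value
print(sorted(items.dropna().unique()))  # ['Chicken Bowl', 'Izze', 'Side of Chips']
```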

Reposted from: https://www.cnblogs.com/pkuimyy/p/11505970.html
