PyPackage01---Pandas00_DataFrame

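Basic pandas DataFrame operations; updated from time to time.

Creating DataFrames

Build two datasets:

  • df1: basic user attributes (age, sex, registration time)
  • df2: user transaction attributes (purchase time, payment amount)

Creating from a dict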

import pandas as pd
userid = list(range(1, 6))
sex = ["male", "female","male", "female","male"]
createtime_str = [
    "2019-01-01 11:45:50", "2019-01-02 11:55:50", "2019-01-21 11:45:50", 
    "2019-02-01 12:45:50", "2019-01-15 10:40:50"
]
buytime_str = ["2019-04-01 11:45:50", "2019-05-02 11:56:50", "2019-07-21 12:45:50", 
    "2019-08-01 12:40:50", "2019-01-06 10:00:50"]
age = [18, 37, 21, 44, 39]
payamount =[11.15, 10.37, 12.11, 14.5, 16.39]
df1 = pd.DataFrame({'userid':userid,'sex':sex,'age':age,'createtime_str':createtime_str})
df2 = pd.DataFrame({'userid':userid,'buytime_str':buytime_str,'payamount':payamount})
df1
   userid     sex  age       createtime_str
0       1    male   18  2019-01-01 11:45:50
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50
3       4  female   44  2019-02-01 12:45:50
4       5    male   39  2019-01-15 10:40:50
df2
   userid          buytime_str  payamount
0       1  2019-04-01 11:45:50      11.15
1       2  2019-05-02 11:56:50      10.37
2       3  2019-07-21 12:45:50      12.11
3       4  2019-08-01 12:40:50      14.50
4       5  2019-01-06 10:00:50      16.39
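A dict of columns is not the only input pd.DataFrame accepts. As a minimal sketch, the same constructor also takes a list of row dicts, which is handy when data arrives record by record. The records below are made-up sample values, not part of the datasets above.

```python
import pandas as pd

# Hypothetical sample records; each dict becomes one row.
records = [
    {"userid": 1, "sex": "male", "age": 18},
    {"userid": 2, "sex": "female", "age": 37},
]

# The dict keys become the column names.
df_records = pd.DataFrame(records)
print(df_records)
```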

DataFrame overview

Checking the type

type(df1)
pandas.core.frame.DataFrame

Printing column dtypes

df1.dtypes
userid             int64
sex               object
age                int64
createtime_str    object
dtype: object

Descriptive statistics

By default, describe() only summarizes the numeric columns.

df1.describe()
         userid        age
count  5.000000   5.000000
mean   3.000000  31.800000
std    1.581139  11.562872
min    1.000000  18.000000
25%    2.000000  21.000000
50%    3.000000  37.000000
75%    4.000000  39.000000
max    5.000000  44.000000
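To see the non-numeric columns as well, describe() takes an include argument. The sketch below uses a small stand-in frame (not df1 itself) and shows that include="all" adds count, unique, top and freq for text columns.

```python
import pandas as pd

# Small stand-in frame with one text column and two numeric columns.
df = pd.DataFrame({
    "userid": [1, 2, 3],
    "sex": ["male", "female", "male"],
    "age": [18, 37, 21],
})

numeric_only = df.describe()             # default: numeric columns only
everything = df.describe(include="all")  # adds unique/top/freq for text columns

print(everything)
```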

Counting rows and columns

df1.shape
print("rows: "+str(df1.shape[0])+"\n"+"columns: "+str(df1.shape[1]))
rows: 5
columns: 4

Taking subsets

Selecting a single position

Row and column positions start at 0.

df1.iloc[1,2]
37
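iloc addresses a cell by position. For comparison, the sketch below reads the same cell by label with loc and with the scalar accessor iat. The frame is a small stand-in whose default integer index makes row label 1 coincide with row position 1.

```python
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 2, 3],
    "sex": ["male", "female", "male"],
    "age": [18, 37, 21],
})

by_position = df.iloc[1, 2]   # row position 1, column position 2
by_label = df.loc[1, "age"]   # row label 1, column label "age"
scalar = df.iat[1, 2]         # positional access to a single scalar

print(by_position, by_label, scalar)
```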

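Selecting a range of rows

iloc[1:3, :] returns the rows at positions 1 and 2; the end of the slice is exclusive.

df1.iloc[1:3,:]
   userid     sex  age       createtime_str
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50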
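When the goal really is the first few rows, head() is the usual shortcut. A small sketch on a stand-in frame, showing that head(2) matches iloc[:2], while iloc[1:3] skips the first row:

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2, 3, 4, 5], "age": [18, 37, 21, 44, 39]})

first_two = df.head(2)     # first two rows
same_rows = df.iloc[:2]    # positional slice, end exclusive
rows_1_2 = df.iloc[1:3]    # positions 1 and 2 only

print(first_two)
print(rows_1_2)
```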

Selecting specific columns

df1.iloc[:,[1,3]]
      sex       createtime_str
0    male  2019-01-01 11:45:50
1  female  2019-01-02 11:55:50
2    male  2019-01-21 11:45:50
3  female  2019-02-01 12:45:50
4    male  2019-01-15 10:40:50
df1[["userid","age"]]
   userid  age
0       1   18
1       2   37
2       3   21
3       4   44
4       5   39

Filtering by condition

df1.query("userid>=2")
   userid     sex  age       createtime_str
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50
3       4  female   44  2019-02-01 12:45:50
4       5    male   39  2019-01-15 10:40:50
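query() takes its condition as a string. The sketch below shows two equivalent spellings on a stand-in frame: a boolean mask, and a query that reads a Python variable through the @ prefix. The variable name min_id is chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2, 3], "age": [18, 37, 21]})

min_id = 2  # illustrative threshold

via_query = df.query("userid >= @min_id")  # @ refers to a local Python variable
via_mask = df[df["userid"] >= min_id]      # equivalent boolean-mask form

print(via_query)
```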

Column operations

Number of columns and column names

df1.columns.size
4
list(df1.columns)
['userid', 'sex', 'age', 'createtime_str']

Renaming columns

Renaming one column

# inplace=False leaves the original df1 unchanged and returns a new DataFrame
df1.rename(columns={"userid":"id"},inplace=False)
   id     sex  age       createtime_str
0   1    male   18  2019-01-01 11:45:50
1   2  female   37  2019-01-02 11:55:50
2   3    male   21  2019-01-21 11:45:50
3   4  female   44  2019-02-01 12:45:50
4   5    male   39  2019-01-15 10:40:50

Renaming several columns

# inplace=False leaves the original df1 unchanged and returns a new DataFrame
df1.rename(columns={"userid":"id","sex":"gender"},inplace=False)
   id  gender  age       createtime_str
0   1    male   18  2019-01-01 11:45:50
1   2  female   37  2019-01-02 11:55:50
2   3    male   21  2019-01-21 11:45:50
3   4  female   44  2019-02-01 12:45:50
4   5    male   39  2019-01-15 10:40:50
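Because rename() returns a new frame when inplace=False (the default), the result has to be assigned to a name to be kept. A minimal sketch on a stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2], "sex": ["male", "female"]})

# The renamed frame is a new object; df keeps its original column names.
renamed = df.rename(columns={"userid": "id", "sex": "gender"})

print(list(df.columns))
print(list(renamed.columns))
```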

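Renaming all columns

Note: assigning to temp.columns modifies temp in place, so work on a deep copy to keep df1 intact. On deep versus shallow copies, see
https://www.runoob.com/w3cnote/python-understanding-dict-copy-shallow-or-deep.html
(df1.copy(), which is deep by default, is the idiomatic pandas equivalent of copy.deepcopy(df1).)

import copy
temp = copy.deepcopy(df1)
temp.columns=["id1","sex1","age1","createtime_str"]
temp
   id1    sex1  age1       createtime_str
0    1    male    18  2019-01-01 11:45:50
1    2  female    37  2019-01-02 11:55:50
2    3    male    21  2019-01-21 11:45:50
3    4  female    44  2019-02-01 12:45:50
4    5    male    39  2019-01-15 10:40:50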
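The reason for copying first can be shown in a few lines. In this sketch (stand-in frame, illustrative names), plain assignment only creates a second name for the same object, so renaming through the alias also renames the original, while a copy stays independent.

```python
import pandas as pd

original = pd.DataFrame({"userid": [1, 2], "age": [18, 37]})

alias = original               # same object, no copy is made
independent = original.copy()  # deep copy by default

alias.columns = ["id", "age"]          # also changes `original`
independent.columns = ["id1", "age1"]  # leaves `original` alone

print(list(original.columns))
```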

Dropping columns

Dropping one column

temp = copy.deepcopy(df1)
# columns= already targets columns, so axis=1 is not needed
temp.drop(columns=["sex"],inplace=False)
   userid  age       createtime_str
0       1   18  2019-01-01 11:45:50
1       2   37  2019-01-02 11:55:50
2       3   21  2019-01-21 11:45:50
3       4   44  2019-02-01 12:45:50
4       5   39  2019-01-15 10:40:50

Dropping several columns

temp = copy.deepcopy(df1)
# columns= already targets columns, so axis=1 is not needed
temp.drop(columns=["sex","age"],inplace=False)
   userid       createtime_str
0       1  2019-01-01 11:45:50
1       2  2019-01-02 11:55:50
2       3  2019-01-21 11:45:50
3       4  2019-02-01 12:45:50
4       5  2019-01-15 10:40:50
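As with rename(), drop() with inplace=False returns a new frame and leaves the original untouched. The sketch below also shows errors="ignore", which skips labels that do not exist instead of raising. The column name not_a_column is deliberately fictitious.

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2], "sex": ["male", "female"], "age": [18, 37]})

dropped = df.drop(columns=["sex"])  # new frame; df is unchanged

# "not_a_column" does not exist; errors="ignore" skips it silently.
tolerant = df.drop(columns=["sex", "not_a_column"], errors="ignore")

print(list(df.columns))
print(list(dropped.columns))
```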

Selecting columns

df1[["userid","age"]]
   userid  age
0       1   18
1       2   37
2       3   21
3       4   44
4       5   39
df1.iloc[1:3,0:2]
   userid     sex
1       2  female
2       3    male

Adding columns

temp = copy.deepcopy(df1)
temp["new"] = temp["age"]*2
temp
   userid     sex  age       createtime_str  new
0       1    male   18  2019-01-01 11:45:50   36
1       2  female   37  2019-01-02 11:55:50   74
2       3    male   21  2019-01-21 11:45:50   42
3       4  female   44  2019-02-01 12:45:50   88
4       5    male   39  2019-01-15 10:40:50   78

Adding a constant column

temp = copy.deepcopy(df1)
temp["new"] ="new"
temp
   userid     sex  age       createtime_str  new
0       1    male   18  2019-01-01 11:45:50  new
1       2  female   37  2019-01-02 11:55:50  new
2       3    male   21  2019-01-21 11:45:50  new
3       4  female   44  2019-02-01 12:45:50  new
4       5    male   39  2019-01-15 10:40:50  new

Adding a computed column

temp = copy.deepcopy(df1)
temp["new"] = temp.age.apply(lambda x:"over 20" if x>20 else "still a youngster")
temp
   userid     sex  age       createtime_str                new
0       1    male   18  2019-01-01 11:45:50  still a youngster
1       2  female   37  2019-01-02 11:55:50            over 20
2       3    male   21  2019-01-21 11:45:50            over 20
3       4  female   44  2019-02-01 12:45:50            over 20
4       5    male   39  2019-01-15 10:40:50            over 20
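apply() with a lambda calls Python once per row. For a simple two-way condition, a vectorized alternative is numpy.where. This is a swapped-in technique rather than the method used above, and the labels are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [18, 37, 21]})

# Row-by-row version, in the style used above.
df["via_apply"] = df["age"].apply(lambda x: "over 20" if x > 20 else "20 or under")

# Vectorized version: the condition is evaluated for the whole column at once.
df["via_where"] = np.where(df["age"] > 20, "over 20", "20 or under")

print(df)
```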

Adding a column from a vector or list

temp = copy.deepcopy(df1)
temp["new"] = list(range(5))
temp
   userid     sex  age       createtime_str  new
0       1    male   18  2019-01-01 11:45:50    0
1       2  female   37  2019-01-02 11:55:50    1
2       3    male   21  2019-01-21 11:45:50    2
3       4  female   44  2019-02-01 12:45:50    3
4       5    male   39  2019-01-15 10:40:50    4
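One detail worth knowing when assigning a column: a list is placed by position, whereas a pandas Series is aligned on its index. The sketch below, on a stand-in frame, assigns the same three values both ways and gets different row orders.

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2, 3]})

# A list is assigned positionally: first value to first row, and so on.
df["from_list"] = [10, 20, 30]

# A Series is aligned on index labels, so the values land in reverse here.
df["from_series"] = pd.Series([10, 20, 30], index=[2, 1, 0])

print(df)
```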

Column type conversion

Converting individual columns

df1.dtypes
userid             int64
sex               object
age                int64
createtime_str    object
dtype: object
temp = copy.deepcopy(df1)
temp["age"] = df1.age.astype("double")
temp.dtypes
userid              int64
sex                object
age               float64
createtime_str     object
dtype: object

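Batch-converting multiple columns

Few converters lend themselves to being applied to several columns at once with apply:

  • to_numeric
  • to_datetime

For anything else, loop over the columns.

temp = copy.deepcopy(df1)
# The format must match the strings exactly; they have no fractional seconds, so no .%f
temp[["createtime_str"]] = temp[["createtime_str"]].apply(pd.to_datetime,format='%Y-%m-%d %H:%M:%S')
temp.dtypes
userid                     int64
sex                       object
age                        int64
createtime_str    datetime64[ns]
dtype: object
temp
   userid     sex  age      createtime_str
0       1    male   18 2019-01-01 11:45:50
1       2  female   37 2019-01-02 11:55:50
2       3    male   21 2019-01-21 11:45:50
3       4  female   44 2019-02-01 12:45:50
4       5    male   39 2019-01-15 10:40:50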
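Two other ways to convert several columns in one go, sketched on a stand-in frame: astype() accepts a dict mapping column names to target dtypes, and a plain loop handles converters such as to_datetime. The column list time_cols is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 2],
    "age": [18, 37],
    "createtime_str": ["2019-01-01 11:45:50", "2019-01-02 11:55:50"],
    "buytime_str": ["2019-04-01 11:45:50", "2019-05-02 11:56:50"],
})

# astype with a dict converts several columns in one call.
converted = df.astype({"userid": "float64", "age": "float64"})

# Loop over the columns that hold timestamps as text.
time_cols = ["createtime_str", "buytime_str"]
for col in time_cols:
    converted[col] = pd.to_datetime(converted[col], format="%Y-%m-%d %H:%M:%S")

print(converted.dtypes)
```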

Operations across columns

df1.dtypes
userid             int64
sex               object
age                int64
createtime_str    object
dtype: object
import copy
temp = copy.deepcopy(df1)
temp["userid1"] = [5,4,3,2,1]
temp["greatest"] = temp[["userid","userid1"]].apply(lambda x:max(x),axis=1)
temp["leastest"] = temp[["userid","userid1"]].apply(lambda x:min(x),axis=1)
temp
   userid     sex  age       createtime_str  userid1  greatest  leastest
0       1    male   18  2019-01-01 11:45:50        5         5         1
1       2  female   37  2019-01-02 11:55:50        4         4         2
2       3    male   21  2019-01-21 11:45:50        3         3         3
3       4  female   44  2019-02-01 12:45:50        2         4         2
4       5    male   39  2019-01-15 10:40:50        1         5         1
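The row-wise apply above can also be written with the built-in reductions max(axis=1) and min(axis=1), which avoid calling a Python lambda per row. A sketch with the same two columns; the column name least is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"userid": [1, 2, 3, 4, 5], "userid1": [5, 4, 3, 2, 1]})

# axis=1 reduces across the columns of each row.
df["greatest"] = df[["userid", "userid1"]].max(axis=1)
df["least"] = df[["userid", "userid1"]].min(axis=1)

print(df)
```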

Row operations

Counting rows

df1.shape[0]
5

Filtering rows

Filtering by condition is all that is needed.

df1.query("userid>=2")
   userid     sex  age       createtime_str
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50
3       4  female   44  2019-02-01 12:45:50
4       5    male   39  2019-01-15 10:40:50
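Row filters often combine several conditions. A sketch on a stand-in frame: boolean masks are combined with & (each condition in parentheses), query() uses the word and, and isin() tests membership in a list of values.

```python
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5],
    "sex": ["male", "female", "male", "female", "male"],
    "age": [18, 37, 21, 44, 39],
})

# Parentheses are required around each condition when using &.
males_over_20 = df[(df["age"] > 20) & (df["sex"] == "male")]
same_via_query = df.query("age > 20 and sex == 'male'")

# Keep rows whose userid is one of the listed values.
picked = df[df["userid"].isin([1, 4])]

print(males_over_20)
```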

Removing duplicate rows

# Append a copy of the first row so that temp contains one duplicate
temp = pd.concat([df1,pd.DataFrame({'userid':[1],'sex':["male"],'age':[18],'createtime_str':["2019-01-01 11:45:50"]})])
temp
   userid     sex  age       createtime_str
0       1    male   18  2019-01-01 11:45:50
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50
3       4  female   44  2019-02-01 12:45:50
4       5    male   39  2019-01-15 10:40:50
0       1    male   18  2019-01-01 11:45:50
temp.drop_duplicates()
   userid     sex  age       createtime_str
0       1    male   18  2019-01-01 11:45:50
1       2  female   37  2019-01-02 11:55:50
2       3    male   21  2019-01-21 11:45:50
3       4  female   44  2019-02-01 12:45:50
4       5    male   39  2019-01-15 10:40:50
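drop_duplicates() compares whole rows by default. The sketch below, on made-up data, shows its subset, keep and ignore_index parameters for the case where rows count as duplicates when only some columns match.

```python
import pandas as pd

# userid 1 appears twice with different ages, so the full rows differ.
df = pd.DataFrame({
    "userid": [1, 2, 1],
    "sex": ["male", "female", "male"],
    "age": [18, 37, 19],
})

full_row = df.drop_duplicates()  # nothing removed: no two rows are identical

# Deduplicate on userid only, keep the last occurrence, renumber the index.
by_user = df.drop_duplicates(subset=["userid"], keep="last", ignore_index=True)

print(by_user)
```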

                                    Updated 2020-02-29 in Qixia District, Nanjing
