Using Pandas

weixin_45589945

Article link: https://blog.csdn.net/weixin_45589945/article/details/109833850

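pandas is a data analysis tool that handles strings and time series in addition to numeric data.
Series: a one-dimensional labeled array.

DataFrame: a two-dimensional container of Series.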

Creating a Series

import pandas as pd
pd.Series([ ])

import pandas as pd
t=pd.Series([1,2,31,12,3,4],index=list("abcdef"))
print(t)
print(type(t))
# Create a Series from a dict
dic={"name":"zxh","age":"20","tel":"188"}
t2=pd.Series(dic)
print("hh:",t2.values)
print("kk",t2.index)
print(t2)
print(type(t2.index))
print(list(t2.index))
len(t2.index)

a     1
b     2
c    31
d    12
e     3
f     4
dtype: int64
<class 'pandas.core.series.Series'>
hh: ['zxh' '20' '188']
kk Index(['name', 'age', 'tel'], dtype='object')
name    zxh
age      20
tel     188
dtype: object
<class 'pandas.core.indexes.base.Index'>
['name', 'age', 'tel']

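Once a Series exists, its labels can be used to look values up. The following is a minimal sketch reusing the dict from the example above; the variable names `record`, `by_label`, `by_position`, and `subset` are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Same data as the dict example above; the keys become index labels.
record = {"name": "zxh", "age": "20", "tel": "188"}
s = pd.Series(record)

by_label = s["age"]           # look up a value by its index label
by_position = s.iloc[0]       # look up a value by integer position
subset = s[["name", "tel"]]   # a list of labels returns a new Series

print(by_label, by_position)
print(subset)
```

Using `.iloc` for positions keeps label lookups and positional lookups clearly separated, which matters once the index itself contains integers.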

Reading external data

Reading data from a CSV file

df=pd.read_csv("./ ")
print(df)
pd.read_sql( )
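Since the file path above is left blank, here is a small self-contained sketch of `pd.read_csv`. It assumes made-up CSV text held in memory (the `csv_text` string is hypothetical) instead of a real file, because `read_csv` accepts file-like objects as well as paths.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,age,tel\nzxh,20,188\nzxd,15,199\n"

# read_csv accepts a path or any file-like object; the first line is the header.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # numeric-looking columns are parsed as numbers
```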

DataFrame

Creation

A DataFrame is a container of Series, with both a row index and a column index.

import numpy as np
import pandas as pd
m=pd.DataFrame(np.arange(12).reshape((3,4)),index=list("abs"),columns=list("XYZH"))
print(m)

Output

   X  Y   Z   H
a  0  1   2   3
b  4  5   6   7
s  8  9  10  11
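To make the "container of Series" idea concrete, the sketch below rebuilds the same frame and pulls out one column and one row. The names `col` and `row` are illustrative assumptions; both selections come back as Series objects.

```python
import numpy as np
import pandas as pd

m = pd.DataFrame(np.arange(12).reshape((3, 4)),
                 index=list("abs"), columns=list("XYZH"))

col = m["Z"]      # one column: a Series indexed by the row labels
row = m.loc["b"]  # one row: a Series indexed by the column labels

print(type(col))
print(col.index.tolist())
print(row.index.tolist())
```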

A DataFrame can also be created from a dict

dic2={"name":["zxh","zxd"],"age":["20","15"],"tel":["188","199"]}
r=pd.DataFrame(dic2)
print(r)

  name age  tel
0  zxh  20  188
1  zxd  15  199

Descriptive information

df.index
df.columns
df.head(1)
df.tail(2)
df.info()      # overview of the DataFrame
df.describe()  # summary statistics of the data distribution
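The inspection calls listed above can be tried on a small frame. This is only a sketch: the data is made up, and `age` is stored as integers here (unlike the string version earlier) so that `describe()` has a numeric column to summarize.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["zxh", "zxd", "abc"], "age": [20, 15, 31]})

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.head(1))   # first row
print(df.tail(2))   # last two rows
df.info()           # prints dtypes, non-null counts, memory usage

stats = df.describe()  # by default, summarizes numeric columns only
print(stats)
```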

Sorting in a DataFrame

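df=df.sort_values(by=" ",ascending=False)
print(df[:20])  # a numeric slice inside the square brackets selects rows
print(df["label"])  # a string inside the square brackets selects a column
# Another approach: loc (label-based selection)
t3.loc[["a","c"],"A":"C"]
t3.iloc[[0,2],[2,4]]  # for integer positions, use iloc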
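The sorting and selection calls above can be combined in one runnable sketch. The frame `t3` and its labels are assumptions chosen for illustration; note that a label slice in `loc` includes both endpoints, while `iloc` takes integer positions.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame with string labels on both axes.
t3 = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=list("abc"), columns=list("ABCD"))

sorted_df = t3.sort_values(by="B", ascending=False)  # largest B first
first_two = sorted_df[:2]                            # numeric slice: rows

by_label = t3.loc[["a", "c"], "A":"C"]   # label slice includes "C"
by_pos = t3.iloc[[0, 2], [1, 3]]         # rows 0 and 2, columns 1 and 3

print(first_two)
print(by_label)
print(by_pos)
```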

Boolean indexing and handling missing data

print(df["info"].str.split("/").tolist())
# Missing values are represented as NaN
pd.isnull(t3)
pd.notnull(t3)
pd.notnull(t3["W"])
t3.dropna(axis=0,how="any",inplace=True)
# inplace=True modifies the object in place
t2.fillna(t2.mean())  # fill missing data with the mean
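A small sketch of the missing-data calls above, assuming a made-up frame `t3` with NaN values in columns `W` and `X`. It uses the non-inplace forms so the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each column.
t3 = pd.DataFrame({"W": [1.0, np.nan, 3.0], "X": [4.0, 5.0, np.nan]})

mask = pd.notnull(t3["W"])   # boolean Series: True where W is present
kept = t3[mask]              # boolean indexing keeps rows where mask is True

dropped = t3.dropna(axis=0, how="any")  # drop rows containing any NaN
filled = t3.fillna(t3.mean())           # fill each column with its own mean

print(kept)
print(dropped)
print(filled)
```

Passing `inplace=True` would modify `t3` directly and return `None`; assigning the result to a new name, as here, is often easier to follow.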