pandas 读取局域网文件_1-python数据分析-Pandas基础操作

最新推荐文章于 2024-03-29 09:27:17 发布

weixin_39842955

最新推荐文章于 2024-03-29 09:27:17 发布

阅读量227

点赞数

文章标签： pandas 读取局域网文件

本文链接：https://blog.csdn.net/weixin_39842955/article/details/111976482

版权

1-python数据分析-Pandas基础操作

为什么学习pandas

numpy已经可以帮助我们进行数据的处理了，那么学习pandas的目的是什么呢？

numpy能够帮助我们处理的是数值型的数据，当然在数据分析中除了数值型的数据还有好多其他类型的数据(字符串，时间序列)，那么pandas就可以帮我们很好的处理非数值型数据！

什么是pandas？

对非数值型的数据进行存储和运算操作

首先先来认识pandas中的两个常用的类

Series 一维数组

DataFrame 是由Series组成，多组Series组成一个DataFrame, 二维表格

Series

Series是一种类似一维数组的容器对象，由下面两个部分组成：

values：一组数据(ndarray类型)

index：相关的数据索引标签

Series的创建

由列表或numpy数组创建

由字典创建

#将列表作为数据源

Series(data=[1,2,3,4,5])

1 2

2 3

3 4

4 5dtype: int64#将字典作为数据源

dic ={'A': 100,'B': 99,'C': 150,

}

Series(data=dic)

A100B99C150dtype: int64#将numpy数组作为数据源

Series(np.random.randint(0,100,size=(3,4)))#上面这样会报错，Series只能存一维的数据

Series(np.random.randint(0,100,size=(3,)))

097

1 75

2 78dtype: int32

Series的索引类型

隐式索引：默认形式的索引(0，1，2….)

显式索引:自定义的索引,可以通过index参数设置显式索引

#加上index参数就可以将索引设为显式索引

Series(data=np.random.randint(0,100,size=(3,)), index=['A','B','C'])

A3B22C63dtype: int32

Series的索引操作和切片操作

s = Series(data=np.linspace(0,30, num=5),index=['a','b','c','d','e'])

a0.0b7.5c15.0d22.5e30.0dtype: float64#索引取值

s[1] #7.5 # 隐式索引

s['c'] #15.0 #显式索引

s.d#22.5 #显式索引

#切片

s[0:3] #隐式索引

a 0.0b7.5c15.0dtype: float64

s['a':'d'] #显式索引

a 0.0b7.5c15.0d22.5dtype: float64

Series的常用属性

shape 形状

size 元素个数

index 索引

values 值

s.shape #形状

(5,)

s.size#元素个数

5s.index#索引有显式就是显式索引

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

s.values#值

array([ 0. , 7.5, 15. , 22.5, 30. ])

Series的算术运算

法则：索引一致的元素进行算数运算否则补空

#索引一致的元素进行算数运算否则补空

s1= Series(data=[1,2,3,4,5],index=['a','b','c','d','e'])

s2= Series(data=[1,2,3,4,5],index=['a','b','f','d','e'])

s= s1+s2

a2.0b4.0c NaN

d8.0e10.0f NaN

dtype: float64

Series的常用方法

head(), tail()：显示Series的前n个或者后n个元素

unique(), nunique()：去除重复元素, 统计去重后的元素个数

isnull(), notnull()：

add() sub() mul() div() ：加、减、乘、除

s = Series(data=np.linspace(0,30, num=5),index=['a','b','c','d','e'])

a0.0b7.5c15.0d22.5e30.0dtype: float64

s.head(3) #只显示前三个 #不写默认值是5

a 0.0b7.5c15.0dtype: float64

s.tail(2) #只显示后俩个

d 22.5e30.0dtype: float64

s=Series(data=[1,2,2,4,4,4,4,5,6,3,3,4,5,3,6,1,3])

s.unique()#去掉重复元素

array([1, 2, 4, 5, 6, 3], dtype=int64)

s.nunique()#统计去重后的元素个数

6s1= Series(data=[1,2,3,4,5],index=['a','b','c','d','e'])

s2= Series(data=[1,2,3,4,5],index=['a','b','f','d','e'])

s= s1+s2

a2.0b4.0c NaN

d8.0e10.0f NaN

dtype: float64

s.isnull()#检测Series哪些元素为空，为空则返回True，否则返回Fasle

a False

b False

c True

d False

e False

f True

dtype: bool

s.notnull()#检测Series哪些元素非空，非空则返回True，否则返回Fasle

a True

b True

c False

d True

e True

f False

dtype: bool#想要将s中的非空的数据取出

s[[True,True,False,True,True,False]] #布尔值可以作为索引去取值

a 2.0b4.0d8.0e10.0dtype: float64#直接将s.notnull()得到的布尔值作为索引取值，可以对Series中空值进行清洗

s[s.notnull()]

a2.0b4.0d8.0e10.0dtype: float64

s1= Series(data=[5,3],index=['a', 'b'])

s2= Series(data=[4,2],index=['a', 'b'])

s1.add(s2)#相加

a 9b5dtype: int64

s1.sub(s2)#相减

a 1b1dtype: int64

s1.mul(s2)#相乘

a 20b6dtype: int64

s1.div(s2)#相除

a 1.25b1.50dtype: float64

DataFrame

DataFrame是一个【表格型】的数据结构。DataFrame由按一定顺序排列的多列数据组成。设计初衷是将Series的使用场景从一维拓展到多维。DataFrame既有行索引，也有列索引。

行索引：index

列索引：columns

值：values

DataFrame的创建

ndarray创建

字典创建

ndarray创建(numpy数组)

DataFrame(data=np.random.randint(0,100, size=(5,6)))

df = DataFrame(data=np.random.randint(0,100, size=(5,6)),columns=['a','b','c','d','e','f'],index=['A','B','C','D','E'])

字典创建

dic ={'name':['jay', 'alex', 'egon'],'salary': [10000, 20000, 30000]

}

DataFrame(data=dic)

DataFrame(data=dic, index=['a','b','c'])

DataFrame的属性

values、columns、index、shape

df.values #返回放它所有元素的一个numpy数组

array([[97, 2, 82, 57, 96, 87],

[75, 56, 48, 56, 4, 55],

[25, 95, 99, 80, 97, 19],

[89, 70, 51, 85, 94, 68],

[32, 73, 42, 51, 50, 88]])

df.columns#返回列索引

Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')

df.index#返回行索引

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

df.shape#返回形状

(5, 6)

DataFrame索引操作

对行进行索引

队列进行索引

对元素进行索引

索引取列

#对列进行索引取值(如果设置了显式列索引，则只能使用显式列索引取值，否则报错)

df['a'] #取一列

A 97B75C25D89E32Name: a, dtype: int32

df[['a','b']] #取多列

索引取行

iloc:

通过隐式索引取行

loc:

通过显示索引取行

df.loc['A'] #loc必须跟显式索引，否者报错 # 取一行

a 97b2c82d57e96f87Name: A, dtype: int32

df.iloc[0]#iloc必须跟显式索引，否者报错 # 取一行

a 97b2c82d57e96f87Name: A, dtype: int32

df.iloc[[1,2,3]] #取多行

df.loc[['A','B','C']]

取元素

#取单个元素

df.iloc[2,3] #先行后列

80df.loc['C','b']95

#取多个元素

df.iloc[[1,2],3] #取索引1，2行第3列的元素

B 56C80Name: d, dtype: int32

DataFrame的切片操作

对行进行切片

对列进行切片

切行

df[0:3]

切列

df.iloc[:,0:3]

df索引和切片操作总结

索引：

df[col]:取列

df.loc[index]:取行

df.iloc[index,col]:取元素

切片：

df[index1:index3]:切行

df.iloc[:,col1:col3]:切列

DataFrame的运算

同Series

练习：

假设ddd是期中考试成绩，ddd2是期末考试成绩，请自由创建ddd2，并将其与ddd相加，求期中期末平均值。

假设张三期中考试数学被发现作弊，要记为0分，如何实现？

李四因为举报张三作弊立功，期中考试所有科目加100分，如何实现？

后来老师发现有一道题出错了，为了安抚学生情绪，给每位学生每个科目都加10分，如何实现？

dic ={'张三': [100,90,90,100],'李四': [0,0,0,0]

}

df=DataFrame(data=dic, index=['语文','数学','英语','理综'])

qizhong=df

qimo= df

1、假设ddd是期中考试成绩，ddd2是期末考试成绩，请自由创建ddd2，并将其与ddd相加，求期中期末平均值。

(qizhong + qimo) / 2

2、假设张三期中考试数学被发现作弊，要记为0分，如何实现？

qizhong.loc['数学','张三'] =0

qizhong

3、李四因为举报张三作弊立功，期中考试所有科目加100分，如何实现？

qizhong['李四'] +=100qizhong

4、后来老师发现有一道题出错了，为了安抚学生情绪，给每位学生每个科目都加10分，如何实现？

qizhong += 10qizhong

时间数据类型的转换

pd.to_datetime(col)

dic ={'name':['jay','egon','bobo'],'hire_date': ['2010-10-11', '2012-12-01','2011-11-12'],'salary': [10000,20000,30000]

}

df= DataFrame(data=dic)

info返回df的基本信息

#info 返回df的基本信息

df.info()RangeIndex:3 entries, 0 to 2Data columns (total3columns):

name3 non-null object #object就是字符串类型

hire_date 3 non-null object

salary3 non-null int64

dtypes: int64(1), object(2)

memory usage:152.0+ bytes

想要将字符串形式的时间数据转换成时间序列类型

df['hire_date'] = pd.to_datetime(df['hire_date'])

df.info()

RangeIndex:3 entries, 0 to 2Data columns (total3columns):

name3 non-null object

hire_date3 non-null datetime64[ns]

salary3 non-null int64

dtypes: datetime64[ns](1), int64(1), object(1)

memory usage:152.0+ bytes

将某一列设置为行索引

df.set_index()

想要将hire_date列作为源数据的行索引

new_df = df.set_index('hire_date')

new_df

new_df.shape(3, 2)

weixin_39842955

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
pandas 读取局域网文件_1-python数据分析-Pandas基础操作

1-python数据分析-Pandas基础操作为什么学习pandasnumpy已经可以帮助我们进行数据的处理了，那么学习pandas的目的是什么呢？numpy能够帮助我们处理的是数值型的数据，当然在数据分析中除了数值型的数据还有好多其他类型的数据(字符串，时间序列)，那么pandas就可以帮我们很好的处理非数值型数据！什么是pandas？对非数值型的数据进行存储和运算操作首先先来认识pandas中...
复制链接

扫一扫