Python: Data Analysis and Processing (the pandas Library)

Latest recommended article published 2024-06-26 22:47:56

番茄大人


Category: Python | Tags: python

Article link: https://blog.csdn.net/wd9ljs18/article/details/105507634


pandas official website

I. The Series Type

A Series is a one-dimensional array with "labels" (an index).

import pandas as pd
a = pd.Series([9,8,7,6])
print(a)
'''
0    9
1    8
2    7
3    6
dtype: int64
'''
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
print(b)
'''
a    9
b    8
c    7
d    6
dtype: int64
'''

1.1 Creating a Series

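From a Python list: the index must have the same number of elements as the list.
From a scalar value: the index determines the size of the Series (the scalar is repeated for every label).
From a Python dict: the keys of the key-value pairs become the index; if index is given, it selects (and orders) entries from the dict.
From an ndarray: both the data and the index can be created from ndarrays.
From other functions, such as range().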
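The list and range() sources are not demonstrated in the subsections that follow. A minimal sketch with arbitrary values, including what happens when the index length does not match the data:

```python
import pandas as pd

# From a Python list: the index (if given) needs exactly one label per element
s_list = pd.Series([9, 8, 7], index=['x', 'y', 'z'])
print(s_list)

# From range(): treated like any other sequence of integers
s_range = pd.Series(range(3))
print(s_range)

# A length mismatch between data and index raises ValueError
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
except ValueError as err:
    print("ValueError:", err)
```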

1.1.1 From a scalar value

import pandas as pd
s = pd.Series(25,index =['a','b','c'] )
print(s)
'''
a    25
b    25
c    25
dtype: int64
'''

1.1.2 From a dictionary

import pandas as pd
s = pd.Series({'a':25,'b':23,'c':1 })
print(s)
'''
a    25
b    23
c     1
dtype: int64
'''
s = pd.Series({'a':25,'b':23,'c':1 },index = ['a','d','b','c'])  # note the index order; 'd' is not a key, so it gets NaN
print(s)
'''
a    25.0
d     NaN
b    23.0
c     1.0
dtype: float64
'''

1.1.3 From an ndarray

import pandas as pd
import numpy as np
n = pd.Series(np.arange(5))
print(n)
'''
0    0
1    1
2    2
3    3
4    4
dtype: int32
'''
n = pd.Series(np.arange(5),index = np.arange(9,4,-1))
print(n)
'''
9    0
8    1
7    2
6    3
5    4
dtype: int32
'''
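The int32 in the output above depends on the platform: NumPy's default integer was 32-bit on Windows before NumPy 2.0 and is 64-bit elsewhere. A minimal sketch that pins the dtype so the result is the same everywhere:

```python
import numpy as np
import pandas as pd

# Request 64-bit integers explicitly so the dtype does not depend on the OS
n = pd.Series(np.arange(5, dtype=np.int64), index=np.arange(9, 4, -1))
print(n)
print(n.dtype)
```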

1.2 Basic operations on a Series

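A Series consists of two parts, index and values: .index returns the index (an Index object) and .values returns the data (a NumPy array).
A Series behaves much like an ndarray.
Operations on a Series are similar to those on a Python dict.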

1.2.1 A Series consists of index and values

import pandas as pd
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
print(b.index)#Index(['a', 'b', 'c', 'd'], dtype='object')
print(b.values)#[9 8 7 6]
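print(b['b'])      # custom (label) index: 8
print(b.iloc[1])   # automatic (positional) index: 8; plain b[1] is deprecated for positions in pandas 2.x
# The two index systems coexist but cannot be mixed in one selection:
# b[['c','d',0]] raises KeyError in current pandas
# (very old versions returned NaN for the unmatched label 0 instead).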
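pandas makes the choice between the two index systems explicit with .loc (labels) and .iloc (positions). A short sketch on the same Series b, which also shows that a label slice includes its end point while a positional slice does not:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

print(b.loc['b'])         # by label -> 8
print(b.iloc[1])          # by position -> 8
print(b.loc[['c', 'd']])  # list of labels
print(b.iloc[[2, 3]])     # list of positions: the same two elements
print(b.loc['b':'d'])     # label slice: the end label 'd' is included
print(b.iloc[1:3])        # positional slice: position 3 is excluded
```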

1.2.2 A Series behaves like an ndarray

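Indexing works the same way, using [].
NumPy operations and functions can be applied to a Series.
A Series can be indexed with a list of custom labels.
A Series can be sliced by automatic (positional) index; if a custom index exists, it is sliced along with the data.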

import pandas as pd
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
print(b[:3])  # first three elements (positions 0, 1, 2); their labels are sliced too
'''
a    9
b    8
c    7
'''
print(b[b>b.median()])  # elements greater than the median (7.5)
'''
a    9
b    8
dtype: int64
'''
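The code above covers slicing and boolean selection. A minimal sketch of the other two bullet points, with the same b: NumPy functions and vectorized arithmetic return a Series with the same index, and a list of labels selects elements in the order given:

```python
import numpy as np
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

doubled = b * 2          # element-wise arithmetic keeps the labels
roots = np.sqrt(b)       # NumPy ufunc: result is a Series with the same index
picked = b[['c', 'a']]   # select by a list of custom labels, in that order

print(doubled)
print(roots.round(3))
print(picked)
```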

1.2.3 Series operations resemble those of a Python dict

Access by custom index (label).
The in keyword (checks whether a key is among the index labels).
The .get() method.

import pandas as pd
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
print('c' in b) #True
print(0 in b)#False
print(b.get('f')) #None
print(b.get('f',100))  # 100: 'f' is not in the index, so the default given as the second argument is returned
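Like `in` on a dict, which tests keys, `in` on a Series tests the index labels and does not search the values. A quick sketch of how to check the values instead:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

print(9 in b)          # False: 9 is not an index label
print(9 in b.values)   # True: searches the underlying NumPy array
print(b.isin([9, 6]))  # element-wise membership test, returns a boolean Series
```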

1.2.4 Automatic alignment of Series

Series + Series

import pandas as pd
a = pd.Series([1,2,3],['c','d','e'])
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
print(a+b)  # aligned on the union of both indexes; a label present in only one operand gives NaN
'''
a    NaN
b    NaN
c    8.0
d    8.0
e    NaN
dtype: float64
'''
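If the NaN results for non-overlapping labels are not wanted, the method form of the operator, Series.add, accepts a fill_value that stands in for a label missing on one side. A sketch with the same a and b:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['c', 'd', 'e'])
b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# A label missing from one operand is treated as 0 before adding
total = a.add(b, fill_value=0)
print(total)
```

A label missing from both operands would still give NaN; fill_value only covers one-sided gaps.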

1.2.5 The name attribute of a Series

Both a Series object and its index can have a name, stored in their .name attribute.

import pandas as pd
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
b.name = 'Series object'
b.index.name = 'index column'
print(b)
'''
index column
a    9
b    8
c    7
d    6
Name: Series object, dtype: int64
'''

1.2.6 Modifying a Series

A Series object can be modified at any time, and changes take effect immediately.

import pandas as pd
b = pd.Series([9,8,7,6],index =['a','b','c','d'] )
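b.name = 'Series object'
b.name = "new series"   # renaming takes effect immediately
b[['b','c']] = 20       # assign to several labels at once (a list of labels, not a tuple)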
print(b)
'''
a     9
b    20
c    20
d     6
Name: new series, dtype: int64
'''
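Assignment is not limited to existing labels: assigning to a label that does not exist yet appends a new element (pandas calls this setting with enlargement). A minimal sketch, also assigning one value per label from a list:

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
b['e'] = 5               # new label: the Series grows by one element
b[['a', 'b']] = [1, 2]   # several existing labels, one value each
print(b)
```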

1.2.7 Cleaning data in a Series

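Finding missing values. Note: to test a single float value for NaN, use math.isnan(value) or np.isnan(value) (for example `if np.isnan(value):`); comparing with `== np.nan` never works, because NaN is not equal to itself. For a whole Series, use .isnull() (or its alias .isna()).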

import pandas as pd
b = pd.Series([9,None,7,6],index =['a','b','c','d'] )
print(b)
'''
a    9.0
b    NaN
c    7.0
d    6.0
dtype: float64
'''
notNullIndex = b[b.notnull()].index  # keep the non-null entries, then take their labels
print(notNullIndex)  # labels of the non-null entries
'''
Index(['a', 'c', 'd'], dtype='object')
'''
firstNotNull = b.index.get_loc(notNullIndex[0])
print(firstNotNull)  # position (row number) of the first non-null value
'''
0
'''
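pandas also has built-in shortcuts for these tasks. A sketch on the same b (NaN at label 'b'): first_valid_index() returns the label of the first non-null entry directly, and dropna()/fillna() remove or replace missing values:

```python
import pandas as pd

b = pd.Series([9, None, 7, 6], index=['a', 'b', 'c', 'd'])

print(b.notna().sum())        # number of non-null values
print(b.first_valid_index())  # label of the first non-null value
print(b.dropna())             # drop the NaN entry
print(b.fillna(0))            # replace NaN with 0
```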

II. The DataFrame Type

A DataFrame is a two-dimensional array with "labels" (an index plus several columns of data).

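A DataFrame is a tabular data type; each column can have a different value type.
A DataFrame has both a row index (index) and a column index (columns).
A DataFrame is mostly used for two-dimensional data, but it can also represent higher-dimensional data (for example with a hierarchical MultiIndex).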
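A small sketch (column names and values are made up) showing that each column keeps its own dtype and that the two indexes are available as .index and .columns:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['x', 'y', 'z'], 'score': [90, 85, 77], 'ratio': [0.5, 0.25, 0.125]},
    index=['r1', 'r2', 'r3'],
)
print(df.index)    # row index
print(df.columns)  # column index
print(df.dtypes)   # one dtype per column, e.g. int64 for score, float64 for ratio
```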

2. Creating a DataFrame

From a two-dimensional ndarray
From a dict of one-dimensional ndarrays, lists, dicts, tuples, or Series
From a Series
From another DataFrame
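The subsections 2.1 to 2.3 do not cover every source in this list. A sketch of three more, with arbitrary data: a named Series becomes a single-column DataFrame, a dict of dicts uses the inner keys as row labels, and an existing DataFrame can be passed in with columns= to keep only some of its columns:

```python
import pandas as pd

# From a Series: the Series name becomes the column label
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='val')
df_from_series = pd.DataFrame(s)

# From a dict of dicts: outer keys -> columns, inner keys -> row index
df_from_dicts = pd.DataFrame({'one': {'a': 1, 'b': 2}, 'two': {'a': 3, 'b': 4}})

# From another DataFrame, keeping only the listed columns
df_subset = pd.DataFrame(df_from_dicts, columns=['two'])

print(df_from_series)
print(df_from_dicts)
print(df_subset)
```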

2.1 From a two-dimensional ndarray

The original data plus automatically generated row and column indexes.

import pandas as pd
import numpy as np
d = pd.DataFrame(np.arange(10).reshape(2,5))
print(d)
'''
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
'''
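The automatically generated labels can be replaced by passing index= and columns= along with the array. A sketch with hypothetical row and column names:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(
    np.arange(10).reshape(2, 5),
    index=['r0', 'r1'],                      # hypothetical row labels
    columns=['c0', 'c1', 'c2', 'c3', 'c4'],  # hypothetical column labels
)
print(d)
print(d.shape)  # (rows, columns)
```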

2.2 From a dict of one-dimensional objects (here, Series)

import pandas as pd
import numpy as np
s = {'one':pd.Series([1,2,3],index = ['a','d','b']),
     'two':pd.Series([9,8,7,5],index = ['a','d','b','c'])}
print(type(s))#<class 'dict'>
d = pd.DataFrame(s)
print(d)
'''
   one  two
a  1.0    9
b  3.0    7
c  NaN    5
d  2.0    8
'''
e = pd.DataFrame(s,index = ['b','c','d'],columns=['two','three'])  # select rows and columns; 'three' is not in s, so it is all NaN
print(e)
'''
   two three
b    7   NaN
c    5   NaN
d    8   NaN
'''

2.3 From a dict of lists

import pandas as pd
import numpy as np
d1 = {'one':[1,2,3,4],'two':[9,8,7,6]}
d = pd.DataFrame(d1, index=['a','b','c','d'])
print(d)
'''
   one  two
a    1    9
b    2    8
c    3    7
d    4    6
'''
print(d['two'])  # select a column by its column label
'''
a    9
b    8
c    7
d    6
'''
print(d.loc['b'])  # select a row by its row label (.ix was removed in pandas 1.0)
'''
one    2
two    8
'''
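In current pandas, rows and single cells are selected with .loc (labels) or .iloc (positions). A sketch on the same DataFrame d:

```python
import pandas as pd

d = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}, index=['a', 'b', 'c', 'd'])

print(d.loc['b'])                # row by label
print(d.iloc[1])                 # the same row by position
print(d.loc['b', 'two'])         # single cell by row and column label
print(d.loc[['a', 'c'], 'one'])  # several rows of one column
```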

2.4 Visualizing a DataFrame

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
# Set a CJK-capable font before plotting so Chinese titles/labels render
# (requires the SimHei font to be installed)
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False  # display minus signs correctly with SimHei
d1 = {'one':[1,2,3,4],'two':[9,8,7,6]}
d = pd.DataFrame(d1, index=['a','b','c','d'])
'''
   one  two
a    1    9
b    2    8
c    3    7
d    4    6
'''
# Line chart: DataFrame.plot draws one line per column in a new figure and returns the Axes
colors = ["red","blue"]
ax = d.plot(kind='line', color=colors)      # the legend uses the column names
ax.tick_params(axis='x', labelrotation=60)  # tilt the x tick labels (the row labels)
ax.set_title("Line chart")
plt.show()
# Scatter plot of column 'one' against the row labels
plt.figure()
plt.scatter(d.index, d["one"].values, label="data", color="green")
plt.title("Scatter plot")
plt.tick_params(labelsize=6)
plt.legend()
plt.show()
