Learning Pandas, Part 1: The Series Type

活下去.

Tags: pandas, python, data analysis

Article link: https://blog.csdn.net/snowzmy/article/details/129531545

Preface

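Pandas is an open-source, third-party Python library built on top of NumPy and integrated with Matplotlib for plotting. NumPy, Matplotlib, and Pandas are often called the "big three" of data analysis in Python. Pandas has become an essential high-level tool for Python data analysis, and its stated aim is to be the most powerful and flexible open-source data analysis tool available in any language.
Pandas has two core data structures: Series (one-dimensional) and DataFrame (two-dimensional).
A Series is a one-dimensional labeled array that can hold any data type, such as strings, integers, floats, or Python objects. Its name and index attributes describe the data values. Because a Series is a one-dimensional structure, its number of dimensions cannot change.
A DataFrame is a two-dimensional tabular structure with both a row index and a column index. The row index is index and the column index is columns.
This article covers Series.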

I. What Is a Series?

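A Series, also called a Series sequence, is one of the most commonly used Pandas data structures. It resembles a one-dimensional array and consists of a set of data values plus a set of labels (the index), with each label corresponding to exactly one value.
When printed, a Series shows the index on the left and the values on the right. If no index is specified, an integer index from 0 to N-1 (where N is the length of the data) is created automatically. The values and index attributes return the underlying array and the index object, respectively.
A Series can hold any data type, such as integers, strings, floats, or Python objects.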
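
As a minimal sketch of the layout described above (the variable name s and the sample values are illustrative assumptions, not taken from the article), the following builds a Series without an index and inspects its values and index attributes:

```python
import pandas as pd

# Illustrative data; no index is passed, so pandas generates 0..N-1.
s = pd.Series([10.5, 20.5, 30.5])

print(s)         # index on the left, values on the right
print(s.values)  # the underlying data as a NumPy array
print(s.index)   # the automatically generated integer index
```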

II. Creating Series Objects

The simplest Series can be created from a single collection of data. The index is optional; when it is omitted, pandas falls back to the automatic integer index from 0 to N-1.

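1. Syntax

The constructor is typically called as follows (schematic example):

import pandas as pd
# data, index, dtype and copy are placeholders for the arguments described below.
# Keywords are used because the constructor's positional order also includes name.
s = pd.Series(data, index=index, dtype=dtype, copy=copy)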

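The parameters are described below:

Parameter	Description
data	The input data; may be a list, a scalar constant, an ndarray, and so on.
index	The labels for the data; must have the same length as data. Labels are usually unique, although pandas does not enforce this. If omitted, the default is the integer range 0 to n-1, equivalent to np.arange(n).
dtype	The data type; if omitted, it is inferred from the data.
copy	Whether to copy data; defaults to False.

A Series can also be created from an array, a dict, a scalar value, or a Python object.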
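
The sketch below exercises the index and dtype parameters on made-up sample data. It also illustrates that pandas accepts a repeated label, in which case a lookup by that label returns every matching entry. The values and labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: integers stored as floats via dtype,
# with a deliberately repeated label 'a' in the index.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'], dtype='float64')

print(s.dtype)            # the dtype requested above
print(s.index.is_unique)  # False, because 'a' appears twice
print(s['a'])             # a two-element Series holding both 'a' entries
```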

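2. Creating an empty object

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
s = pd.Series()
s

FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
s = pd.Series()
Series([], dtype: float64)
This output comes from an older pandas release. From pandas 2.0 onward, an empty Series defaults to dtype object and the warning is no longer issued.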
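
One way to avoid relying on the version-dependent default is to pass dtype explicitly, as the warning suggests. A minimal sketch, assuming float64 is the type you want:

```python
import pandas as pd

# Passing dtype explicitly avoids the version-dependent default dtype.
s = pd.Series(dtype='float64')

print(s.empty)  # True: the Series has no elements
print(s.dtype)  # float64
```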

3. Creating from a scalar value

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
s = pd.Series(25, index=['a', 'b', 'c'])  # the index list determines how many times 25 is repeated
s

a 25
b 25
c 25
dtype: int64

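The list ['a', 'b', 'c'] is needed to get three entries (without it, pd.Series(25) yields a single element labeled 0), but the index= keyword itself may be omitted. The following code produces the same result.

s=pd.Series(25,['a','b','c'])
s

The scalar value is repeated once per index label, so every label maps to the same value.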

4. Creating from a Python list

Automatic index

# automatic index
import pandas as pd
from pandas import Series
s = Series([3, 5, 6, 8, 9, 2])
s

0 3
1 5
2 6
3 8
4 9
5 2
dtype: int64

Custom index

# custom index
s=Series([3,5,6,8,9,2],index=['a','b','c','d','e','f'])
s

a 3
b 5
c 6
d 8
e 9
f 2
dtype: int64

Omitting index= gives the same result.

# index= may be omitted
obj2=Series([3,5,6,8,9,2],['a','b','c','d','e','f'])
obj2

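5. Creating from an ndarray

ndarray is the array type in NumPy. When data is an ndarray, any index passed must have the same length as the array. If no index argument is given, the index defaults to range(n), where n is the length of the array:
[0, 1, 2, ..., len(array) - 1]

Creating a Series with the default index:

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)  # pass the ndarray itself rather than a list literal
print(s)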

0 a
1 b
2 c
3 d
dtype: object

Creating a Series with a custom index:

# custom index labels (an explicit index)
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)

100 a
101 b
102 c
103 d
dtype: object
If the number of index labels does not match the number of values, pandas raises a ValueError, for example:
ValueError: Length of values (4) does not match length of index (5)
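
The mismatch can be reproduced and handled as in the sketch below. The names values, labels, and message are illustrative assumptions.

```python
import pandas as pd

values = ['a', 'b', 'c', 'd']       # four values
labels = [100, 101, 102, 103, 104]  # five labels: one too many

message = None
try:
    pd.Series(values, index=labels)
except ValueError as exc:
    # pandas reports both lengths in the error text
    message = str(exc)
    print("ValueError:", message)
```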

6. Creating from a dict

A dict can be passed directly to create a Series; its keys become the index:

d=pd.Series({'a':9,'b':8,'c':7})
d

a 9
b 8
c 7
dtype: int64

d=pd.Series({'a':9,'b':8,'c':7},index=['c','a','b','d'])  # specify the index
d

c 7.0
a 9.0
b 8.0
d NaN
dtype: float64
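The index determines the structure of the Series, and the matching values are picked from the dict.
When an index label has no corresponding key in the dict, the entry is filled with NaN (Not a Number). Because NaN is a float, the values are converted to float64.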
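
A short sketch of how the resulting gap might be detected and filled. Replacing NaN with 0 is an arbitrary choice made for illustration, and the name filled is an assumption.

```python
import pandas as pd

d = pd.Series({'a': 9, 'b': 8, 'c': 7}, index=['c', 'a', 'b', 'd'])

print(d.isna())  # True only for label 'd', which has no dict key

# Replace the gap with 0, then convert back to integers.
filled = d.fillna(0).astype('int64')
print(filled)
```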

7. Creating from other functions

A Series can also be created from the output of other functions, for example np.arange.

n=pd.Series(np.arange(5))
n

0 0
1 1
2 2
3 3
4 4
dtype: int32

n=pd.Series(np.arange(5),index=np.arange(9,4,-1))
n

9 0
8 1
7 2
6 3
5 4
dtype: int32
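
The int32 shown above depends on the platform and NumPy version (it is typical of older NumPy builds on Windows; other setups usually report int64). A minimal sketch that requests the integer width explicitly, assuming 64-bit integers are wanted:

```python
import numpy as np
import pandas as pd

# Request 64-bit integers explicitly instead of the platform default.
n = pd.Series(np.arange(5, dtype=np.int64), index=np.arange(9, 4, -1))

print(n.dtype)  # int64 on every platform
print(n[9])     # with an integer index, n[9] is a label lookup
```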

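III. Basic Operations on Series

Once a Series has been created, basic operations can be applied to it. Working with a Series resembles working with a Python list, a NumPy array, and a Python dict.

1. Getting all index labels and values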

import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
b

a 9
b 8
c 7
d 6
dtype: int64

# get all index labels
b.index

Index(['a', 'b', 'c', 'd'], dtype='object')
The index is an Index object.

# get all values
b.values

array([9, 8, 7, 6], dtype=int64)
The values are returned as a NumPy array.

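2. Getting a single value or a group of values

Values can be accessed by automatic index (position) or by custom index (label).

(1) Accessing a single value

By custom index (label):

b['a']

9
By automatic index (position), which pandas generates automatically. Recent pandas versions deprecate plain b[0] on a label-indexed Series, so iloc is used here:

b.iloc[0]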

(2) Accessing multiple values

Several elements can be accessed with a list of custom index labels:

b[['b','d','a']]

b 8
d 6
a 9
dtype: int64

b[['b','d',0]]

KeyError: '[0] not in index'
The automatic (positional) index and the custom (label) index coexist, but the two cannot be mixed in a single selection.
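
To keep the two kinds of index apart, pandas provides the loc accessor for labels and the iloc accessor for positions. A minimal sketch using the same data as above (the names by_label and by_position are illustrative):

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

by_label = b.loc[['b', 'd']]   # label-based selection
by_position = b.iloc[[1, 3]]   # position-based selection of the same rows

print(by_label.equals(by_position))  # both select the same two entries
print(b.loc['a'], b.iloc[0])         # the first element, reached both ways
```

Using loc and iloc makes the intent explicit, which avoids the ambiguity of plain square brackets.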

Accessing data in a Series by slicing:

b[:3]

a 9
b 8
c 7
dtype: int64
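
Slicing also works through loc and iloc, with one difference worth noting: a positional slice excludes its end, while a label slice includes it. A short sketch on the same data (variable names are illustrative):

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

positional = b.iloc[:3]    # positions 0, 1, 2 (end excluded)
labelled = b.loc['a':'c']  # labels 'a' through 'c' (end included)

print(positional.equals(labelled))  # both slices hold a, b, c
```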

3. NumPy-style array operations on a Series

b[b>7]

a 9
b 8
dtype: int64
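
Beyond boolean filtering, element-wise arithmetic and NumPy universal functions can be applied to a Series directly, and the index is preserved. A minimal sketch (the names doubled, roots, and large are illustrative):

```python
import numpy as np
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

doubled = b * 2     # element-wise arithmetic
roots = np.sqrt(b)  # NumPy ufuncs accept a Series directly
large = b[b > 7]    # boolean mask keeps entries where the condition holds

print(doubled)
print(roots)
print(large)
```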

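4. Arithmetic operations on a Series

During arithmetic, pandas automatically aligns data by index label; labels present in only one operand produce NaN.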

dic={'m':4,'n':5,'p':6}
a=pd.Series(dic)
ind=['m','n','a','b']
b=pd.Series([9,8,7,6],index=ind)
a+b

a NaN
b NaN
m 13.0
n 13.0
p NaN
dtype: float64
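
If NaN is not the desired result for labels that appear in only one operand, the add method accepts a fill_value that stands in for the missing side. A sketch on the same data, assuming 0 is a sensible substitute:

```python
import pandas as pd

a = pd.Series({'m': 4, 'n': 5, 'p': 6})
b = pd.Series([9, 8, 7, 6], index=['m', 'n', 'a', 'b'])

# Labels missing from one operand are treated as 0 instead of producing NaN.
total = a.add(b, fill_value=0)
print(total)
```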

5. Modifying a Series

A Series can be modified at any time, and the change takes effect immediately.
The index of a Series can be changed by assignment.

b.index=['u','v','w','a']
b

u 9
v 8
w 7
a 6
dtype: int64

Both the Series itself and its index can have a name, stored in the .name attribute.

b.name = 'Series object b'
b.index.name = 'index column'
b

index column
u 9
v 8
w 7
a 6
Name: Series object b, dtype: int64
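
Values can be modified by assignment as well, not only the index and names. A minimal sketch with illustrative data and labels:

```python
import pandas as pd

s = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

s['a'] = 100           # overwrite one value by label
s.loc[['b', 'c']] = 0  # overwrite several values at once
s['e'] = 5             # assigning to a new label appends an entry

print(s)
```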

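IV. Common Attributes and Methods

1. Common attributes

Name	Description
axes	Returns a list containing the row index labels.
dtype	Returns the data type of the object.
empty	Returns True if the Series is empty, otherwise False.
ndim	Returns the number of dimensions of the data (always 1 for a Series).
size	Returns the number of elements in the data.
values	Returns the Series data as an ndarray.
index	Returns the index object (a RangeIndex when the default index is used).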

import pandas as pd
b=pd.Series([9,8,7,6],['a','b','c','d'])
print("The axes are:",b.axes)
print("The dtype is:",b.dtype)
print("Is the object empty?",b.empty)
print("b.ndim:",b.ndim)
print("Length of the series:",b.size)
print("Data in the series:",b.values)
print("b.index:",b.index)

The axes are: [Index(['a', 'b', 'c', 'd'], dtype='object')]
The dtype is: int64
Is the object empty? False
b.ndim: 1
Length of the series: 4
Data in the series: [9 8 7 6]
b.index: Index(['a', 'b', 'c', 'd'], dtype='object')

b=pd.Series([9,8,7,6])
print("The axes are:",b.axes)
print("b.index:",b.index)

The axes are: [RangeIndex(start=0, stop=4, step=1)]
b.index: RangeIndex(start=0, stop=4, step=1)

2. Common methods

1. Viewing data with head() and tail()

To view part of a Series, use head() or tail(). head() returns the first n rows (5 by default), and tail() returns the last n rows (also 5 by default).

2. Detecting missing values with isnull() and notnull()

isnull() and notnull() detect missing values in a Series, that is, values that are absent or undefined.
isnull(): returns True where a value is missing.
notnull(): returns False where a value is missing.

import pandas as pd
s = pd.Series(range(10))
print("The original series is:", s)  # 0-9
print("s.head():", s.head())  # 0-4
# return the first three rows
print("s.head(3)", s.head(3))  # 0-2
print("s.tail():", s.tail())  # 5-9
# return the last three rows
print("s.tail(3)", s.tail(3))  # 7-9
print(pd.isnull(s))   # all False: no value is missing
print(pd.notnull(s))  # all True
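
The example above contains no missing values, so every isnull() result is False. The sketch below uses made-up data that does contain gaps (both np.nan and None count as missing) to show the two methods disagreeing entry by entry:

```python
import numpy as np
import pandas as pd

# Illustrative data with two missing entries.
s = pd.Series([1.0, np.nan, 3.0, None])

print(s.isnull())         # True where the value is missing
print(s.notnull())        # the element-wise opposite of isnull()
print(s.notnull().sum())  # count of non-missing values
```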