(Three long posts to master Pandas) Introduction to Data Analysis_PART2 Common Toolkits_CH02 Data Analysis Tool: Pandas__Part01 (Comprehensive Series and DataFrame Operations)

Latest recommended article published 2024-11-13 17:26:18

Rishane


Article tags: Pandas Series DataFrame python data analysis

Article link: https://blog.csdn.net/weixin_40974922/article/details/93753300


'''
[Lesson 2.2]  Pandas data structure Series: basic concepts and creation

The "one-dimensional array" Series

'''


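# The Series data structure
# A Series is a labeled one-dimensional array that can hold any data type (integers, strings, floats, Python objects, etc.); the axis labels are collectively called the index
# A Series = (1) labels + (2) a data array (you can think of it as a dict with an ordered index)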
import numpy as np
import pandas as pd

s=pd.Series(np.random.rand(5))
print(s)
print('------')
print(type(s))

0    0.830450
1    0.674102
2    0.528299
3    0.150878
4    0.952043
dtype: float64
------
<class 'pandas.core.series.Series'>

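# Creating a Series, method 1: from a dict; the dict keys become the index and the dict values become the values
# When the keys mix str and int, the index gets dtype object; the integer keys stay integers (they are not converted to str)
# The values below are dtype object because 'hello' is a str
dic = {'a': 1, 'b': 'hello', 'c': 3, 4: 4, 5: 5}
s = pd.Series(dic)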
print(s)

a        1
b    hello
c        3
4        4
5        5
dtype: object
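
To check what happens to mixed key types, here is a minimal sketch that inspects the index of a Series built from the same kind of dict. The name `key_types` is only illustrative and not part of the original lesson.

```python
import pandas as pd

# Same kind of dict as above: string keys and integer keys mixed together
dic = {'a': 1, 'b': 'hello', 'c': 3, 4: 4, 5: 5}
s = pd.Series(dic)

# The index holds the keys unchanged; mixed types give an object-dtype Index
key_types = [type(k).__name__ for k in s.index]
print(s.index.dtype)
print(key_types)

# The integer key is looked up as the integer 4, not as the string '4'
print(s.loc[4])
```

Using `.loc` here makes it explicit that the lookup is by label rather than by position.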

# Creating a Series, method 2: from a (one-dimensional) array
# The default index is integers starting at 0 with a step of 1
arr=np.random.randn(5)
s=pd.Series(arr)
print(arr)
print(s)

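# index parameter: sets the index; its length must match the length of the data
# dtype parameter: sets the data type of the values
# (the signature below is the one from the pandas version used when this was written)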
'''
pd.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False,
)
'''
# np.str was only an alias for the builtin str and was removed in NumPy 1.24, so use str directly
s = pd.Series(arr, index=list('abcde'), dtype=str)
print(s)

[ 0.26383154  0.97382125  0.13994526 -0.60732141  1.32883897]
0    0.263832
1    0.973821
2    0.139945
3   -0.607321
4    1.328839
dtype: float64
a    0.26383153983884505
b     0.9738212455558085
c     0.1399452564766354
d    -0.6073214101102407
e     1.3288389721491793
dtype: object
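
The following sketch illustrates the two parameters discussed above, using a small hand-written array instead of random data. The names `mismatch_error` and `back` are made up for this illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([0.5, 1.5, 2.5])

# index must have the same length as the data; dtype=str stores the values as strings
s = pd.Series(arr, index=list('abc'), dtype=str)
print(s.tolist())

# A length mismatch between data and index raises ValueError
try:
    pd.Series(arr, index=list('ab'))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)

# astype converts an existing Series to another dtype
back = s.astype(float)
print(back.dtype)
```

If the data only needs converting after creation, `astype` is usually the simpler route than passing `dtype` to the constructor.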

# Creating a Series, method 3: from a scalar (every element of the Series is the same value)
s=pd.Series(10,index=range(4))
print(s)

0    10
1    10
2    10
3    10
dtype: int64
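
As a small illustration of scalar creation, the sketch below shows that the length of the result comes from the index, not from the scalar. The variable names are illustrative only.

```python
import pandas as pd

# The scalar is repeated once for every label in the index
s = pd.Series(10, index=list('xyz'))
print(s)

# Without an index, a scalar produces a Series of length 1
single = pd.Series(10)
print(len(single))

# dtype can still be set explicitly
s_float = pd.Series(10, index=range(4), dtype=float)
print(s_float.dtype)
```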

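# The Series name attribute: name (optional; a given Series object has exactly one name)
# name is a parameter of pd.Series that gives the array a name
# .name is an attribute (not a method) that returns the name as a str; if no name was defined, it is None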
s1=pd.Series(np.random.rand(5))
print(s1)
print('-----')
s2=pd.Series(np.random.rand(5),name="test")
print(s2)
print(s1.name,s2.name,type(s2.name))

# .rename() gives the array a new name and returns a new Series object; the original Series is unchanged
s3=s2.rename('xjxj')
print(s3)
print(s3 is s2)
print(s3.name,s2.name)

0    0.603552
1    0.007823
2    0.581088
3    0.262479
4    0.366710
dtype: float64
-----
0    0.638098
1    0.012841
2    0.659852
3    0.009916
4    0.444856
Name: test, dtype: float64
None test <class 'str'>
0    0.638098
1    0.012841
2    0.659852
3    0.009916
4    0.444856
Name: xjxj, dtype: float64
False
xjxj test
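
One place where the name matters in practice is conversion to a DataFrame. The sketch below assumes a small hand-written Series; the names `renamed` and `df` are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name='test')
renamed = s.rename('score')

# rename returns a new Series; the original keeps its name
print(s.name, renamed.name)

# The name becomes the column label when the Series is turned into a DataFrame
df = renamed.to_frame()
print(list(df.columns))
```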


# Homework answer

#1 Create from a dict
dic = {'Jack': 90.0, 'Marry': 92, 'Tom': 89, 'Zack': 65}
s1 = pd.Series(dic, name="Homework 1")
print(s1)

#2 Create from an array
ar = np.array((90.0, 92, 89, 65))
s2 = pd.Series(ar, index=('Jack', 'Marry', 'Tom', 'Zack'), name="Homework 1")
print(s2)

Jack     90.0
Marry    92.0
Tom      89.0
Zack     65.0
Name: Homework 1, dtype: float64
Jack     90.0
Marry    92.0
Tom      89.0
Zack     65.0
Name: Homework 1, dtype: float64

'''
[Lesson 2.3]  Pandas data structure Series: indexing

Positional subscript / label index / slice index / boolean index

'''


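# Positional subscript, similar to a sequence (unlike a sequence, a negative subscript such as s[-1] does not work here, because with the default integer index the number is treated as a label)
# Positions start at 0
# The result is a numpy.float64
# It can be converted to a Python float with float()
# numpy.float64 and float both hold a 64-bit double, but they are different types and differ in object overhead

s = pd.Series(np.random.rand(5))
print(s[0], type(s[0]), s[0].dtype)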

print(float(s[0]),type(float(s[0])))

0.6358412386028008 <class 'numpy.float64'> float64
0.6358412386028008 <class 'float'>
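
To make the point about negative subscripts concrete, here is a minimal sketch contrasting `s[-1]` with `.iloc[-1]` on a Series with the default index. The flag `lookup_failed` is a made-up name for this illustration.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])   # default index 0, 1, 2

# .iloc is always positional, so negative positions work
last = s.iloc[-1]
print(last)

# s[-1] is looked up as the label -1, which does not exist here
try:
    s[-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# .item() is another way to get a plain Python float from a numpy.float64
first = s.iloc[0].item()
print(type(first))
```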

# Label index

# Works like the positional subscript: use [] with the index label inside; note that the label here is a string
s=pd.Series(np.random.rand(5),index=list('abcde'))
print(s)
print(s['a'],type(s['a']),s['a'].dtype)

# To select the values of several labels at once, use [[]] (that is, a list inside the brackets)
# The result of a multi-label selection is a new Series
sci=s[['a','b','c']]
print(sci,type(sci))

a    0.541327
b    0.810801
c    0.296037
d    0.794296
e    0.899370
dtype: float64
0.5413267940720663 <class 'numpy.float64'> float64
a    0.541327
b    0.810801
c    0.296037
dtype: float64 <class 'pandas.core.series.Series'>

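# Slice index
#1 Note: a slice by index label includes both ends (closed on the left and on the right)
# Positional slices may use negative numbers, e.g. [1:-1]

#2 Slicing by position is written the same way as for a list (the end is excluded)

#3 A Series with a str index can also be sliced by position (numbers)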

s1=pd.Series(np.random.randint(10,size=5))
s2=pd.Series(np.random.randint(10,size=5),index=list("abcde"))
print(s1[1:4],'\n',s1[1:-1],'\n',s1[2])
print('---')
print(s2["a":"b"],'\n',s2['a'])
print('---')
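# For a single element by position on a str-indexed Series, use .iloc; s2[2] is deprecated in recent pandas versions
print(s2[1:-1],'\n',s2.iloc[2])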

1    3
2    5
3    0
dtype: int32 
 1    3
2    5
3    0
dtype: int32 
 5
---
a    5
b    9
dtype: int32 
 5
---
b    9
c    5
d    2
dtype: int32 
 5
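
The difference between label slices and positional slices is easier to see with the explicit accessors. The sketch below uses fixed values rather than random ones; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

s = pd.Series([5, 9, 5, 2, 7], index=list('abcde'))

by_label = s.loc['b':'d']       # label slice: both ends included
by_position = s.iloc[1:3]       # positional slice: end excluded

print(list(by_label.index))
print(list(by_position.index))
```

Writing `.loc` or `.iloc` explicitly avoids having to remember which rule plain `[]` applies for a given index type.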

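# Boolean index
# Applying a comparison to the array returns a new Series made of boolean values
# .isnull() / .notnull() test for missing values (None is the empty object and NaN is "not a number"; both are recognized as missing)

s=pd.Series(np.random.rand(3)*100)
s[4]=None # add one missing value, None (depending on the pandas version it may be stored as NaN instead)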
print(s)

bs1 = s > 50
bs2 = s.isnull()
bs3 = s.notnull()

print(bs1, type(bs1), bs1.dtype)
print(bs2, type(bs2), bs2.dtype)
print(bs3, type(bs3), bs3.dtype)

print('-----')

print(s>50)
print('-----')

print(s[s>50])

print('-----')

print(s[bs3])

0    12.5679
1     73.037
2    69.7116
4       None
dtype: object
0    False
1     True
2     True
4    False
dtype: bool <class 'pandas.core.series.Series'> bool
0    False
1    False
2    False
4     True
dtype: bool <class 'pandas.core.series.Series'> bool
0     True
1     True
2     True
4    False
dtype: bool <class 'pandas.core.series.Series'> bool
-----
0    False
1     True
2     True
4    False
dtype: bool
-----
1     73.037
2    69.7116
dtype: object
-----
0    12.5679
1     73.037
2    69.7116
dtype: object
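
Boolean masks can also be combined. The following sketch assumes a small Series with one NaN and uses made-up names (`mid_range`, `small_and_present`) to show the element-wise operators.

```python
import numpy as np
import pandas as pd

s = pd.Series([12.5, 73.0, 69.7, np.nan], index=[0, 1, 2, 4])

# Combine conditions with & (and), | (or), ~ (not); wrap each condition in parentheses
mid_range = s[(s > 50) & (s < 70)]
print(mid_range)

# Comparisons with NaN are False, so add notnull() when negating a condition
small_and_present = s[s.notnull() & ~(s > 50)]
print(small_and_present)

# dropna() is a shortcut for s[s.notnull()]
same = s.dropna().equals(s[s.notnull()])
print(same)
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the `&` / `|` operators are used here.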


# Homework answer
s=pd.Series(np.random.rand(10)*100,index=list('abcdefghij'))
print(s)
print('-------')
print(s['b'],s['c'])
print('-------')
print(s[4:7])
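print(s.iloc[[4,5,6]])  # select by position with .iloc; s[[4,5,6]] on a str index is deprecated in recent pandas versions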
print('-------')
print(s[s>50])

a    47.610866
b    32.879041
c    60.843136
d    25.798653
e    16.734771
f    72.011496
g    13.186102
h    67.730150
i    28.785863
j    82.482446
dtype: float64
-------
32.87904131859861 60.84313579685892
-------
e    16.734771
f    72.011496
g    13.186102
dtype: float64
e    16.734771
f    72.011496
g    13.186102
dtype: float64
-------
c    60.843136
f    72.011496
h    67.730150
j    82.482446
dtype: float64

'''
[Lesson 2.4]  Pandas data structure Series: basic techniques

Viewing data / reindexing / alignment / adding, modifying and deleting values

'''


# Viewing data
# .head() shows the first rows
# .tail() shows the last rows
# Both show 5 rows by default
s=pd.Series(np.random.rand(50))
print(s.head(10))
print(s.tail())

0    0.583793
1    0.340821
2    0.153140
3    0.726648
4    0.482695
5    0.652023
6    0.328461
7    0.177034
8    0.217062
9    0.341393
dtype: float64
45    0.425366
46    0.712421
47    0.423743
48    0.980984
49    0.146227
dtype: float64

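# Reindexing with reindex
# .reindex rearranges the data to match the new index rather than overwriting the index labels; a label that does not exist in the current index gets a missing value (NaN)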

s=pd.Series(np.random.rand(3),index=list('abc'))
print(s)
s1=s.reindex(list('cbad'))
print(s1)
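
Since the printed result is not shown here, the sketch below reproduces the reindex behaviour with fixed values and also shows the `fill_value` parameter of `reindex`. The name `s2` is illustrative only.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3], index=list('abc'))

# Labels missing from the original index ('d') get NaN
s1 = s.reindex(list('cbad'))
print(s1)

# fill_value replaces NaN for the missing labels
s2 = s.reindex(list('cbad'), fill_value=0)
print(s2)

# reindex returns a new Series; the original is unchanged
print(list(s.index))
```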
