The Series object in pandas

一个亿呢

Last modified 2022-03-16 21:18:10

First published 2022-03-06 23:51:54

Article link: https://blog.csdn.net/qq_23126569/article/details/123316136

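This article introduces the Series data structure in the pandas library. Through examples it shows how to extract a Series from a DataFrame and describes properties of a Series, such as its dictionary-like key-value structure and the use of arbitrary values as the index. It also covers sorting a Series by index and by value, applying NumPy functions to a Series, filtering data with boolean indexing, and adding two Series that share the same index.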

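The source data is a movie ratings dataset. Source file: https://pan.baidu.com/s/1o1DQqAYr9QHqBZ52z4LMZw?pwd=pyth
Extraction code: pyth
Taking a single column out of the data gives you a Series.
A Series object is somewhat like a dictionary, because it holds key-value pairs (here called the index and the values).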

import pandas as pd
fandango = pd.read_csv('fandango_score_comparison.csv')
series_film = fandango['FILM']
print(type(fandango))  # <class 'pandas.core.frame.DataFrame'>: the type of the whole table
print(type(series_film))  # <class 'pandas.core.series.Series'>: a single column is a Series
print(series_film[0:5])
series_rt = fandango['RottenTomatoes']
print(series_rt[0:5])
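
If you do not have the CSV at hand, the same idea can be tried on a tiny hand-made table. This is only a sketch: the DataFrame `demo` is a hypothetical stand-in for the real file, with three titles and scores copied from the outputs shown later in this post.

```python
import pandas as pd

# Hypothetical stand-in for fandango_score_comparison.csv (only two columns, three rows)
demo = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
    "RottenTomatoes": [74, 85, 80],
})

column = demo["RottenTomatoes"]   # selecting one column yields a Series
print(type(demo).__name__)        # DataFrame
print(type(column).__name__)      # Series

# Like a dict, a Series pairs keys (the index) with values
print(list(column.index))         # default integer index: [0, 1, 2]
print(column.to_dict())           # the same pairs as a plain dict
```

With the default integer index, the "keys" are just row numbers; the next section replaces them with film names.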

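A feature of Series: when you build a Series object you can use arbitrary values (here, film names) as the index, and integer positions still work at the same time.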

# Import the Series class from pandas
from pandas import Series
series_film = fandango['FILM']
film_names = series_film.values
print(type(film_names))  # <class 'numpy.ndarray'>: much of pandas is built on top of NumPy
rt_scores = series_rt.values  # take one column of values to build a Series from
series_custom = Series(rt_scores, index=film_names)  # use film_names as the index
# Row indexes are integers by default; because film_names was passed as the index,
# rows can be selected by film name here, and integer positions still work as well
print(series_custom[['Minions (2015)', 'Leviathan (2014)']])
# Result:
'''
Minions (2015)      54
Leviathan (2014)    99
dtype: int64
'''
# Label indexing works above; the next example shows that integer positions still work

# Integer positions are also available
series_custom = Series(rt_scores, index=film_names)
series_custom[['Minions (2015)', 'Leviathan (2014)']]
fiveten = series_custom[5:10]  # positional slice works as usual; series_custom.iloc[5:10] is the explicit form
print(fiveten)
'''
The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64
'''
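
The two access styles above can be made explicit with `.loc` (labels) and `.iloc` (integer positions). A minimal sketch, assuming a small illustrative Series `scores` built from five of the title/score pairs that appear in this post's outputs:

```python
import pandas as pd

# Illustrative Series; the scores are copied from outputs shown in this post
scores = pd.Series(
    [74, 85, 80, 54, 99],
    index=[
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Minions (2015)",
        "Leviathan (2014)",
    ],
)

by_label = scores.loc[["Minions (2015)", "Leviathan (2014)"]]  # select by index label
by_position = scores.iloc[1:3]                                  # select by position, end exclusive

print(by_label)
print(by_position)
```

Spelling out `.loc` or `.iloc` avoids ambiguity about whether a key is meant as a label or a position, which matters most when the index itself contains integers.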

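When the index is sorted, the value attached to each index label moves along with it; likewise, when the values are sorted, the index labels move with them.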

original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)  # sorts the strings; original_index itself is unchanged
sorted_by_index = series_custom.reindex(sorted_index)  # reorder to follow sorted_index; each value moves with its label
print(original_index[0:3])
print(series_custom[0:3])  # the original Series
'''
['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)']
Avengers: Age of Ultron (2015)    74
Cinderella (2015)                 85
Ant-Man (2015)                    80
'''
print(sorted_by_index[0:3])  # the Series after reordering
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
print(series_custom[["'71 (2015)", "5 Flights Up (2015)", "A Little Chaos (2015)"]])  # look up the same three labels in the original Series for comparison
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
dtype: int64
'''
# sort_index orders by label and sort_values orders by value; in both cases each label stays paired with its value
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
print(sc2[0:3])
'''
'71 (2015)               97
5 Flights Up (2015)      52
A Little Chaos (2015)    40
'''
print(sc3[0:3])  # the three lowest scores, in ascending order
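
To see the difference between the two sorts on something small enough to check by eye, here is a sketch on a four-row Series. The Series `scores` is illustrative; its titles and scores are taken from the outputs above.

```python
import pandas as pd

scores = pd.Series(
    [74, 85, 80, 97],
    index=[
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "'71 (2015)",
    ],
)

by_index = scores.sort_index()                    # labels in ascending string order
by_value = scores.sort_values()                   # values in ascending numeric order
reindexed = scores.reindex(sorted(scores.index))  # same order as sort_index, built by hand

print(by_index)
print(by_value)
print(reindexed.equals(by_index))
```

All three calls return new Series and leave `scores` untouched, so the original order is still available afterwards.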

The values in a Series object are treated as an ndarray, the core data type in NumPy, so NumPy's ndarray operations can be applied to a Series directly.

# The values in a Series object are treated as an ndarray, the core data type in NumPy
import numpy as np
# Add each value to the corresponding value (element-wise addition)
print(np.add(series_custom, series_custom))
# Apply the sine function to each value
np.sin(series_custom)
# Return the highest value (a single value, not a Series)
np.max(series_custom)
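
A quick way to check what each of these NumPy calls hands back is to run them on a three-row Series. This is a sketch with an illustrative Series `scores` (scores copied from the outputs above), not the full dataset:

```python
import numpy as np
import pandas as pd

scores = pd.Series(
    [74, 85, 80],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
)

doubled = np.add(scores, scores)  # element-wise; still a Series with the same index
sines = np.sin(scores)            # element-wise; a float Series
highest = np.max(scores)          # reduction; a single scalar

print(type(doubled).__name__, type(sines).__name__)
print(doubled)
print(highest)
```

Element-wise functions keep the film names attached to the results, while reductions such as `np.max` collapse the Series to one number.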

# As with an ndarray, a sequence of True/False values can be used as an index to filter data
series_greater_than_50 = series_custom[series_custom > 50]  # series_custom > 50 produces a Series of True and False
criteria_one = series_custom > 50
# The comparison returns a Series object with a boolean value for each film
print(criteria_one)
'''
Avengers: Age of Ultron (2015)                True
Cinderella (2015)                             True
Ant-Man (2015)                                True
Do You Believe? (2015)                       False
Hot Tub Time Machine 2 (2015)                False
                                             ...  
Mr. Holmes (2015)                             True
'71 (2015)                                    True
Two Days, One Night (2014)                    True
Gett: The Trial of Viviane Amsalem (2015)     True
Kumiko, The Treasure Hunter (2015)            True
Length: 146, dtype: bool
'''
criteria_two = series_custom < 75
both_criteria = series_custom[criteria_one & criteria_two]  # keep only scores above 50 and below 75
print(both_criteria)
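
Boolean filtering is easier to follow on a handful of rows. A minimal sketch, assuming an illustrative five-row Series `scores` whose values are copied from outputs earlier in this post:

```python
import pandas as pd

scores = pd.Series(
    [74, 85, 80, 42, 63],
    index=[
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Irrational Man (2015)",
        "The Water Diviner (2015)",
    ],
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than > and <.
mask = (scores > 50) & (scores < 75)
selected = scores[mask]  # keeps only the rows where mask is True

print(mask)
print(selected)
```

Printing `mask` shows True/False per film, while printing `selected` shows the surviving scores themselves.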

Adding two Series objects that share the same index

# Data alignment: both Series share the same index
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2  # the indexes match, so values with the same label are added together
print(rt_mean)
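
The addition matches values by index label, not by row position. The sketch below illustrates this with two tiny Series whose labels are in a different order, plus one with no label in common. The user scores (80, 86) and the `extra` Series are made-up values for illustration only.

```python
import pandas as pd

critics = pd.Series(
    [74, 85],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)"],
)
# Hypothetical user scores, deliberately listed in the opposite order
users = pd.Series(
    [80, 86],
    index=["Cinderella (2015)", "Avengers: Age of Ultron (2015)"],
)

mean = (critics + users) / 2  # values are paired by label before adding
print(mean)

# Hypothetical Series that shares no label with critics
extra = pd.Series([90], index=["Ant-Man (2015)"])
combined = critics + extra    # labels without a partner produce NaN
print(combined)
```

When the two indexes are identical, as in the example from the dataset, no NaN values appear and the result keeps the same labels.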