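The source data is a set of movie ratings. Original file: https://pan.baidu.com/s/1o1DQqAYr9QHqBZ52z4LMZw?pwd=pyth
Extraction code: pyth
Pulling a single column out of the data gives an object of type Series.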
A Series is somewhat like a dictionary, because it holds key-value pairs (here called the index and the values).
import pandas as pd
fandango = pd.read_csv('fandango_score_comparison.csv')
series_film = fandango['FILM']
print(type(fandango))  # <class 'pandas.core.frame.DataFrame'>: the type of the whole table
print(type(series_film))  # <class 'pandas.core.series.Series'>: a single column is a Series
print(series_film[0:5])
series_rt = fandango['RottenTomatoes']
print (series_rt[0:5])
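The dictionary analogy can be tried out on a tiny Series. This is only a sketch: the film names and scores below are made up for illustration and do not come from the CSV.

```python
import pandas as pd

# Hypothetical scores, used only for illustration (not taken from the CSV).
scores = pd.Series({"Film A": 74, "Film B": 85, "Film C": 80})

# Like a dict, a Series maps keys (the index) to values.
print(scores["Film B"])     # look up one value by its label
print("Film A" in scores)   # membership is tested against the index, as with dict keys
print(list(scores.index))   # the labels
print(scores.values)        # the values, as a NumPy array

# Unlike a plain dict, a Series can also be sliced by position.
first_two = scores.iloc[0:2]
print(first_two)
```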
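A feature of Series: when you build a Series you can use arbitrary values (such as strings) as the index, and positional integer slicing still works.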
# Import the Series object from pandas
from pandas import Series
series_film = fandango['FILM']
film_names = series_film.values
print(type(film_names))  # <class 'numpy.ndarray'>: parts of pandas are built on top of NumPy
rt_scores = series_rt.values  # take one column of values, to build a Series from
series_custom = Series(rt_scores, index=film_names)  # use film_names as the index
series_custom[['Minions (2015)', 'Leviathan (2014)']]  # row indexes are usually integers, but film names work as labels here because the Series was built with film_names as its index
# Result:
'''
Minions (2015) 54
Leviathan (2014) 99
dtype: int64
'''
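# The label index works above; the example below shows that positional slicing still works too
# integer positions are also available
series_custom = Series(rt_scores, index=film_names)
series_custom[['Minions (2015)', 'Leviathan (2014)']]
fiveten = series_custom[5:10]  # positional slice, works as usual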
print(fiveten)
'''
The Water Diviner (2015) 63
Irrational Man (2015) 42
Top Five (2014) 86
Shaun the Sheep Movie (2015) 99
Love & Mercy (2015) 89
dtype: int64
'''
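Since square brackets accept both labels and positional slices, it can help to state the intent explicitly with `.loc` (labels) and `.iloc` (positions). A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical Series with string labels, for illustration only.
custom = pd.Series([74, 85, 80, 18],
                   index=["Film A", "Film B", "Film C", "Film D"])

by_label = custom.loc[["Film A", "Film C"]]   # select by label
by_position = custom.iloc[1:3]                # select by position, end excluded
label_slice = custom.loc["Film B":"Film C"]   # label slices include both endpoints

print(by_label)
print(by_position)
print(label_slice)
```

Note the difference in the two slices: the positional slice stops before position 3, while the label slice includes `"Film C"`.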
After sorting by index, each value moves together with its index label; likewise, sorting by value moves the labels along with the values.
original_index = series_custom.index.tolist()
sorted_index = sorted(original_index)  # sorts the strings; original_index itself is unchanged
sorted_by_index = series_custom.reindex(sorted_index)  # reorder to follow sorted_index; note that the values move with their labels
print(series_custom[0:3])  # the original Series
'''
['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)']
Avengers: Age of Ultron (2015) 74
Cinderella (2015) 85
Ant-Man (2015) 80
'''
print(sorted_by_index[0:3])  # the Series after sorting
'''
'71 (2015) 97
5 Flights Up (2015) 52
A Little Chaos (2015) 40
'''
print(series_custom[["'71 (2015)","5 Flights Up (2015)","A Little Chaos (2015)"]])  # look up the same three labels in the original Series to compare the values
'''
'71 (2015) 97
5 Flights Up (2015) 52
A Little Chaos (2015) 40
dtype: int64
'''
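Two properties of `reindex` are worth checking on a small example: values travel with their labels, and a label that does not exist in the original Series yields `NaN`. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical Series whose labels are deliberately out of order.
custom = pd.Series([74, 85, 80], index=["Film C", "Film A", "Film B"])

# Reordering by a sorted label list moves each value with its label.
reordered = custom.reindex(sorted(custom.index))
same_as_sort = reordered.equals(custom.sort_index())

# "Film Z" is not in the original index, so its value is NaN
# and the result is upcast to float64.
with_missing = custom.reindex(["Film A", "Film Z"])

print(reordered)
print(same_as_sort)
print(with_missing)
```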
# sort_index sorts by label and sort_values sorts by value; in both cases labels and values stay paired
sc2 = series_custom.sort_index()
sc3 = series_custom.sort_values()
print(sc2[0:3])
'''
'71 (2015) 97
5 Flights Up (2015) 52
A Little Chaos (2015) 40
'''
print(sc3[0:3])  # sorted by value in ascending order, so these are the three lowest scores, each still paired with its film label
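Sorting by value can be illustrated on a small made-up Series. The sketch below assumes nothing about the real dataset; it only shows that `sort_values` returns a new Series and that the labels follow their values.

```python
import pandas as pd

# Hypothetical scores, for illustration only.
custom = pd.Series([74, 85, 18], index=["Film A", "Film B", "Film C"])

lowest_first = custom.sort_values()                  # ascending by default
highest_first = custom.sort_values(ascending=False)  # descending

print(lowest_first)
print(highest_first)
print(custom)  # the original Series is left unchanged
```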
The values in a Series are treated as an ndarray, NumPy's core data type, so NumPy's ndarray operations apply to a Series as well.
import numpy as np
# Add the Series to itself, element by element
print(np.add(series_custom, series_custom))  # element-wise addition
# Apply the sine function to each value
np.sin(series_custom)  # sine
# Return the highest value (a single value, not a Series)
np.max(series_custom)  # maximum
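A small sketch, with made-up data, of what these NumPy calls return: element-wise functions give back a Series that keeps the index, while reductions such as `np.max` give back a single value.

```python
import numpy as np
import pandas as pd

# Hypothetical Series, for illustration only.
custom = pd.Series([10, 20, 30], index=["Film A", "Film B", "Film C"])

doubled = np.add(custom, custom)   # element-wise; the result is still a Series
sines = np.sin(custom)             # element-wise; labels are preserved
highest = np.max(custom)           # reduction; the result is a scalar

print(doubled)
print(sines.index.tolist())
print(highest)
```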
# As with an ndarray, a sequence of True/False values can be used as an index to filter the data
# series_custom > 50 returns a Series holding a boolean value for each film
series_greater_than_50 = series_custom[series_custom > 50]
criteria_one = series_custom > 50
criteria_two = series_custom < 75
both_criteria = series_custom[criteria_one & criteria_two]  # keep only scores greater than 50 and less than 75
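print(both_criteria)  # the filtered scores
print(criteria_one)  # the boolean mask produced by series_custom > 50, shown below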
'''
Avengers: Age of Ultron (2015) True
Cinderella (2015) True
Ant-Man (2015) True
Do You Believe? (2015) False
Hot Tub Time Machine 2 (2015) False
...
Mr. Holmes (2015) True
'71 (2015) True
Two Days, One Night (2014) True
Gett: The Trial of Viviane Amsalem (2015) True
Kumiko, The Treasure Hunter (2015) True
Length: 146, dtype: bool
'''
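The difference between the boolean mask and the filtered result is easy to see on a tiny example. The data is hypothetical; note that when the comparisons are written inline, each one needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import pandas as pd

# Hypothetical scores, for illustration only.
custom = pd.Series([74, 85, 18, 60],
                   index=["Film A", "Film B", "Film C", "Film D"])

# The mask is a boolean Series with the same index as the data.
mask = (custom > 50) & (custom < 75)

# Indexing with the mask keeps only the rows where it is True.
selected = custom[mask]

print(mask)
print(selected)
```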
Adding two Series that share the same index
# data alignment on the same index
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users)/2  # the indexes match, so values with the same label are added together
print(rt_mean)
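Alignment is by label, not by position, which becomes visible once the two indexes differ. The following sketch uses made-up data: labels present in only one Series produce `NaN`, and `Series.add` with `fill_value` is one way to treat the missing side as zero instead.

```python
import pandas as pd

# Hypothetical Series with partly different labels.
critics = pd.Series([74, 85], index=["Film A", "Film B"])
users = pd.Series([90, 60], index=["Film B", "Film C"])

# Only "Film B" exists in both, so the other two labels become NaN.
total = critics + users

# Treat a missing value on either side as 0 before adding.
filled = critics.add(users, fill_value=0)

print(total)
print(filled)
```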