Learning Python: pandas Series and DataFrame


weixin_39769091


Manipulating the image data of a fish picture with NumPy

In [5]:

import numpy as np

In [6]:

import matplotlib.pyplot as plt

%matplotlib inline

In [3]:

fish = plt.imread('fish.png')

In [4]:

plt.imshow(fish)

Out[4]:

In [5]:

fish.shape

Out[5]:

(243, 326, 3)

In [6]:

# reverse the row axis: flips the image vertically
fish2 = fish[::-1]

plt.imshow(fish2)

Out[6]:

In [7]:

# reverse the column axis: flips the image horizontally
fish3 = fish[::,::-1]

plt.imshow(fish3)

Out[7]:

In [8]:

# reversing the last axis swaps the channel order: r g b -> b g r
fish4 = fish[::,::,::-1]

plt.imshow(fish4)

Out[8]:

In [9]:

# keep every 5th row and every 5th column (downsampling)
fish5 = fish[::5,::5]

plt.imshow(fish5)

Out[9]:
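The slicing tricks above can be checked without the image file. The following is a minimal sketch that assumes a small synthetic array `img` (a hypothetical stand-in for `fish`, not part of the original notebook) with the same (rows, columns, channels) layout:

```python
import numpy as np

# A tiny stand-in "image": 4 rows x 6 columns x 3 channels (values are arbitrary).
img = np.arange(4 * 6 * 3, dtype=np.float32).reshape(4, 6, 3)

flipped_rows = img[::-1]            # reverse axis 0: vertical flip
flipped_cols = img[:, ::-1]         # reverse axis 1: horizontal flip
swapped_channels = img[:, :, ::-1]  # reverse axis 2: channel order r g b -> b g r
downsampled = img[::2, ::2]         # keep every 2nd row and every 2nd column

print(downsampled.shape)
```

All four results are views onto `img`, which is why the original notebook calls `fish.copy()` before writing into the array.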

In [10]:

fish6 = fish.copy()

In [12]:

fish

Out[12]:

array([[[ 0.29411766, 0.39215687, 0.46666667],

[ 0.46666667, 0.48627451, 0.49803922],

[ 0.4627451 , 0.48627451, 0.50196081],

...,

[ 0.4627451 , 0.48235294, 0.49803922],

[ 0.45882353, 0.47843137, 0.49803922],

[ 0.21960784, 0.33333334, 0.44313726]],

[[ 0.29019609, 0.3764706 , 0.44313726],

[ 0.627451 , 0.6156863 , 0.60784316],

[ 0.85490197, 0.85490197, 0.84705883],

...,

[ 0.86274511, 0.85882354, 0.8509804 ],

[ 0.8509804 , 0.8509804 , 0.84313726],

[ 0.30588236, 0.42352942, 0.52549022]],

[[ 0.28235295, 0.37254903, 0.43921569],

[ 0.66666669, 0.66274512, 0.65490198],

[ 1. , 1. , 1. ],

...,

[ 1. , 1. , 1. ],

[ 0.35686275, 0.47450981, 0.57647061]],

...,

[[ 0.4509804 , 0.45882353, 0.45882353],

[ 0.65098041, 0.65098041, 0.64705884],

[ 0.99215686, 0.99215686, 0.98431373],

...,

[ 1. , 0.99607843, 0.98823529],

[ 0.98431373, 0.98823529, 0.98039216],

[ 0.36078432, 0.49019608, 0.60000002]],

[[ 0.4509804 , 0.45882353, 0.45882353],

[ 0.65098041, 0.65098041, 0.64705884],

[ 0.99215686, 0.99215686, 0.98431373],

...,

[ 1. , 0.99607843, 0.98823529],

[ 0.98431373, 0.98823529, 0.98039216],

[ 0.36078432, 0.49019608, 0.60000002]],

[[ 0.44705883, 0.45490196, 0.45490196],

[ 0.65882355, 0.65490198, 0.65490198],

[ 1. , 1. , 1. ],

...,

[ 1. , 1. , 1. ],

[ 0.36078432, 0.49411765, 0.60000002]]], dtype=float32)

In [20]:

# paint a 40x30 block white (1.0 in every channel) on the copy
fish6[80:120,80:110] = np.ones((40,30,3))

plt.imshow(fish6)

Out[20]:

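pandas data structures

·pandas is a tool built on top of NumPy and was created for data-analysis tasks

·pandas brings together many libraries and some standard data models, and provides the tools needed to work efficiently with large datasets

·pandas offers a large set of functions and methods for processing data quickly and conveniently

Importing pandas:

The usual trio: numpy, pandas, matplotlib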

In [7]:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

%matplotlib inline

from pandas import Series,DataFrame

1. Series

A Series is an object similar to a one-dimensional array. It consists of two parts:

values: the data (an ndarray)

index: the labels associated with the data

In [2]:

nd = np.array([1,4,5,2,3,7])

nd[2]

Out[2]:

1) Creating a Series

There are two ways to create one:

(1) From a list or a NumPy array

The default index is the integers 0 to N-1

In [8]:

n = np.array([0,2,4,6,8])

# the difference between a Series and an ndarray: a Series carries an explicit index
s = Series(n)
s

Out[8]:

0 0

1 2

2 4

3 6

4 8

dtype: int32

In [9]:

# a Series wraps an ndarray

# the index makes a Series more powerful: lookups and retrieval are much easier

s.values

Out[9]:

array([0, 2, 4, 6, 8])

In [10]:

Out[10]:

array([0, 2, 4, 6, 8])

You can also specify the index yourself, either by assigning to the index attribute or through the index parameter

In [11]:

s.index = list('abcde')

Out[11]:

a 0

b 2

c 4

d 6

e 8

dtype: int32

In [12]:

s.index = ['张三','李四','Michael','sara','lisa']

Out[12]:

张三 0

李四 2

Michael 4

sara 6

lisa 8

dtype: int32

In [13]:

s = Series(n,index=['张三','李四','Michael','sara','lisa'])

Out[13]:

张三 0

李四 2

Michael 4

sara 6

lisa 8

dtype: int32

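Note that in the pandas version used here, a Series created from an ndarray references the array's data instead of copying it, so changing an element of the Series also changes the original ndarray. (This does not happen with a list.) Whether the data is shared depends on the pandas version; passing copy=True to Series always produces an independent copy.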

In [28]:

s['张三'] = 100

In [29]:

Out[29]:

张三 100

李四 2

Michael 4

sara 6

lisa 8

dtype: int64

In [30]:

Out[30]:

array([100, 2, 4, 6, 8])
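A minimal sketch for checking whether a Series shares memory with its source array. The `copy=True` argument and `np.shares_memory` are standard pandas/NumPy features; the names `s_copy` and `s_view` are only illustrative. The second check prints a version-dependent result, so no expected value is given for it:

```python
import numpy as np
import pandas as pd

n = np.array([0, 2, 4, 6, 8])

# copy=True asks pandas for an independent copy of the data
s_copy = pd.Series(n, index=list("abcde"), copy=True)
s_copy["a"] = 100
print(n[0])  # the source array is not affected by the write above

# Without copy=True the outcome depends on the pandas version, so inspect it
s_view = pd.Series(n, index=list("abcde"))
print(np.shares_memory(n, s_view.to_numpy()))
```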

(2) From a dictionary

In [14]:

s2 = Series({'a':1,'b':2,'c':3})

Out[14]:

a 1

b 2

c 3

dtype: int64

In [42]:

dic = {'a':np.random.randint(0,10,size = (2,3)),

'b':np.random.randint(0,10,size = (2,3)),

'c':np.random.randint(0,10,size = (2,3))}

s2 = Series(dic)

Out[42]:

a [[1, 8, 0], [6, 4, 2]]

b [[2, 7, 8], [3, 7, 5]]

c [[6, 2, 3], [9, 6, 7]]

dtype: object

============================================

Exercise 1:

Use several different methods to create the following Series, named s1:

语文 (Chinese) 150

数学 (Math) 150

英语 (English) 150

理综 (Science) 300

============================================

In [19]:

s = Series({'语文':150,'数学':150,'英语':150,'理综':300})

Out[19]:

语文 150

数学 150

英语 150

理综 300

dtype: int64

In [21]:

s = Series(data=(150,150,150,300),index=['语文','数学','英语','Python'])

Out[21]:

语文 150

数学 150

英语 150

Python 300

dtype: int64

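2) Indexing and slicing a Series

Square brackets with a single key return one element; square brackets with a list of keys return a Series. Indexing is either explicit or implicit:

(1) Explicit indexing:

- use the elements of index as keys

- use .loc[] (recommended)

Note: label slices are closed intervals (both endpoints are included)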

In [23]:

s = Series(np.random.random(10),index=list('abcdefghig'))

Out[23]:

a 0.448285

b 0.423429

c 0.693456

d 0.411740

e 0.571974

f 0.962127

g 0.696547

h 0.010623

i 0.582683

g 0.256328

dtype: float64

In [24]:

s['a']

Out[24]:

0.4482852027228874

In [26]:

s.loc['a']

Out[26]:

0.4482852027228874

In [27]:

s.loc[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 s.loc[0]

E:\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1498
   1499             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1500             return self._getitem_axis(maybe_callable, axis=axis)
   1501
   1502     def _is_scalar_access(self, key):

E:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1910
   1911         # fall thru to straight lookup
-> 1912         self._validate_key(key, axis)
   1913         return self._get_label(key, axis=axis)
   1914

E:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis)
   1797
   1798         if not is_list_like_indexer(key):
-> 1799             self._convert_scalar_indexer(key, axis)
   1800
   1801     def _is_scalar_access(self, key):

E:\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_scalar_indexer(self, key, axis)
    260         ax = self.obj._get_axis(min(axis, self.ndim - 1))
    261         # a scalar
--> 262         return ax._convert_scalar_indexer(key, kind=self.name)
    263
    264     def _convert_slice_indexer(self, key, axis):

E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _convert_scalar_indexer(self, key, kind)
   2879         elif kind in ['loc'] and is_integer(key):
   2880             if not self.holds_integer():
-> 2881                 return self._invalid_indexer('label', key)
   2882
   2883         return key

E:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _invalid_indexer(self, form, key)
   3065                         "indexers [{key}] of {kind}".format(
   3066                             form=form, klass=type(self), key=key,
-> 3067                             kind=type(key)))
   3068
   3069     # --------------------------------------------------------------------

TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [0] of <class 'int'>

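(2) Implicit indexing:

- use integers (positions) as keys

- use .iloc[] (recommended)

Note: position slices are half-open intervals (the stop position is excluded)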

In [28]:

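# works much like ndarray indexing (recent pandas versions deprecate positional [] on a labelled index; s.iloc[0] is the explicit form)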

s[0]

Out[28]:

0.4482852027228874

In [29]:

s.iloc[[1,2]]

Out[29]:

b 0.423429

c 0.693456

dtype: float64

In [30]:

Out[30]:

a 0.448285

b 0.423429

c 0.693456

d 0.411740

e 0.571974

f 0.962127

g 0.696547

h 0.010623

i 0.582683

g 0.256328

dtype: float64

In [33]:

s['a':'c']

Out[33]:

a 0.448285

b 0.423429

c 0.693456

dtype: float64

In [34]:

# slicing by label: both endpoints are included

s.loc['a':'e']

Out[34]:

a 0.448285

b 0.423429

c 0.693456

d 0.411740

e 0.571974

dtype: float64

In [31]:

# half-open interval: start included, stop excluded

s.iloc[0:2]

Out[31]:

a 0.448285

b 0.423429

dtype: float64
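The difference between the two slicing conventions is easiest to see side by side. A minimal sketch, assuming a small hypothetical Series with fixed values instead of the random data above:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

by_label = s.loc["a":"c"]    # labels a, b, c: the stop label is included
by_position = s.iloc[0:2]    # positions 0 and 1: the stop position is excluded
picked = s.loc[["a", "e"]]   # a list of labels returns a Series

print(len(by_label), len(by_position))
```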

============================================

Exercise 2:

Use several different methods to index and slice the Series s1 created in Exercise 1:

Index: 数学 150

Slice: 语文 150, 数学 150, 英语 150

============================================

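3) Basic concepts of a Series

A Series can be thought of as a fixed-length, ordered dictionary

Its properties are available through attributes such as shape, size, index and values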

In [35]:

s.shape

Out[35]:

(10,)

In [36]:

s.size

Out[36]:

10

In [38]:

s.index

Out[38]:

Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'g'], dtype='object')

In [37]:

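# Series.values is an ndarray: a Series contains an ndarray and builds on it

# having an index makes lookups convenient

# analogy: a search engine such as Baidu keeps an index of websites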

s.values

Out[37]:

array([0.4482852 , 0.42342936, 0.69345594, 0.41174025, 0.57197399,

0.96212691, 0.69654684, 0.0106225 , 0.58268298, 0.25632816])

head() and tail() give a quick look at the first or last rows of a Series

In [123]:

df = pd.read_csv('data/president_heights.csv')

Out[123]:

  order   name  height
0     1  gorge   189.0
1     2  james   170.0
2     3    bob   163.0
3     4    jim   183.0
4     5   lucy   171.0
5     6   lile   192.0
6          NaN     NaN

In [124]:

type(df)

Out[124]:

pandas.core.frame.DataFrame

In [125]:

s_name = df['name']

type(s_name)

Out[125]:

pandas.core.series.Series

In [52]:

# head() returns 5 rows by default

s_name.head(3)

Out[52]:

0 gorge

1 james

2 bob

Name: name, dtype: object

In [49]:

s_name.tail()

Out[49]:

2 bob

3 jim

4 lucy

5 lile

6 NaN

Name: name, dtype: object

When a label has no corresponding value, the data is missing and is shown as NaN (not a number)

In [87]:

s = Series([1,26,None,np.nan],index=list('风火雷电'))

Out[87]:

风 1.0

火 26.0

雷 NaN

电 NaN

dtype: float64

In [67]:

# None is stored as NaN in a float Series, and Series reductions skip NaN by default; ndarray.sum() does not

s.sum()

Out[67]:

27.0

In [62]:

None == np.nan

Out[62]:

False

In [63]:

display(type(None),type(np.nan))

NoneType

float

Missing data can be detected with pd.isnull() and pd.notnull(), or with the Series methods isnull() and notnull()

In [68]:

pd.isnull(s)

Out[68]:

风 False

火 False

雷 True

电 True

dtype: bool

In [69]:

s.isnull()

Out[69]:

风 False

火 False

雷 True

电 True

dtype: bool

In [70]:

s_notnull = s.notnull()

In [71]:

s_notnull

Out[71]:

风 True

火 True

雷 False

电 False

dtype: bool

In [72]:

# filter out the missing values

s[s_notnull]

Out[72]:

风 1.0

火 26.0

dtype: float64
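Boolean-mask filtering has built-in counterparts. The sketch below uses the standard `dropna()` and `fillna()` methods on a Series with hypothetical ASCII labels (the original uses Chinese characters as labels):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 26, None, np.nan], index=list("wxyz"))

kept = s[s.notnull()]   # boolean-mask filtering, as in the cell above
dropped = s.dropna()    # same result through the built-in method
filled = s.fillna(0)    # replace missing values instead of removing them

print(dropped)
```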

Every Series has a name attribute

In [89]:

# name is what tells Series apart inside a DataFrame:

# it becomes the column name

s.name = '姓名'

In [90]:

Out[90]:

风 1.0

火 26.0

雷 NaN

电 NaN

Name: 姓名, dtype: float64

In [126]:

Out[126]:

  order   name  height
0     1  gorge   189.0
1     2  james   170.0
2     3    bob   163.0
3     4    jim   183.0
4     5   lucy   171.0
5     6   lile   192.0
6          NaN     NaN

In [127]:

df['name']

Out[127]:

0 gorge

1 james

2 bob

3 jim

4 lucy

5 lile

6 NaN

Name: name, dtype: object

4) Series arithmetic

(1) Array operations that work in NumPy also work on a Series

In [91]:

Out[91]:

风 1.0

火 26.0

雷 NaN

电 NaN

Name: 姓名, dtype: float64

In [92]:

s + 10

Out[92]:

风 11.0

火 36.0

雷 NaN

电 NaN

Name: 姓名, dtype: float64

In [100]:

# in the arithmetic methods, fill_value replaces NaN with the given value before the operation

s.add(10,fill_value=0)

Out[100]:

风 11.0

火 36.0

雷 10.0

电 10.0

Name: 姓名, dtype: float64

(2) Arithmetic between Series

Data is aligned automatically on the index labels

Labels that appear in only one operand produce NaN

In [119]:

# when two Series are added, labels without a match are filled with NaN

# + arithmetic operator

s1 = Series([1,1,1],[1,2,3])

s2 = Series([1,1,1,1,1],[2,3,4,5,6])

s1 + s2

Out[119]:

1 NaN

2 2.0

3 2.0

4 NaN

5 NaN

6 NaN

dtype: float64

Note: to keep a value for every index label, use the .add() method with fill_value

In [120]:

s1.add(s2,fill_value=0)

Out[120]:

1 1.0

2 2.0

3 2.0

4 1.0

5 1.0

6 1.0

dtype: float64
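The alignment rule applies to the other arithmetic methods as well. A minimal sketch reusing the two Series from the cells above; `sub` is the method form of `-`, and the variable names `plain`, `filled` and `diff` are only illustrative:

```python
import pandas as pd

s1 = pd.Series([1, 1, 1], index=[1, 2, 3])
s2 = pd.Series([1, 1, 1, 1, 1], index=[2, 3, 4, 5, 6])

plain = s1 + s2                    # labels present on only one side become NaN
filled = s1.add(s2, fill_value=0)  # a missing side is treated as 0 before adding
diff = s1.sub(s2, fill_value=0)    # the same idea works for subtraction

print(diff)
```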

============================================

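Exercise 3:

Think about how the rules for Series arithmetic differ from those for ndarray arithmetic.

Create another Series s2 whose index contains "文综" (liberal arts), and apply several arithmetic operations between it and s1. Think about how to keep all of the data.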

============================================

In [121]:

s = Series([1,2,None])

nd = np.array([1,2,None])

nd1 = np.array([1,2,np.nan])

display(s,nd,nd1)

0 1.0

1 2.0

2 NaN

dtype: float64

array([1, 2, None], dtype=object)

array([ 1., 2., nan])

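1. A Series that contains NaN can still be reduced, because NaN is skipped by default. In an ndarray, np.nan propagates through the calculation, and an object array holding None raises an error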
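A minimal sketch of the three cases, using only standard NumPy and pandas calls (`np.nansum` is NumPy's NaN-skipping sum); the variable names are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, None])          # None becomes NaN, dtype float64
nd_nan = np.array([1, 2, np.nan])    # float64 array containing nan
nd_obj = np.array([1, 2, None])      # object array holding a Python None

series_total = s.sum()               # NaN is skipped (skipna=True by default)
array_total = nd_nan.sum()           # nan propagates through the sum
nan_aware_total = np.nansum(nd_nan)  # NumPy's NaN-skipping variant

try:
    nd_obj.sum()
    object_sum_failed = False
except TypeError:
    object_sum_failed = True         # int + None is not defined

print(series_total, array_total, nan_aware_total, object_sum_failed)
```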

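2. DataFrame

A DataFrame is a tabular data structure that can be viewed as a dictionary of Series sharing one index. It consists of several columns arranged in a fixed order and was designed to extend Series from one dimension to several. A DataFrame has both a row index and a column index.

Row index: index

Column index: columns

Values: values (a two-dimensional NumPy array)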

In [128]:

Out[128]:

  order   name  height
0     1  gorge   189.0
1     2  james   170.0
2     3    bob   163.0
3     4    jim   183.0
4     5   lucy   171.0
5     6   lile   192.0
6          NaN     NaN

In [131]:

display(df.index,df.columns,df.values,df.values.shape)

RangeIndex(start=0, stop=7, step=1)

Index(['order', 'name', 'height'], dtype='object')

array([['1', 'gorge', 189.0],

['2', 'james', 170.0],

['3', 'bob', 163.0],

['4', 'jim', 183.0],

['5', 'lucy', 171.0],

['6', 'lile', 192.0],

[' ', nan, nan]], dtype=object)

(7, 3)

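1) Creating a DataFrame

The most common way is to pass a dictionary. The DataFrame uses the dictionary keys as the column names and the dictionary values (each an array) as the columns.

The DataFrame also adds an index for the rows automatically, just as a Series does.

As with a Series, if a requested column does not match any key of the dictionary, its values are NaN.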

In [8]:

dic = {

'name':['张三','石六','Sara'],

'age':[22,33,18],

'sex':['male','female','male'],'weight':[65,72,53]}

df = DataFrame(dic,

index=list('ABC'),

columns=['name','age','sex','weight'])

Out[8]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  female      72
C  Sara   18    male      53

DataFrame attributes: values, columns, index, shape

In [141]:

df.values

Out[141]:

array([['张三', 22, 'male', 65],

['石六', 33, 'female', 72],

['Sara', 18, 'male', 53]], dtype=object)
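The rule that an unmatched column is filled with NaN can be sketched as follows. The column name `height` is a hypothetical example that is deliberately absent from the dictionary:

```python
import pandas as pd

dic = {"name": ["张三", "石六", "Sara"], "age": [22, 33, 18]}

# "height" is not a key of dic, so that column is created and filled with NaN
df = pd.DataFrame(dic, index=list("ABC"), columns=["name", "age", "height"])

print(df.shape)
```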

============================================

Exercise 4:

Create a DataFrame named df from the following table of exam scores:

      张三  李四
语文   150     0
数学   150     0
英语   150     0
理综   300     0

============================================

In [9]:

dic = {'张三':[150,150,150,300],'李四':[0,0,0,0]}

DataFrame(dic,index=['语文','数学','英语','理综'])

Out[9]:

      张三  李四
语文   150     0
数学   150     0
英语   150     0
理综   300     0

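2) Indexing a DataFrame

(1) Indexing columns

- dictionary-style access

- attribute-style access

A column of a DataFrame can be retrieved as a Series. The returned Series has the same index as the DataFrame, and its name attribute is already set to the column name.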

In [144]:

dic = {

'name':['张三','石六','Sara'],

'age':[22,33,18],

'sex':['male','female','male'],'weight':[65,72,53]}

df = DataFrame(dic,

index=list('ABC'),

columns=['name','age','sex','weight'])

Out[144]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  female      72
C  Sara   18    male      53

In [154]:

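# a slice inside [] never selects columns, because columns are treated as attributes;

# it is read as a row slice, and no row label falls in this range, so the result is empty

df['name':'age']

Out[154]:

name  age  sex  weight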

In [146]:

# selecting a column returns a Series

age = df['age']

display(type(age),age)

pandas.core.series.Series

A 22

B 33

C 18

Name: age, dtype: int64

In [147]:

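# in a DataFrame, column names act like attributes

# a DataFrame is a table for recorded data: each column is one property of the thing being described,

# and each property corresponds to a column name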

df.age

Out[147]:

A 22

B 33

C 18

Name: age, dtype: int64

(2) Indexing rows

- use .loc[] with index labels

- use .iloc[] with integer positions

A single row is also returned as a Series, whose index is the original columns.

In [151]:

# selecting one row also returns a Series

# but selecting several rows with a list such as ['A','B'] returns a DataFrame

df.loc[['A','B']]

Out[151]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  female      72

In [152]:

# label slice: both endpoints included

df.loc['A':'C']

Out[152]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  female      72
C  Sara   18    male      53

In [155]:

# position slice: start included, stop excluded

df.iloc[1:3]

Out[155]:

   name  age     sex  weight
B  石六   33  female      72
C  Sara   18    male      53

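Caution: the author reports that lookups sometimes fail to return a result when the index labels are Chinese characters.

(3) Ways to index a single element

- start from a column

- start from a row (in iloc[3,1] the two numbers are two arguments, row and column; in iloc[[3,3]] the list [3,3] is a single argument)

- use the values attribute (a two-dimensional NumPy array)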

In [10]:

Out[10]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  female      72
C  Sara   18    male      53

In [11]:

df['sex']['B'] = '女博士'

D:\Anaconda\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

"""Entry point for launching an IPython kernel.

Out[11]:

   name  age     sex  weight
A  张三   22    male      65
B  石六   33  女博士      72
C  Sara   18    male      53
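The SettingWithCopyWarning above comes from chained indexing (`df['sex']['B'] = ...`), where pandas cannot guarantee that the write reaches the original frame. A minimal sketch of the single-call form that the linked pandas documentation recommends, on a reduced hypothetical copy of the table:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["张三", "石六", "Sara"], "sex": ["male", "female", "male"]},
    index=list("ABC"),
)

# One indexing call with (row label, column label) writes into df itself
df.loc["B", "sex"] = "女博士"

# The positional equivalent: row 1, column 1
df.iloc[1, 1] = "女博士"

print(df)
```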

In [166]:

df.loc['C']['name']

Out[166]:

'Sara'

In [164]:

df.loc['C','name']

Out[164]:

'Sara'

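When selecting through .loc, a row label and a column label can be passed together; plain [] column selection does not accept this form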

In [167]:

df.values[0,2]

Out[167]:

'male'

[Note] When plain square brackets are used:

a single key selects a column

a slice selects rows

In [168]:

# a slice inside [] is read as a row slice; no row label falls in this range, so nothing is returned

df['height':'age']

Out[168]:

name  age  sex  weight
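The bracket rules can be summarised in a short sketch. It assumes a reduced hypothetical copy of the table; column slicing goes through `.loc[:, ...]`, which is standard pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["张三", "石六", "Sara"], "age": [22, 33, 18], "weight": [65, 72, 53]},
    index=list("ABC"),
)

one_column = df["age"]                   # a single key selects a column (Series)
some_rows = df["A":"B"]                  # a slice selects rows by label, ends included
some_columns = df.loc[:, "name":"age"]   # slicing columns has to go through .loc

print(some_columns)
```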

============================================

Exercise 5:

Index and slice ddd in several different ways and compare the differences

============================================

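3) DataFrame arithmetic

(1) Arithmetic between DataFrames

As with Series:

data is aligned automatically on the index labels

labels that appear in only one operand produce NaN

Create DataFrame df1: each person's scores per subject, monthly exam 1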

In [13]:

df1 = DataFrame(np.random.randint(0,150,size=(4,4)),

index=['张三','李四','王五','老刘'],

columns = ['语文','数学','英语','python'])

df1

Out[13]:

[Output table for df1: rows 张三, 李四, 王五, 老刘; columns 语文, 数学, 英语, python. Most cell values were lost in extraction.]

In [14]:

df4 = DataFrame(np.random.randint(0,150,size=(5,3)),

index=['张三','李四','王五','老刘','小昭'],

columns = ['数学','英语','python'])

df4

Out[14]:

[Output table for df4: rows 张三, 李四, 王五, 老刘, 小昭; columns 数学, 英语, python. Most cell values were lost in extraction.]

In [15]:

df5 = df1.add(df4,fill_value=0)

df5

Out[15]:

      python   数学   英语   语文
小昭     8.0  119.0   25.0    NaN
张三   230.0  225.0  238.0   12.0
李四    65.0  155.0  135.0   20.0
王五   133.0  146.0   43.0   74.0
老刘   234.0  244.0   26.0   29.0

In [16]:

# set a single cell with one .loc call instead of chained indexing
df5.loc['小昭','语文'] = 109

df5

Out[16]:

      python   数学   英语   语文
小昭     8.0  119.0   25.0  109.0
张三   230.0  225.0  238.0   12.0
李四    65.0  155.0  135.0   20.0
王五   133.0  146.0   43.0   74.0
老刘   234.0  244.0   26.0   29.0

Create DataFrame df2: each person's scores per subject, monthly exam 2

A new student has transferred in

In [17]:

df2 = DataFrame(np.random.randint(0,150,size= (5,4)),

index=['张三','李四','王五','老刘','校长'],

columns = ['语文','数学','英语','python'])

df2

Out[17]:

[Output table for df2: rows 张三, 李四, 王五, 老刘, 校长; columns 语文, 数学, 英语, python. Most cell values were lost in extraction.]

In [18]:

df1+df2

Out[18]:

       语文   数学   英语  python
张三  114.0  120.0  171.0   208.0
李四  163.0   43.0  131.0    91.0
校长    NaN    NaN    NaN     NaN
王五  187.0  178.0   45.0    69.0
老刘  174.0  170.0   25.0   196.0

In [21]:

# the sum, keeping rows that exist in only one frame

df3 = df1.add(df2,fill_value=0)

df3

Out[21]:

       语文   数学   英语  python
张三  114.0  120.0  171.0   208.0
李四  163.0   43.0  131.0    91.0
校长   10.0   89.0   66.0    36.0
王五  187.0  178.0   45.0    69.0
老刘  174.0  170.0   25.0   196.0

The table below maps Python operators to the corresponding pandas methods:

Python Operator    Pandas Method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  truediv(), div(), divide()
//                 floordiv()
%                  mod()
**                 pow()
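A minimal sketch showing that an operator and its method form give the same result, and what `fill_value` adds. The frames `a` and `b` are hypothetical toy data:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [10, 20]})    # no "y" column

via_operator = a + b                 # column "y" has no partner, so it is NaN
via_method = a.add(b)                # identical to the operator form
with_fill = a.add(b, fill_value=0)   # the missing partner is treated as 0

print(with_fill)
```

The method form is the only one that accepts extra arguments such as `fill_value` or `axis`, which is why the examples here switch to it whenever unmatched labels matter.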

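(2) Arithmetic between a Series and a DataFrame

[Important]

With Python operators: the Series is matched against the column labels and applied to every row, so it must be shaped like a row. (This resembles combining a two-dimensional array with a one-dimensional array in NumPy, except that unmatched labels produce NaN.)

With pandas methods:

axis=0: the Series is matched against the row index and applied to every column, so it must be shaped like a column.

axis=1: the Series is matched against the column labels and applied to every row, so it must be shaped like a row.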

In [22]:

s1 = df5['python']

Out[22]:

小昭 8.0

张三 230.0

李四 65.0

王五 133.0

老刘 234.0

Name: python, dtype: float64

In [23]:

s2 = df5.loc['小昭']

Out[23]:

python 8.0

数学 119.0

英语 25.0

语文 109.0

Name: 小昭, dtype: float64

In [24]:

df5

Out[24]:

      python   数学   英语   语文
小昭     8.0  119.0   25.0  109.0
张三   230.0  225.0  238.0   12.0
李四    65.0  155.0  135.0   20.0
王五   133.0  146.0   43.0   74.0
老刘   234.0  244.0   26.0   29.0

In [25]:

display(df5.columns,s1.index)

Index(['python', '数学', '英语', '语文'], dtype='object')

Index(['小昭', '张三', '李四', '王五', '老刘'], dtype='object')

In [26]:

df5 + s1

Out[26]:

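      python  小昭  张三  数学  李四  王五  老刘  英语  语文
小昭     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
张三     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
李四     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
王五     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
老刘     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN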

In [27]:

display(df5.columns,s2.index)

Index(['python', '数学', '英语', '语文'], dtype='object')

Index(['python', '数学', '英语', '语文'], dtype='object')

In [28]:

# broadcasting: s2 is aligned with the column labels and added to every row

df5 + s2

Out[28]:

      python   数学   英语   语文
小昭    16.0  238.0   50.0  218.0
张三   238.0  344.0  263.0  121.0
李四    73.0  274.0  160.0  129.0
王五   141.0  265.0   68.0  183.0
老刘   242.0  363.0   51.0  138.0

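axis=0: the Series is matched against the row index and applied to every column, so it must be shaped like a column.

axis=1: the Series is matched against the column labels and applied to every row, so it must be shaped like a row.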

In [35]:

ss = df5.loc['张三']

Out[35]:

python 230.0

数学 225.0

英语 238.0

语文 12.0

Name: 张三, dtype: float64

In [43]:

df5

Out[43]:

      python   数学   英语   语文
小昭     8.0  119.0   25.0  109.0
张三   230.0  225.0  238.0   12.0
李四    65.0  155.0  135.0   20.0
王五   133.0  146.0   43.0   74.0
老刘   234.0  244.0   26.0   29.0

In [52]:

df5.add(ss1,axis='index')

Out[52]:

      python   数学   英语   语文
小昭    16.0  127.0   33.0  117.0
张三   460.0  455.0  468.0  242.0
李四   130.0  220.0  200.0   85.0
王五   266.0  279.0  176.0  207.0
老刘   468.0  478.0  260.0  263.0

In [46]:

# 0 corresponds to index (row labels); 1 corresponds to columns (column labels)

df5.add(ss,axis='columns')

Out[46]:

      python   数学   英语   语文
小昭   238.0  344.0  263.0  121.0
张三   460.0  450.0  476.0   24.0
李四   295.0  380.0  373.0   32.0
王五   363.0  371.0  281.0   86.0
老刘   464.0  469.0  264.0   41.0

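axis=0 (0 == index, the row labels): the Series is matched against the row index and applied to every column, so it must be shaped like a column;

axis=1 (1 == columns, the column labels): the Series is matched against the column labels and applied to every row, so it must be shaped like a row.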
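The two axis settings can be compared on a small frame. A minimal sketch with hypothetical labels (`row1`, `row2`, `math`, `english`) chosen so the arithmetic is easy to follow by hand:

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [90, 80], "english": [70, 60]},
    index=["row1", "row2"],
)

col = df["math"]       # indexed by row labels
row = df.loc["row1"]   # indexed by column labels

by_rows = df.add(col, axis="index")     # col[r] is added to every cell of row r
by_cols = df.add(row, axis="columns")   # row[c] is added to every cell of column c
default = df + row                      # the operator behaves like axis="columns"

print(by_rows)
print(by_cols)
```

Passing a column-shaped Series with the default axis would match row labels against column labels and yield an all-NaN frame, which is what the `df5 + s1` cell above demonstrates.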

In [37]:

ss1 = df5.python

ss1

Out[37]:

小昭 8.0

张三 230.0

李四 65.0

王五 133.0

老刘 234.0

Name: python, dtype: float64

============================================

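Exercise 6:

Suppose ddd holds the midterm scores and ddd2 the final scores. Create ddd2 however you like, add it to ddd, and compute the average of the midterm and final scores.

Suppose Zhang San (张三) is found to have cheated on the midterm math exam and the score must be recorded as 0. How would you do that?

Li Si (李四) is rewarded for reporting Zhang San and receives 100 extra points in every midterm subject. How would you do that?

Later the teacher discovers that one question was faulty and, to reassure the students, adds 10 points to every subject for every student. How would you do that?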

============================================

In [88]:

df1 = DataFrame(np.random.randint(0,150,size=16).reshape((4,4)),

columns = ['Chinese','English','Russia','Python'],

index = list('ABCD'))

df1

Out[88]:

[Output table for df1: rows A, B, C, D; columns Chinese, English, Russia, Python. Most cell values were lost in extraction.]

In [89]:

df2 = DataFrame(np.random.randint(0,150,size=16).reshape((4,4)),

columns = ['Chinese','English','Russia','Python'],

index = list('ABCD'))

df2

Out[89]:

[Output table for df2: rows A, B, C, D; columns Chinese, English, Russia, Python. Most cell values were lost in extraction.]

In [90]:

# method 1: compute the average with operators

(df1+df2)/2

Out[90]:

   Chinese  English  Russia  Python
A     54.0     81.0    78.5    74.0
B    100.5    106.5   107.5    33.5
C    113.5     34.0    82.0    24.5
D     30.0     74.5    70.5   116.5

In [92]:

# method 2: compute the average with add() and fill_value

df1.add(df2,fill_value=0)/2

Out[92]:

   Chinese  English  Russia  Python
A     54.0     81.0    78.5    74.0
B    100.5    106.5   107.5    33.5
C    113.5     34.0    82.0    24.5
D     30.0     74.5    70.5   116.5

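Suppose Zhang San is found to have cheated on the midterm math exam and the score must be recorded as 0. How would you do that? Li Si is rewarded for reporting Zhang San and receives 100 extra points in every midterm subject. How would you do that?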

In [94]:

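# Zhang San's math score becomes 0 (row 'B' and column 'Russia' stand in for them here);
# a single .loc call avoids chained assignment
df1.loc['B','Russia'] = 0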

df1

Out[94]:

[Output table for df1 after the update: rows A, B, C, D; columns Chinese, English, Russia, Python. Most cell values were lost in extraction.]

In [96]:

df1.loc['C'] += 100

df1

Out[96]:

[Output table for df1 after adding 100 to row C: rows A, B, C, D; columns Chinese, English, Russia, Python. Most cell values were lost in extraction.]

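Later the teacher discovers that one question was faulty and, to reassure the students, adds 10 points to every subject for every student. How would you do that?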

In [98]:

df1 += 10

df1

Out[98]:

[Output table for df1 after adding 10 to every cell: rows A, B, C, D; columns Chinese, English, Russia, Python. Most cell values were lost in extraction.]
