04_Pandas Data Structures

weixin_30376323

Original link: http://www.cnblogs.com/pankypan/p/11567937.html

Pandas Data Structures

Importing pandas:
The usual trio (NumPy, pandas, Matplotlib)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


from pandas import Series, DataFrame

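1. Series

A Series is an object similar to a one-dimensional array. It consists of two parts:

values: the data (an ndarray)
index: the index labels associated with the data

1) Creating a Series

There are two ways to create one:

(1) From a list or NumPy array

The default index is the integers 0 to N-1.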

n = np.array([0, 2, 4, 6, 8])

# Unlike an ndarray, a Series carries an explicit index
s = Series(n)
s

0    0
1    2
2    4
3    6
4    8
dtype: int64

# A Series wraps an ndarray
# The index makes lookup and retrieval much more convenient
s.values

array([0, 2, 4, 6, 8])


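The index can also be specified, either through the index parameter at construction or by assigning to the index attribute afterwards: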

s.index = list('abcde')
s

a    0
b    2
c    4
d    6
e    8
dtype: int64

s.index = ['张三','李四','Michael','sara','lisa']
s

张三         0
李四         2
Michael    4
sara       6
lisa       8
dtype: int64
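The cells above assign the labels after the Series exists. As a minimal sketch (the variable names are only illustrative), the same labels can be passed through the index parameter when the Series is constructed:

```python
import numpy as np
import pandas as pd

n = np.array([0, 2, 4, 6, 8])

# Pass the labels at construction time; the index must be as long as the data
s = pd.Series(n, index=list('abcde'))

print(s['c'])         # look up one element by its label
print(list(s.index))  # the labels supplied above
```

Supplying the index up front avoids a second statement and fails early if the label count does not match the data.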

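Note that in the pandas version used here, a Series created from an ndarray holds a reference to the array rather than a copy: changing an element of the Series also changes the element in the original ndarray. (Lists do not behave this way.) Newer pandas versions with copy-on-write enabled may copy the array instead.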

s['张三'] = 100
s

张三         100
李四           2
Michael      4
sara         6
lisa         8
dtype: int64

n

array([100,   2,   4,   6,   8])
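Whether a Series shares memory with its source array can depend on the pandas version and its copy-on-write settings. When the source array must stay untouched, one hedged option is to request a copy explicitly with the constructor's copy parameter, as in this sketch:

```python
import numpy as np
import pandas as pd

n = np.array([0, 2, 4, 6, 8])

# copy=True asks pandas to copy the data instead of referencing n
s = pd.Series(n, index=list('abcde'), copy=True)
s['a'] = 100

print(s['a'])  # the Series element was changed
print(n)       # the source array keeps its original values
```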

(2) From a dictionary

# The dictionary keys become the index
s = Series({'a': 1, 'b': 2, 'c': [1, 2, 3]})
s

a            1
b            2
c    [1, 2, 3]
dtype: object

s.values

array([1, 2, list([1, 2, 3])], dtype=object)
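When a dictionary is combined with an explicit index, the index selects and orders the keys, and labels missing from the dictionary are filled with NaN. A small sketch with made-up data:

```python
import pandas as pd

scores = {'a': 1, 'b': 2, 'c': 3}

# 'c' and 'a' are taken from the dictionary; 'z' has no key, so it becomes NaN
s = pd.Series(scores, index=['c', 'a', 'z'])

print(s)
```

Because NaN is a floating-point value, the integer data is converted to float64 in the result.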

============================================

Exercise 1:

Create the following Series in several different ways and name it s1:
语文 (Chinese) 150
数学 (Math) 150
英语 (English) 150
理综 (Science) 300

============================================

s1 = Series([150, 150, 150, 300])
s1.index = ['语文', '数学', '英语', '理综']
s1

语文    150
数学    150
英语    150
理综    300
dtype: int64

n = np.array([150, 150, 150, 300])
s1 = Series(n)
s1.index = ['语文', '数学', '英语', '理综']
s1

语文    150
数学    150
英语    150
理综    300
dtype: int64

s1 = Series(data=[150]*3 + [300], index=['语文', '数学', '英语', '理综'])
s1

语文    150
数学    150
英语    150
理综    300
dtype: int64

s1 = Series({'语文': 150, '数学': 150, '英语': 150, '理综': 300})
s1

语文    150
数学    150
英语    150
理综    300
dtype: int64

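2) Indexing and slicing a Series

Square brackets with a single label return one element; square brackets containing a list of labels return a Series. Indexing is either explicit (by label) or implicit (by position):

(1) Explicit indexing:

- Use the elements of index as the keys
- Use .loc[] (recommended)

Note that label slices are closed intervals: both endpoints are included.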

s = Series(data=[1, 2, 3, 4, 5], index=list('abcde'))
s

a    1
b    2
c    3
d    4
e    5
dtype: int64

# Plain square brackets
s['a']

# Using .loc[] (label-based location)
s.loc['a']

# Slicing by label includes both endpoints
s['b': 'd']

b    2
c    3
d    4
dtype: int64

s.loc['b': 'd']

b    2
c    3
d    4
dtype: int64

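(2) Implicit indexing:

- Use integer positions as the keys
- Use .iloc[] (recommended)

Note that positional slices are half-open: the end position is excluded. In the example below the index labels are themselves integers, so s[1] looks up the label 1 while s.iloc[1] takes the element at position 1; the slice s[1: 3] is positional in the pandas version used here.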

s = Series(['panky', 'suki', 'snoopy', 'jake', 'disk'], index=[1, 2, 3, 4, 5])
s

1     panky
2      suki
3    snoopy
4      jake
5      disk
dtype: object

s[1]

'panky'

s.iloc[1]

'suki'

s[1: 3]

2      suki
3    snoopy
dtype: object

s.index

Int64Index([1, 2, 3, 4, 5], dtype='int64')

s.iloc[1: 3]

2      suki
3    snoopy
dtype: object
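The difference between the closed label slice and the half-open positional slice is easiest to see when the same numbers are used with both accessors. A minimal sketch on the same data:

```python
import pandas as pd

s = pd.Series(['panky', 'suki', 'snoopy', 'jake', 'disk'], index=[1, 2, 3, 4, 5])

# .loc slices by label and includes both endpoints: labels 1, 2 and 3
by_label = s.loc[1:3]

# .iloc slices by position and excludes the end: positions 1 and 2
by_position = s.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```

Using .loc and .iloc explicitly removes any doubt about whether an integer is meant as a label or as a position.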

============================================

Exercise 2:

Index and slice the Series s1 from Exercise 1 in several ways:

Indexing:
数学 150

Slicing:
语文 150
数学 150
英语 150

============================================

s = Series(data={'语文': 150, '数学': 150, '英语': 150, '理综': 300})
s

语文    150
数学    150
英语    150
理综    300
dtype: int64

s['数学']

s.loc['数学']

s['语文': '英语']

语文    150
数学    150
英语    150
dtype: int64

s.loc['语文': '英语']

语文    150
数学    150
英语    150
dtype: int64

# Implicit (positional) slicing is half-open: left-closed, right-open
s.iloc[0: 3]

语文    150
数学    150
英语    150
dtype: int64

3) Basic Series concepts

A Series can be thought of as a fixed-length ordered dictionary.
Its attributes are available through shape, size, index, values, and so on.

s = Series(data=[145, 133, 123, 148, 150], index=['panky', 'suki', 'snoopy', 'jake', 'disk'])
s

panky     145
suki      133
snoopy    123
jake      148
disk      150
dtype: int64

s.shape

(5,)

s.size

s.values

array([145, 133, 123, 148, 150])

s.index

Index(['panky', 'suki', 'snoopy', 'jake', 'disk'], dtype='object')

head() and tail() give a quick look at the first and last rows of a Series.

s.head(2)

panky    145
suki     133
dtype: int64

s.tail(2)

jake    148
disk    150
dtype: int64
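The "ordered dictionary" view of a Series can be made concrete: membership tests check the index labels, items() yields label and value pairs in order, and to_dict() converts back to a plain dictionary. A short sketch with a subset of the data above:

```python
import pandas as pd

s = pd.Series([145, 133, 123], index=['panky', 'suki', 'snoopy'])

# "in" tests the index labels, just as it tests the keys of a dict
print('suki' in s)
print(145 in s)  # values are not searched

# Iterate over (label, value) pairs in index order
for name, score in s.items():
    print(name, score)

as_dict = s.to_dict()
print(as_dict)
```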

example_2

ph = pd.read_csv('./president_heights.csv')
ph

	order	name	height(cm)
0	1	George Washington	189
1	2	John Adams	170
2	3	Thomas Jefferson	189
3	4	James Madison	163
4	5	James Monroe	183
5	6	John Quincy Adams	171
6	7	Andrew Jackson	185
7	8	Martin Van Buren	168
8	9	William Henry Harrison	173
9	10	John Tyler	183
10	11	James K. Polk	173
11	12	Zachary Taylor	173
12	13	Millard Fillmore	175
13	14	Franklin Pierce	178
14	15	James Buchanan	183
15	16	Abraham Lincoln	193
16	17	Andrew Johnson	178
17	18	Ulysses S. Grant	173
18	19	Rutherford B. Hayes	174
19	20	James A. Garfield	183
20	21	Chester A. Arthur	183
21	23	Benjamin Harrison	168
22	25	William McKinley	170
23	26	Theodore Roosevelt	178
24	27	William Howard Taft	182
25	28	Woodrow Wilson	180
26	29	Warren G. Harding	183
27	30	Calvin Coolidge	178
28	31	Herbert Hoover	182
29	32	Franklin D. Roosevelt	188
30	33	Harry S. Truman	175
31	34	Dwight D. Eisenhower	179
32	35	John F. Kennedy	183
33	36	Lyndon B. Johnson	193
34	37	Richard Nixon	182
35	38	Gerald Ford	183
36	39	Jimmy Carter	177
37	40	Ronald Reagan	185
38	41	George H. W. Bush	188
39	42	Bill Clinton	188
40	43	George W. Bush	182
41	44	Barack Obama	185

s_name = ph['name']
type(s_name)

pandas.core.series.Series

s_name.head(2)

0    George Washington
1           John Adams
Name: name, dtype: object

s_name.tail(2)

40    George W. Bush
41      Barack Obama
Name: name, dtype: object

s_name.tail()

37        Ronald Reagan
38    George H. W. Bush
39         Bill Clinton
40       George W. Bush
41         Barack Obama
Name: name, dtype: object

When an index label has no corresponding value, the missing data is shown as NaN (not a number).

s = Series(data=[1, 2, 3, None])
s

0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64

Missing data can be detected with pd.isnull() and pd.notnull(), or with the Series methods isnull() and notnull().

s = Series(['a', None, 'c', None, 'f'])
s

0       a
1    None
2       c
3    None
4       f
dtype: object

pd.isnull(s)

0    False
1     True
2    False
3     True
4    False
dtype: bool

pd.notnull(s)

0     True
1    False
2     True
3    False
4     True
dtype: bool

s.isnull()

0    False
1     True
2    False
3     True
4    False
dtype: bool

s.notnull()

0     True
1    False
2     True
3    False
4     True
dtype: bool
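The boolean Series returned by these functions is typically used as a mask. The sketch below, on the same data, keeps the non-missing entries and also shows dropna() and fillna() as two common follow-ups (the replacement string is an arbitrary choice):

```python
import pandas as pd

s = pd.Series(['a', None, 'c', None, 'f'])

# Boolean mask: keep only the entries that are not missing
present = s[s.notnull()]

# dropna() is a shortcut for the same filter
dropped = s.dropna()

# fillna() replaces the missing entries instead of removing them
filled = s.fillna('missing')

print(present.tolist())
print(filled.tolist())
```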

Every Series has a name attribute. It can be set at construction and reassigned later on the instance.

s = Series(data=[1, 2, 3, 4], name='a test')
s

0    1
1    2
2    3
3    4
Name: a test, dtype: int64

s.name = 'Change'
s

0    1
1    2
2    3
3    4
Name: Change, dtype: int64

4) Series operations

(1) NumPy array operations also apply to a Series

s = Series([1, 3, 5, 7], name='odd')
s

0    1
1    3
2    5
3    7
Name: odd, dtype: int64

s + 1

0    2
1    4
2    6
3    8
Name: odd, dtype: int64

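(2) Operations between Series

Data is aligned on the index automatically during the operation.
Where the indexes do not match, the result is NaN.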

s1 = Series([1, 3, 5, 7])
s1

0    1
1    3
2    5
3    7
dtype: int64

s2 = Series([2, 4, 6, 8])
s2

0    2
1    4
2    6
3    8
dtype: int64

s1 + s2

0     3
1     7
2    11
3    15
dtype: int64

An operation between two Series acts on elements with the same index label; labels present in only one Series produce NaN.

s3 = Series([1, 2, 3, 4])
s4 = Series([1, 2, 3, 4], index=[2, 3, 4, 5])
s3 + s4

0    NaN
1    NaN
2    4.0
3    6.0
4    NaN
5    NaN
dtype: float64

Note: to keep a value for every index label, use the .add() method with fill_value.

s3.add(s4, fill_value=0)

0    1.0
1    2.0
2    4.0
3    6.0
4    3.0
5    4.0
dtype: float64
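The other arithmetic methods accept fill_value in the same way. A brief sketch with the same two Series, using 0 as the neutral value for subtraction and 1 for multiplication:

```python
import pandas as pd

s3 = pd.Series([1, 2, 3, 4])
s4 = pd.Series([1, 2, 3, 4], index=[2, 3, 4, 5])

# A label missing from one side is treated as 0 before subtracting
diff = s3.sub(s4, fill_value=0)

# A label missing from one side is treated as 1 before multiplying
prod = s3.mul(s4, fill_value=1)

print(diff.tolist())
print(prod.tolist())
```

The fill value only applies when a label is missing on one side; a label missing on both sides would still give NaN.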

============================================

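Exercise 3:

How do the rules for Series arithmetic differ from those for ndarray arithmetic?
Create another Series s2 whose index includes "文综", and perform several arithmetic operations between it and s1. Think about how to keep all of the data.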

============================================

# Series are aligned on index labels rather than matched by position
s1 = Series(data=[150] * 3 + [300], index=['语文', '数学', '英语', '理综'])
s2 = Series(data=[150] * 3 + [300], index=['语文', '数学', '英语', '文综'])

s1 + s2

数学    300.0
文综      NaN
理综      NaN
英语    300.0
语文    300.0
dtype: float64

s1.add(s2, fill_value=0)

数学    300.0
文综    300.0
理综    300.0
英语    300.0
语文    300.0
dtype: float64

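2. DataFrame

A DataFrame is a tabular data structure that can be viewed as a dictionary of Series sharing one index. It consists of multiple columns of data in a fixed order. It was designed to extend the Series from one dimension to several. A DataFrame has both a row index and a column index.

Row index: index
Column index: columns
Values: values (a two-dimensional NumPy array)

1) Creating a DataFrame

The most common way is to pass a dictionary. The DataFrame uses the dictionary keys as the column names and the dictionary values (each an array) as the column data.

The DataFrame also adds an index for the rows automatically, as a Series does.

As with a Series, if the requested columns do not match the dictionary keys, the corresponding values are NaN.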
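The NaN behaviour for unmatched columns can be checked with a small sketch. The names and scores here are made up; the point is that 'English' is requested in columns but has no key in the dictionary:

```python
import pandas as pd

data = {'Chinese': [120, 130], 'Math': [110, 140]}

# 'English' is not a key of data, so that column is filled with NaN
df = pd.DataFrame(data, index=['panky', 'suki'],
                  columns=['Chinese', 'Math', 'English'])

print(df)
```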

from pandas import DataFrame

df = DataFrame({'语文': np.random.randint(90, 150, size=4),
                '数学': np.random.randint(90, 150, size=4),
                '英语': np.random.randint(90, 150, size=4),
                'coding': np.random.randint(90, 150, size=4)
               })
df.index = ['panky', 'suki', 'lucy', 'loop']
df

	语文	数学	英语	coding
panky	109	128	141	138
suki	136	117	108	134
lucy	132	94	107	141
loop	116	117	112	90

index = ['panky', 'suki', 'lily', 'loop']
columns = ['Chinese', 'Math', 'English', 'Coding']
data = np.random.randint(90, 150, size=(4, 4))
df2 = DataFrame(index=index, columns=columns, data=data)
df2

	Chinese	Math	English	Coding
panky	130	97	142	103
suki	111	140	147	105
lily	134	110	132	128
loop	118	111	112	146

DataFrame attributes: values, columns, index, shape

df.values

array([[109, 128, 141, 138],
       [136, 117, 108, 134],
       [132,  94, 107, 141],
       [116, 117, 112,  90]])

df.columns

Index(['语文', '数学', '英语', 'coding'], dtype='object')

df.index

Index(['panky', 'suki', 'lucy', 'loop'], dtype='object')

df.shape

(4, 4)

============================================

Exercise 4:

Create a DataFrame named df from the following exam score table:

      张三  李四
语文  150   0
数学  150   0
英语  150   0
理综  300   0

============================================

data = [
    [150, 0],
    [150, 0],
    [150, 0],
    [300, 0]
]
columns = ['zhangsan', 'lisi']
index = ['语文', '数学', '英语', '理综']
df1 = DataFrame(data=data, index=index, columns=columns)
df1

	zhangsan	lisi
语文	150	0
数学	150	0
英语	150	0
理综	300	0

data = {'zhangsan': [150] * 3 + [300],  'lisi': [0] * 4}
index = ['语文', '数学', '英语', '理综']
df2 = DataFrame(data=data, index=index)
df2

	zhangsan	lisi
语文	150	0
数学	150	0
英语	150	0
理综	300	0

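2) Indexing a DataFrame

(1) Indexing columns

- Dictionary-style access
- Attribute-style access

Rows are samples rather than attributes of a DataFrame, so they cannot be accessed with dot notation.

A column can be retrieved as a Series. The returned Series has the same index as the DataFrame, and its name attribute is already set to the column name.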

index = ['panky', 'suki', 'lily', 'loop']
columns = ['Chinese', 'Math', 'English', 'Coding']
data = np.random.randint(90, 150, size=(4, 4))
df = DataFrame(index=index, columns=columns, data=data)
df

	Chinese	Math	English	Coding
panky	142	126	115	116
suki	140	135	91	140
lily	108	92	99	131
loop	112	126	102	142

# Dictionary-style access, like dic['key']
df['Chinese']

panky    142
suki     140
lily     108
loop     112
Name: Chinese, dtype: int64

df[['Math', 'Coding']]  # double brackets still return a DataFrame

	Math	Coding
panky	126	116
suki	135	140
lily	92	131
loop	126	142

df.Chinese

panky    142
suki     140
lily     108
loop     112
Name: Chinese, dtype: int64

[df.Math, df.Coding]

[panky    126
 suki     135
 lily      92
 loop     126
 Name: Math, dtype: int64, panky    116
 suki     140
 lily     131
 loop     142
 Name: Coding, dtype: int64]
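Attribute-style access only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute or method. The sketch below uses made-up column names ('count' and 'final score') to show where dictionary-style access is the safer choice:

```python
import pandas as pd

df = pd.DataFrame({'Math': [126, 135],
                   'count': [1, 2],
                   'final score': [90, 95]},
                  index=['panky', 'suki'])

# For an ordinary name, both styles return the same column
same = df.Math.equals(df['Math'])

# 'count' is also a DataFrame method, so df.count is the method, not the column
is_method = callable(df.count)

# Bracket access works for any column name, including names with spaces
count_column = df['count'].tolist()
spaced_column = df['final score'].tolist()

print(same, is_method, count_column, spaced_column)
```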

(2) Indexing rows

- Use .loc[] with an index label
- Use .iloc[] with an integer position

A single row is returned as a Series whose index is the original columns.

df

	Chinese	Math	English	Coding
panky	142	126	115	116
suki	140	135	91	140
lily	108	92	99	131
loop	112	126	102	142

df.loc['suki']  # explicit (label-based) indexing

Chinese    140
Math       135
English     91
Coding     140
Name: suki, dtype: int64

df.loc[['loop']]

	Chinese	Math	English	Coding
loop	112	126	102	142

df.iloc[0]  # implicit (positional) indexing

Chinese    142
Math       126
English    115
Coding     116
Name: panky, dtype: int64

df.iloc[[0]]

	Chinese	Math	English	Coding
panky	142	126	115	116

df.iloc[0: 2]

	Chinese	Math	English	Coding
panky	142	126	115	116
suki	140	135	91	140

df.loc['panky': 'suki']

	Chinese	Math	English	Coding
panky	142	126	115	116
suki	140	135	91	140

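(3) Indexing individual elements
- Go through the column index
- Go through the row index (iloc[3, 1] passes two arguments, a row and a column; in iloc[[3, 3]] the list [3, 3] is a single argument)
- Use the values attribute (a two-dimensional NumPy array)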

df

	Chinese	Math	English	Coding
panky	142	126	115	116
suki	140	135	91	140
lily	108	92	99	131
loop	112	126	102	142

# Column first, then row
df['English']['suki']

df['English'].loc['suki']

df['English', 'panky']  # when going column first, the column and row keys cannot share one pair of brackets

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-128-5668b58e37e0> in <module>
----> 1 df['English', 'panky']

KeyError: ('English', 'panky')

# Row first, then column
df.loc['panky']['Math']

df.loc['panky'].loc['Math']
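The two-step lookups above can also be written as a single call that takes the row and the column together. A minimal sketch on a small fixed table (a subset of the scores shown earlier), comparing the label-based and position-based forms:

```python
import pandas as pd

df = pd.DataFrame([[142, 126, 115], [140, 135, 91]],
                  index=['panky', 'suki'],
                  columns=['Chinese', 'Math', 'English'])

by_loc = df.loc['suki', 'English']   # row label, column label
by_iloc = df.iloc[1, 2]              # row position, column position
by_at = df.at['suki', 'English']     # scalar access by label
by_iat = df.iat[1, 2]                # scalar access by position
by_values = df.values[1, 2]          # index into the underlying 2-D array

print(by_loc, by_iloc, by_at, by_iat, by_values)
```

All five expressions read the same cell; the single-call forms avoid creating an intermediate Series.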

[Note]
With plain square brackets:

an index key selects a column
a slice selects rows

df['Math']

panky    126
suki     135
lily      92
loop     126
Name: Math, dtype: int64

df['suki': 'lily']

	Chinese	Math	English	Coding
suki	140	135	91	140
lily	108	92	99	131

df.iloc[1: 3]

	Chinese	Math	English	Coding
suki	140	135	91	140
lily	108	92	99	131
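A quick way to confirm this rule is to pass a row label as a plain key and observe the failure. The sketch below uses a small fixed table; the KeyError handling is only there to keep the example runnable:

```python
import pandas as pd

df = pd.DataFrame({'Chinese': [142, 140, 108], 'Math': [126, 135, 92]},
                  index=['panky', 'suki', 'lily'])

column = df['Math']        # a plain key is looked up among the columns
rows = df['suki':'lily']   # a slice is applied to the rows (both ends included)

try:
    df['suki']             # 'suki' is a row label, not a column name
    row_key_failed = False
except KeyError:
    row_key_failed = True

print(column.tolist())
print(list(rows.index))
print(row_key_failed)
```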

============================================

Exercise 5:

Index and slice ddd in several ways and compare the differences

============================================

ddd = DataFrame(data={'张三': [150] * 3 + [300],
                      '李四': [0] * 4
                     }, index=['Chinese', 'Math', 'English', 'Coding'])
ddd

	张三	李四
Chinese	150	0
Math	150	0
English	150	0
Coding	300	0

1. Retrieve 张三's Chinese score; the result should still be a DataFrame

type(ddd)

pandas.core.frame.DataFrame

res = ddd['张三'].loc['Chinese']
res

type(res)

numpy.int64

res = ddd[0:1]
res

	张三	李四
Chinese	150	0

res = ddd.loc[['Chinese'], ['张三']]
res

	张三
Chinese	150

type(res)

pandas.core.frame.DataFrame

2. Slice the rows from Math through Coding

ddd['Math': 'Coding']

	张三	李四
Math	150	0
English	150	0
Coding	300	0

ddd.loc['Math': 'Coding']

	张三	李四
Math	150	0
English	150	0
Coding	300	0

ddd.iloc[1: 4]

	张三	李四
Math	150	0
English	150	0
Coding	300	0

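3. Assign 李四's English score: 88.8.

# Assign through a single .loc call; chained assignment such as
# ddd['李四'].loc['English'] = 88.8 may not update ddd
ddd.loc['English', '李四'] = 88.8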

ddd

	张三	李四
Chinese	150	0.0
Math	150	0.0
English	150	88.8
Coding	300	0.0

ddd.loc['Coding', '李四'] = 135.6

ddd

	张三	李四
Chinese	150	0.0
Math	150	0.0
English	150	88.8
Coding	300	135.6

4. Add a column for 王五

ddd['王五'] = np.random.randint(0, 150, 4)
ddd

	张三	李四	王五
Chinese	150	0.0	5
Math	150	0.0	117
English	150	88.8	71
Coding	300	135.6	25

5. Add a new row

ddd.loc['python'] = np.random.randint(0, 150, size=3)
ddd

	张三	李四	王五
Chinese	150	0.0	5
Math	150	0.0	117
English	150	88.8	71
Coding	300	135.6	25
python	8	34.0	126
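The assignments above modify ddd in place. As an alternative sketch that leaves the original untouched, assign() returns a new DataFrame with the extra column and pd.concat() appends a row; the scores used here are made up:

```python
import pandas as pd

ddd = pd.DataFrame({'张三': [150, 150], '李四': [0, 0]},
                   index=['Chinese', 'Math'])

# assign() returns a new DataFrame; ddd itself keeps two columns
with_column = ddd.assign(**{'王五': [100, 90]})

# Build the new row as a one-row DataFrame and concatenate it
new_row = pd.DataFrame({'张三': [120], '李四': [60], '王五': [80]},
                       index=['python'])
result = pd.concat([with_column, new_row])

print(result)
```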

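3) DataFrame operations

(1) Operations between DataFrames

As with Series:

Data is aligned on both the row and column indexes automatically during the operation.
Where the indexes do not match, the result is NaN.

Create DataFrame df1 with each student's scores per subject for monthly exam 1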

df1 = DataFrame({'Python': [122, 133, 140], 'Math': [139, 118, 112],
                 'English': [133, 128, 110]
                }, index=['A', 'B', 'C'])
df1

	Python	Math	English
A	122	139	133
B	133	118	128
C	140	112	110

Create DataFrame df2 with each student's scores per subject for monthly exam 2
A new student has transferred in

df2 = DataFrame(data=np.random.randint(0, 150, size=(4, 4)),
                index=['A', 'B', 'C', 'D'],
                columns=['Python', 'Math', 'Physical', 'English']
               )
df2

	Python	Math	Physical	English
A	104	49	8	71
B	82	123	34	113
C	26	146	3	123
D	55	81	47	23

df1

	Python	Math	English
A	122	139	133
B	133	118	128
C	140	112	110

# + returns a new DataFrame
df1 + 10

	Python	Math	English
A	132	149	143
B	143	128	138
C	150	122	120

df1 + df2

	English	Math	Physical	Python
A	204.0	188.0	NaN	226.0
B	241.0	241.0	NaN	215.0
C	233.0	258.0	NaN	166.0
D	NaN	NaN	NaN	NaN

The table below maps Python operators to the corresponding pandas methods:

Python Operator	Pandas Method(s)
`+`	`add()`
`-`	`sub()`, `subtract()`
`*`	`mul()`, `multiply()`
`/`	`truediv()`, `div()`, `divide()`
`//`	`floordiv()`
`%`	`mod()`
`**`	`pow()`
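The correspondence in the table can be verified directly. This sketch, on a small fixed DataFrame, compares each operator with its method form using equals():

```python
import pandas as pd

df = pd.DataFrame({'Python': [122, 133], 'Math': [139, 118]}, index=['A', 'B'])

# Each entry compares the operator result with the method result
checks = {
    '+': (df + 2).equals(df.add(2)),
    '-': (df - 2).equals(df.sub(2)),
    '*': (df * 2).equals(df.mul(2)),
    '/': (df / 2).equals(df.div(2)),
    '//': (df // 2).equals(df.floordiv(2)),
    '%': (df % 2).equals(df.mod(2)),
    '**': (df ** 2).equals(df.pow(2)),
}

print(checks)
```

The method forms are mainly useful because they accept extra parameters such as fill_value and axis, which the operators cannot take.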

display(df1, df2)

	Python	Math	English
A	122	139	133
B	133	118	128
C	140	112	110

	Python	Math	Physical	English
A	104	49	8	71
B	82	123	34	113
C	26	146	3	123
D	55	81	47	23

df = df1.add(df2, fill_value=0)/2
df

	English	Math	Physical	Python
A	102.0	94.0	4.0	113.0
B	120.5	120.5	17.0	107.5
C	116.5	129.0	1.5	83.0
D	11.5	40.5	23.5	27.5

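(2) Operations between a Series and a DataFrame

[Important]

With Python operators: the Series index is matched against the DataFrame columns and the operation is applied to every row. (This resembles arithmetic between a 2-D and a 1-D array in NumPy, except that NaN can appear where labels do not match.)
With pandas methods:
- axis=0: the Series index is matched against the row index, so the Series is applied down every column.
- axis=1: the Series index is matched against the columns, so the Series is applied across every row.
- From the pandas documentation: "Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on."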

s = Series([69.0, 130.0, 70.5, 85.5, 10.0], index=['Python', 'Math', 'Physical', 'English', 'Golang'])
s

Python       69.0
Math        130.0
Physical     70.5
English      85.5
Golang       10.0
dtype: float64

data = np.random.randint(90, 150, size=(5, 4))
columns = ['Python', 'Math', 'Physical', 'English']
index = ['Michael', 'San', 'Si', 'Wu', 'Fly']
df = DataFrame(data=data, columns=columns, index=index)
df

	Python	Math	Physical	English
Michael	94	126	102	134
San	124	98	98	128
Si	119	139	130	115
Wu	136	93	99	121
Fly	135	121	147	125

# Row-wise: add s to every row (s is matched against the columns)
df.add(s, axis=1)

	English	Golang	Math	Physical	Python
Michael	219.5	NaN	256.0	172.5	163.0
San	213.5	NaN	228.0	168.5	193.0
Si	200.5	NaN	269.0	200.5	188.0
Wu	206.5	NaN	223.0	169.5	205.0
Fly	210.5	NaN	251.0	217.5	204.0

df.add(s, axis=0)

	Python	Math	Physical	English
English	NaN	NaN	NaN	NaN
Fly	NaN	NaN	NaN	NaN
Golang	NaN	NaN	NaN	NaN
Math	NaN	NaN	NaN	NaN
Michael	NaN	NaN	NaN	NaN
Physical	NaN	NaN	NaN	NaN
Python	NaN	NaN	NaN	NaN
San	NaN	NaN	NaN	NaN
Si	NaN	NaN	NaN	NaN
Wu	NaN	NaN	NaN	NaN

s2 = Series([98, 120, 130, 120],
            index=['Michael', 'San', 'Si', 'Wu'], name='Python')
s2

Michael     98
San        120
Si         130
Wu         120
Name: Python, dtype: int64

df

	Python	Math	Physical	English
Michael	94	126	102	134
San	124	98	98	128
Si	119	139	130	115
Wu	136	93	99	121
Fly	135	121	147	125

df.add(s2, axis=0)

	Python	Math	Physical	English
Fly	NaN	NaN	NaN	NaN
Michael	192.0	224.0	200.0	232.0
San	244.0	218.0	218.0	248.0
Si	249.0	269.0	260.0	245.0
Wu	256.0	213.0	219.0	241.0

s3 = df['Python']
s3

Michael     94
San        124
Si         119
Wu         136
Fly        135
Name: Python, dtype: int64

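# Operate on s3 and df
# Conclusion: when a DataFrame and a Series are combined with an operator, the Series index is compared with the column index by default. Only matching labels are computed; the rest become NaN.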
df + s3

	English	Fly	Math	Michael	Physical	Python	San	Si	Wu
Michael	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
San	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Si	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Wu	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Fly	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
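The all-NaN result comes from matching student names against subject names. A sketch with small made-up bonus values shows the two directions side by side: the operator matches the Series against the columns, while add(..., axis=0) matches it against the row index:

```python
import pandas as pd

df = pd.DataFrame({'Python': [94, 124], 'Math': [126, 98]},
                  index=['Michael', 'San'])

bonus_per_subject = pd.Series({'Python': 10, 'Math': 5})
bonus_per_student = pd.Series({'Michael': 1, 'San': 2})

# Operator: the Series index is matched against the columns
by_columns = df + bonus_per_subject

# Method with axis=0: the Series index is matched against the row index
by_rows = df.add(bonus_per_student, axis=0)

print(by_columns)
print(by_rows)
```

For a Series taken from a column of the same DataFrame, such as df['Python'], axis=0 is the form that lines the labels up.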

============================================

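Exercise 6:

Suppose ddd holds the midterm scores and ddd2 the final scores. Create ddd2 freely, add it to ddd, and compute the average of the midterm and final scores.
Suppose cat was caught cheating on the midterm Math exam, so that score must be recorded as 0. How would you do it?
dog is rewarded for reporting 张三 for cheating with 100 extra points in every midterm subject. How would you do it?
Later the teacher found that one question was wrong and, to reassure the students, added 10 points to every subject for every student. How would you do it?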

============================================

data = np.random.randint(0, 150, size=(6, 4))
columns = ['Chinese', 'Math', 'English', 'Python']
index = ['add', 'bit', 'cat', 'dog', 'egg', 'golf']
ddd = DataFrame(data=data, columns=columns, index=index)
ddd

	Chinese	Math	English	Python
add	76	39	59	148
bit	75	41	48	4
cat	49	77	0	148
dog	32	22	95	71
egg	93	38	116	68
golf	33	47	94	39

data = np.random.randint(0, 150, size=(6, 4))
columns = ['Chinese', 'Math', 'English', 'Python']
index = ['add', 'bit', 'cat', 'dog', 'egg', 'golf']
ddd2 = DataFrame(data=data, columns=columns, index=index)
ddd2

	Chinese	Math	English	Python
add	90	100	87	64
bit	122	35	120	145
cat	117	75	61	19
dog	36	137	49	80
egg	84	114	96	57
golf	75	27	58	49

(ddd + ddd2) / 2

	Chinese	Math	English	Python
add	83.0	69.5	73.0	106.0
bit	98.5	38.0	84.0	74.5
cat	83.0	76.0	30.5	83.5
dog	34.0	79.5	72.0	75.5
egg	88.5	76.0	106.0	62.5
golf	54.0	37.0	76.0	44.0

# Only cat's Math score is set to 0; ddd.loc['cat'] = 0 would zero every subject
ddd.loc['cat', 'Math'] = 0
ddd

	Chinese	Math	English	Python
add	76	39	59	148
bit	75	41	48	4
cat	49	0	0	148
dog	32	22	95	71
egg	93	38	116	68
golf	33	47	94	39

ddd.loc['dog']

Chinese    32
Math       22
English    95
Python     71
Name: dog, dtype: int64

ddd.loc['dog'] += 100

ddd

	Chinese	Math	English	Python
add	76	39	59	148
bit	75	41	48	4
cat	49	0	0	148
dog	132	122	195	171
egg	93	38	116	68
golf	33	47	94	39

ddd += 10

ddd

	Chinese	Math	English	Python
add	86	49	69	158
bit	85	51	58	14
cat	59	10	10	158
dog	142	132	205	181
egg	103	48	126	78
golf	43	57	104	49

Reposted from: https://www.cnblogs.com/pankypan/p/11567937.html
