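Follow-up notes:
Three ways to iterate over a DataFrame:
- items() (named iteritems() before it was removed in pandas 2.0) returns a generator of (column label, Series) pairs, which a for loop can consume
- iterrows() returns a generator of (row label, Series) pairs, which a for loop can consume
- itertuples() yields each row as a namedtuple (<class 'pandas.core.frame.Pandas'>), so getattr(row, 'column label') retrieves the corresponding value
Preparing the demo data: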
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2003, 2004, 2005],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
Iterating by column:
column_indexs = []
# items() replaces iteritems(), which was removed in pandas 2.0
for column_index, column_data in frame.items():
    column_indexs.append(column_index)
    print(column_data)
print(column_indexs)
Output:
0 Ohio
1 Ohio
2 Ohio
3 Nevada
4 Nevada
5 Nevada
Name: state, dtype: object
0 2000
1 2001
2 2002
3 2003
4 2004
5 2005
Name: year, dtype: int64
0 1.5
1 1.7
2 3.6
3 2.4
4 2.9
5 3.2
Name: pop, dtype: float64
['state', 'year', 'pop']
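Because each iteration yields a (column label, Series) pair, the same loop can build a per-column summary instead of printing. The following is only a minimal sketch under that assumption; the name column_dtypes is made up for illustration and is not part of the original notes.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2003, 2004, 2005],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

# column_dtypes is a hypothetical helper name: it maps each column label
# to the dtype of the Series that items() yields for that column.
column_dtypes = {label: column.dtype for label, column in frame.items()}

for label, dtype in column_dtypes.items():
    print(label, dtype)
```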
Iterating by row
Method one:
row_indexs = []
for index, row in frame.iterrows():
    row_indexs.append(index)
    print(row)
print(row_indexs)
Output:
state Ohio
year 2000
pop 1.5
Name: 0, dtype: object
state Ohio
year 2001
pop 1.7
Name: 1, dtype: object
state Ohio
year 2002
pop 3.6
Name: 2, dtype: object
state Nevada
year 2003
pop 2.4
Name: 3, dtype: object
state Nevada
year 2004
pop 2.9
Name: 4, dtype: object
state Nevada
year 2005
pop 3.2
Name: 5, dtype: object
[0, 1, 2, 3, 4, 5]
A value can also be read from each row by column name:
for index, row in frame.iterrows():
    print(row['pop'])
Output:
1.5
1.7
3.6
2.4
2.9
3.2
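One thing to keep in mind with iterrows() is that each row arrives as a single Series, so values from columns of different dtypes are coerced to one common dtype. The sketch below illustrates this on a numeric-only frame (the two-column frame and the variable names are assumptions chosen for the illustration) and contrasts it with itertuples(), which reads each value from its own column.

```python
import pandas as pd

# A small numeric-only frame, assumed here purely for illustration.
numeric = pd.DataFrame({'year': [2000, 2001], 'pop': [1.5, 1.7]})

# iterrows(): the int column and the float column share one row Series,
# so the integers are upcast to float.
_, first_row = next(numeric.iterrows())
print(first_row.dtype)
print(first_row['year'])

# itertuples(): each field is taken from its own column, so 'year' stays an integer.
first_tuple = next(numeric.itertuples(index=False))
print(first_tuple.year)
```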
Method two:
for row in frame.itertuples():
    print(getattr(row, 'state'), getattr(row, 'year'), getattr(row, 'pop'))
    print(type(row))
Output:
Ohio 2000 1.5
<class 'pandas.core.frame.Pandas'>
......
Nevada 2005 3.2
<class 'pandas.core.frame.Pandas'>
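Since the rows are namedtuples, fields can also be read with plain attribute access, and itertuples() takes index and name parameters that control the tuple's contents and class name. A minimal sketch, assuming a shortened two-row frame and the made-up class name 'Record':

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                      'year': [2000, 2003],
                      'pop': [1.5, 2.4]})

# Default call: the row label is exposed as the first field, named Index.
for row in frame.itertuples():
    print(row.Index, row.state, row.year)

# index=False drops the row label; name sets the namedtuple class name.
# 'Record' is an arbitrary name picked for this sketch.
records = list(frame.itertuples(index=False, name='Record'))
print(records[0]._fields)
print(type(records[0]).__name__)
```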
Iterating over a single column (or row) of a DataFrame
Preparing the demo data:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2003, 2004, 2005],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
Take the frame's index attribute, then use frame[column label].get(row label) to read each value:
print(frame.columns)
for index in frame.index:
    print(frame['state'].get(index))
Output:
Index(['state', 'year', 'pop'], dtype='object')
Ohio
Ohio
Ohio
Nevada
Nevada
Nevada
Two equivalent ways to write the loop above:
# First
for index in frame.index:
    print(frame['state'][index])
# Second
for index in frame.index:
    print(frame.get('state').get(index))
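These forms give the same result for labels that exist, but they behave differently when a label is missing: get() returns a default (None unless one is supplied), while square-bracket indexing raises KeyError. A small sketch of that difference, using a shortened frame and a deliberately absent row label 10 and column 'no_such_column' as illustrative assumptions:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada']})

# get() on a Series: a missing row label yields the default instead of raising.
missing_default = frame['state'].get(10)
missing_custom = frame['state'].get(10, 'missing')
print(missing_default, missing_custom)

# get() on a DataFrame: a missing column label also yields None.
missing_column = frame.get('no_such_column')
print(missing_column)

# Square brackets raise KeyError for the same missing row label.
try:
    frame['state'][10]
    raised = False
except KeyError:
    raised = True
print(raised)
```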
Take the frame's columns attribute, then use frame[column label].get(row label) to read each value:
print(frame.index)
for column in frame.columns:
    print(frame[column].get(0))
Output:
RangeIndex(start=0, stop=6, step=1)
Ohio
2000
1.5
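Getting a single value
- DataFrame.at[row label, column label] returns a single value:
print(frame.at[1, 'pop'])
Output:
1.7
- DataFrame.iat[row position, column position] returns a single value by integer position ('pop' is the third column, so its position is 2):
print(frame.iat[1, 2])
Output:
1.7
- DataFrame.loc[row label, column label] returns a single value; unlike at, it also accepts slices and lists of labels, so it can select whole rows or columns: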
print(frame.loc[1,'pop'])
print(frame.loc[:,'pop'])
print(frame.loc[[1,2]])
print(frame.loc[:,['state', 'pop']])
Output:
1.7
0 1.5
1 1.7
2 3.6
3 2.4
4 2.9
5 3.2
Name: pop, dtype: float64
state year pop
1 Ohio 2001 1.7
2 Ohio 2002 3.6
state pop
0 Ohio 1.5
1 Ohio 1.7
2 Ohio 3.6
3 Nevada 2.4
4 Nevada 2.9
5 Nevada 3.2
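- DataFrame.iloc[row position, column position] returns a single value; unlike iat, it also accepts slices and lists of positions, so it can select whole rows or columns. Output of four iloc selections (a single value, one column, two rows, two columns):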
1.7
0 1.5
1 1.7
2 3.6
3 2.4
4 2.9
5 3.2
Name: pop, dtype: float64
state year pop
1 Ohio 2001 1.7
2 Ohio 2002 3.6
year pop
0 2000 1.5
1 2001 1.7
2 2002 3.6
3 2003 2.4
4 2004 2.9
5 2005 3.2
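The iloc calls themselves do not appear in these notes. The sketch below shows position-based selections that would be consistent with the iloc output above; the exact calls are an assumption reconstructed from that output, not the original code.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2003, 2004, 2005],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

single_value = frame.iloc[1, 2]      # row position 1, column position 2 ('pop')
pop_column = frame.iloc[:, 2]        # every row of the third column
two_rows = frame.iloc[[1, 2]]        # rows at positions 1 and 2, all columns
two_columns = frame.iloc[:, [1, 2]]  # every row of the 'year' and 'pop' columns

print(single_value)
print(pop_column)
print(two_rows)
print(two_columns)
```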