Slicing DataFrames
This section covers when slicing produces a Series and when it produces a DataFrame, which cleared up a question left over from earlier.
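The rule can be checked on a small frame. This is a minimal sketch on a hypothetical DataFrame with made-up numbers, not the course's election data. A single label on an axis drops that axis, while a list or a slice keeps it.

```python
import pandas as pd

# Hypothetical toy DataFrame (made-up numbers) standing in for the election data
df = pd.DataFrame(
    {"Obama": [50.0, 40.0, 60.0], "Romney": [48.0, 58.0, 38.0]},
    index=["Adams", "Allegheny", "Armstrong"],
)

# Single row label + single column label: a scalar
scalar = df.loc["Adams", "Obama"]

# Single column label: a Series indexed by the row labels
col_series = df.loc[:, "Obama"]

# Single row label: a Series indexed by the column labels
row_series = df.loc["Adams"]

# A list of labels keeps that axis, so the result stays a DataFrame
col_frame = df.loc[:, ["Obama"]]
row_frame = df.loc[["Adams"]]

# A row slice with a single column label: a Series
slice_series = df.loc["Adams":"Allegheny", "Obama"]

print(type(scalar), type(col_series), type(col_frame), type(slice_series))
```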
Slicing can also run in reverse. To do this for hypothetical row labels 'a' and 'b', use a step size of -1, like so: df.loc['b':'a':-1]. Applied to the election data:
# Slice the row labels 'Perry' to 'Potter': p_counties
p_counties = election.loc['Perry':'Potter',:]
# Print the p_counties DataFrame
print(p_counties)
# Slice the row labels 'Potter' to 'Perry' in reverse order: p_counties_rev
p_counties_rev = election.loc['Potter':'Perry':-1]
# Print the p_counties_rev DataFrame
print(p_counties_rev)
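A self-contained version of the same two slices, using a hypothetical four-row frame (county names as labels, made-up values) since the course's election file is not reproduced here. Note that label slices include both endpoints, and the reverse slice starts from the later label.

```python
import pandas as pd

# Hypothetical county-level data; the values are made up for illustration
election = pd.DataFrame(
    {"turnout": [68.1, 59.4, 71.0, 63.2]},
    index=["Northampton", "Perry", "Pike", "Potter"],
)

# Forward label slice: both 'Perry' and 'Potter' are included
p_counties = election.loc["Perry":"Potter", :]

# Reverse label slice: start at the later label and step by -1
p_counties_rev = election.loc["Potter":"Perry":-1]

print(p_counties.index.tolist())
print(p_counties_rev.index.tolist())
```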
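There is also syntax for selecting columns from the first column up to a given column, or from a given column through to the last one. These are details that were easy to miss in earlier exercises: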
# Slice the columns from the starting column to 'Obama': left_columns
left_columns = election.loc[:, :'Obama']
# Print the output of left_columns.head()
print(left_columns.head())
# Slice the columns from 'Obama' to 'winner': middle_columns
middle_columns = election.loc[:, 'Obama':'winner']
# Print the output of middle_columns.head()
print(middle_columns.head())
# Slice the columns from 'Romney' to the end: 'right_columns'
right_columns = election.loc[:, 'Romney':]
# Print the output of right_columns.head()
print(right_columns.head())
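To see exactly which columns each open-ended slice returns, here is a sketch on a hypothetical frame whose column order only loosely imitates the election data (all values made up). An omitted start means "from the first column", an omitted end means "through the last column", and a named endpoint is always included.

```python
import pandas as pd

# Hypothetical frame; only the column order matters for label slicing
election = pd.DataFrame(
    {
        "state": ["PA", "PA"],
        "total": [1000, 2000],
        "Obama": [40.0, 55.0],
        "Romney": [58.0, 43.0],
        "winner": ["Romney", "Obama"],
        "voters": [1500, 3000],
    },
    index=["County A", "County B"],
)

# From the first column up to and including 'Obama'
left_columns = election.loc[:, :"Obama"]

# From 'Obama' to 'winner', both endpoints included
middle_columns = election.loc[:, "Obama":"winner"]

# From 'Romney' through the last column
right_columns = election.loc[:, "Romney":]

print(left_columns.columns.tolist())
print(middle_columns.columns.tolist())
print(right_columns.columns.tolist())
```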
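Next comes the problem of special values in a DataFrame, such as zeros and NaN: how to find them, locate them, and change them. This exercise sets the winner to NaN wherever the result was too close to call, which makes those rows easier to handle later: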
# Import numpy
import numpy as np
# Create the boolean array: too_close
too_close = election.margin < 1
# Assign np.nan to the 'winner' column where the results were too close to call
election.loc[too_close, 'winner'] = np.nan
# Print a summary of election (info() prints directly and returns None)
election.info()
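A runnable miniature of the same steps, assuming a hypothetical three-row frame in which 'margin' is the winner's lead in percentage points (made-up values). It masks the close races and then uses isna() to locate and count the new missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical results; 'margin' is the winner's lead in percentage points
election = pd.DataFrame(
    {"winner": ["Obama", "Romney", "Obama"], "margin": [0.4, 12.5, 0.9]},
    index=["County A", "County B", "County C"],
)

# Boolean Series: True where the race was too close to call
too_close = election.margin < 1

# .loc[row_mask, column] assigns only to the selected cells of 'winner'
election.loc[too_close, "winner"] = np.nan

# isna() marks missing cells; sum() counts them per column
print(election.isna().sum())

# info() prints its summary itself, so it is not wrapped in print()
election.info()
```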
Transforming DataFrames
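This section introduces the .floordiv() method, i.e. floor division. It can also be written as np.floor_divide(df, value). Either form applies the operation to every element of the DataFrame at once; the first is a pandas method and the second is a NumPy function.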
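A minimal sketch comparing the forms, on a hypothetical frame of unit counts converted to whole dozens (made-up values). For a scalar divisor the // operator gives the same result as well.

```python
import numpy as np
import pandas as pd

# Hypothetical counts of items sold
sales = pd.DataFrame({"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0]})

# pandas method: floor-divide every element by 12 (units -> whole dozens)
dozens_pd = sales.floordiv(12)

# NumPy ufunc applied to the DataFrame; pandas hands back a DataFrame
dozens_np = np.floor_divide(sales, 12)

# The // operator is the same operation for a scalar divisor
dozens_op = sales // 12

print(dozens_pd)
```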
Setting & sorting a MultiIndex
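Many columns contain repeated values, for example many rows that share the same date. To make such data more readable, we can set these categorical columns as the index and then sort it: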
# Set the index to be the columns ['state', 'month']: sales
sales = sales.set_index(['state', 'month'])
# Sort the MultiIndex: sales
sales = sales.sort_index()
# Print the sales DataFrame
print(sales)
Figure: the original data
Figure: the data after set_index
Figure: the data after sorting the index
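Since the figures are not reproduced, here is a self-contained stand-in: a hypothetical sales frame (made-up values) whose rows arrive in no particular order. After set_index and sort_index, rows are ordered by state and then by month within each state, and the outer level can be selected on its own.

```python
import pandas as pd

# Hypothetical sales records in no particular order
sales = pd.DataFrame(
    {
        "state": ["NY", "CA", "TX", "NY", "CA", "TX"],
        "month": [1, 1, 2, 2, 2, 1],
        "eggs": [221, 47, 132, 77, 110, 205],
    }
)

# Move the two repeating columns into a MultiIndex
sales = sales.set_index(["state", "month"])

# Sort by the outer level ('state'), then the inner level ('month')
sales = sales.sort_index()
print(sales)

# Selecting an outer-level label returns all months for that state
ny = sales.loc["NY"]
print(ny)
```

A sorted MultiIndex is also what pandas expects before slicing ranges of labels on it.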
Pivoting a single variable
We can also reshape a DataFrame with the pivot method. By specifying index, columns, and values, we fill the new DataFrame with the data from one column. For example, pivot the users DataFrame with the rows indexed by 'weekday', the columns indexed by 'city', and the values populated with 'visitors':
# Pivot the users DataFrame: visitors_pivot
visitors_pivot = users.pivot(index='weekday', columns='city', values='visitors')
# Print the pivoted DataFrame
print(visitors_pivot)
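A self-contained version, assuming a hypothetical long-format users table with one row per (weekday, city) pair (made-up visitor counts). Each unique 'weekday' becomes a row, each unique 'city' becomes a column, and each cell is taken from 'visitors'.

```python
import pandas as pd

# Hypothetical long-format data: one row per (weekday, city) pair
users = pd.DataFrame(
    {
        "weekday": ["Sun", "Sun", "Mon", "Mon"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [139, 237, 326, 456],
    }
)

# Rows from 'weekday', columns from 'city', cells filled from 'visitors'
visitors_pivot = users.pivot(index="weekday", columns="city", values="visitors")
print(visitors_pivot)
```

pivot does not aggregate: if the same (index, columns) pair appeared twice it would raise a ValueError, and pivot_table is the variant that combines duplicates.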
The two figures below help illustrate what stacking and unstacking do: