-
Extracting and transforming data
1.Indexing DataFrames
iloc (index locate) selects by integer position;
loc selects by row (index) labels and column names.
    df.loc[rowname, colname]
    df.iloc[num, num]
    df[colname]      # Series
    df[[colname]]    # DataFrame
#sample
    # Print the boolean equivalence
    print(election.iloc[4, 4] == election.loc['Bedford', 'winner'])
    # Slice the columns from the starting column to 'Obama': left_columns
    left_columns = election.loc[:, :'Obama']
    # Create a separate dataframe with the columns ['winner', 'total', 'voters']: results
    results = election[['winner', 'total', 'voters']]
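
A minimal sketch of the difference between label-based and position-based lookup. The toy table, its county names, and its numbers are made up for illustration and are not the course's election data.

```python
import pandas as pd

# Hypothetical toy data (not the real election dataset).
df = pd.DataFrame(
    {"total": [100, 200, 300], "winner": ["Obama", "Romney", "Obama"]},
    index=["Adams", "Bedford", "Centre"],
)

by_label = df.loc["Bedford", "winner"]   # lookup by row label and column name
by_position = df.iloc[1, 1]              # same cell, by integer position

as_series = df["total"]      # single brackets return a Series
as_frame = df[["total"]]     # double brackets return a one-column DataFrame

# Label slices include the end label; positional slices exclude the end position.
label_slice = df.loc["Adams":"Bedford", :"total"]
position_slice = df.iloc[0:1, 0:1]
```

Note that the two slices look similar but select different numbers of rows, because only loc includes the end of the slice.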
2.Filtering
    df[condition]
#sample
    # Create the boolean array:
    condition = df['a'] > 70
    # Filter the df DataFrame with the condition array:
    df_con = df[condition]
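
A small sketch of boolean filtering on made-up data. Combining two conditions with `&` goes slightly beyond the note above and is included only as an illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [65, 72, 90], "b": [1, 2, 3]})

condition = df["a"] > 70      # boolean Series aligned with df's index
df_con = df[condition]        # keeps only the rows where condition is True

# Conditions combine with & (and), | (or), ~ (not); each one needs parentheses.
df_both = df[(df["a"] > 70) & (df["b"] < 3)]
```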
3.Transforming DataFrames
*apply: used on a DataFrame to compute over each row or each column;
*applymap: used on a DataFrame, element-wise (renamed to DataFrame.map in pandas 2.1);
*map: used on a Series, element-wise (this is the pandas Series.map method, not Python's built-in map).
#sample
    # Write a function to convert degrees Fahrenheit to degrees Celsius:
    def to_celsius(F):
        return 5/9*(F - 32)
    # Apply the function over 'Mean TemperatureF' and 'Mean Dew PointF':
    df_celsius = weather[['Mean TemperatureF','Mean Dew PointF']].apply(to_celsius)
    # Reassign the column labels of df_celsius
    df_celsius.columns = ['Mean TemperatureC', 'Mean Dew PointC']
    # Print the output of df_celsius.head()
    print(df_celsius.head())
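
To make the apply/map distinction concrete, here is a sketch on a tiny made-up table. The column names `TempF` and `DewF` are placeholders, not the course's weather columns.

```python
import pandas as pd

# Hypothetical two-row weather table.
weather = pd.DataFrame({"TempF": [32.0, 212.0], "DewF": [14.0, 50.0]})

def to_celsius(f):
    return 5 / 9 * (f - 32)

# DataFrame.apply: the function receives one whole column (a Series) at a time.
col_max = weather.apply(lambda col: col.max())

# With axis=1 the function receives one row at a time instead.
row_spread = weather.apply(lambda row: row["TempF"] - row["DewF"], axis=1)

# Series.map: the function receives one element at a time.
temp_c = weather["TempF"].map(to_celsius)
```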
-
Advanced indexing
1.index
Series and DataFrame both have an index; an index is:
immutable (like dictionary keys)
homogeneous in data type
    pd.read_csv(filename, index_col=___)
    df.index = ___
    df.index.name = ___
    df.columns.name = ___
#sample
    # Generate the list of months: months
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    # Assign months to sales.index
    sales.index = months
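
A short sketch of what "immutable" means in practice: a single index entry cannot be changed in place, but the whole index can be replaced. The `sales` table here is made up.

```python
import pandas as pd

# Hypothetical sales table.
sales = pd.DataFrame({"eggs": [47, 110, 221]})
sales.index = ["Jan", "Feb", "Mar"]
sales.index.name = "month"
sales.columns.name = "product"

# Changing one entry of the index in place is rejected with a TypeError.
try:
    sales.index[0] = "January"
    modified_in_place = True
except TypeError:
    modified_in_place = False

# Replacing the whole index at once is allowed.
sales.index = [m.upper() for m in sales.index]
```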
2.Hierarchical indexing
    df = df.set_index([___, ___, ...])
    df = df.sort_index()
    df.index.names
#sample
    print(sales.loc[['CA', 'TX']])
    print(sales.loc['CA':'TX'])
    # Access the inner month index and look up data for all states in month 2: all_month2
    all_month2 = sales.loc[(slice(None), 2), :]
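
The selections above can be tried on a small made-up table with a (state, month) index. The numbers are invented, and `pd.IndexSlice` is shown only as an alternative spelling of `slice(None)`.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
    "month": [1, 2, 1, 2, 1, 2],
    "eggs": [47, 110, 221, 77, 132, 205],
})
sales = sales.set_index(["state", "month"]).sort_index()

ca_tx = sales.loc[["CA", "TX"]]        # list: exactly these outer labels
ca_to_tx = sales.loc["CA":"TX"]        # slice: every state from CA through TX
ny_jan = sales.loc[("NY", 1), "eggs"]  # tuple: one (state, month) pair

# All states, month 2 only: slice(None) means "everything" on the outer level.
all_month2 = sales.loc[(slice(None), 2), :]

# Same selection written with pd.IndexSlice.
idx = pd.IndexSlice
all_month2_alt = sales.loc[idx[:, 2], :]
```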
-
Rearranging and reshaping data
1.Pivoting DataFrame
df_pivot = df.pivot(index='___', columns='___', values='___')
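
A minimal pivot sketch on invented visitor counts: each unique `weekday` becomes a row and each unique `city` becomes a column.

```python
import pandas as pd

# Hypothetical long-format data.
df = pd.DataFrame({
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
})

# One row per weekday, one column per city, cells filled from 'visitors'.
df_pivot = df.pivot(index="weekday", columns="city", values="visitors")
```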
2.Stacking & unstacking
    df_piv_uns = df_pivot.unstack(level=___)   # move an index level into the columns
    df_piv_uns.stack(level=___)                # move a column level back into the index
    df_sw = df_piv_uns.swaplevel(___, ___)     # swap two index levels
    df_sorted = df_sw.sort_index()
#sample
    byweekday = df.unstack(level='weekday')
    print(byweekday.stack(level='weekday'))
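
A sketch of the round trip on made-up data with a (city, weekday) index: unstack moves `weekday` into the columns, stack moves it back, and swaplevel reorders the index levels.

```python
import pandas as pd

# Hypothetical data with a two-level index.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Dallas", "Dallas"],
    "weekday": ["Mon", "Sun", "Mon", "Sun"],
    "visitors": [326, 139, 456, 237],
}).set_index(["city", "weekday"])

# 'weekday' leaves the index and becomes the inner column level.
byweekday = df.unstack(level="weekday")

# Stacking the same level moves it back into the index.
restored = byweekday.stack(level="weekday")

# Swap the two index levels, then sort so the new outer level is ordered.
swapped = restored.swaplevel(0, 1).sort_index()
```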
3.Melting DataFrames
    pd.melt(df, id_vars=[___], value_vars=[___], var_name=___, value_name=___)
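
Melting is roughly the reverse of pivoting. A sketch on an invented wide table, where the city columns are folded into a single `city`/`visitors` pair:

```python
import pandas as pd

# Hypothetical wide-format data: one column per city.
wide = pd.DataFrame({
    "weekday": ["Mon", "Sun"],
    "Austin": [326, 139],
    "Dallas": [456, 237],
})

# id_vars stay as columns; value_vars are folded into name/value pairs.
long = pd.melt(
    wide,
    id_vars=["weekday"],
    value_vars=["Austin", "Dallas"],
    var_name="city",
    value_name="visitors",
)
```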
-
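Pivot table
Use pivot_table when the index/column pairs contain duplicate entries: pivot raises an error in that case, while pivot_table combines the duplicates with aggfunc (mean by default).
    df_pt = df.pivot_table(index=___, columns=___, values=___, aggfunc=___)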
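
A sketch of the duplicate-entry case on made-up data: (Mon, Austin) appears twice, so pivot fails while pivot_table aggregates the two rows.

```python
import pandas as pd

# Hypothetical data with a duplicated (weekday, city) pair.
df = pd.DataFrame({
    "weekday": ["Mon", "Mon", "Sun"],
    "city": ["Austin", "Austin", "Austin"],
    "visitors": [100, 200, 50],
})

# pivot cannot decide which of the two Mon/Austin values to keep.
try:
    df.pivot(index="weekday", columns="city", values="visitors")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table reduces the duplicates with the given aggregation.
df_pt = df.pivot_table(index="weekday", columns="city",
                       values="visitors", aggfunc="sum")
```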
-
Grouping data
1.categoricals and groupby
df.groupby('')
2.groupby and aggregation
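When calling agg, specify the columns to aggregate; apply, by default, receives each group's whole DataFrame.
    df.groupby(colname).agg([___, ___, ...])
#sample
    # data_range is a user-defined function, not a pandas built-in
    df.groupby('a')[['b', 'c']].agg({'b': 'sum', 'c': data_range})
    aggregated = titanic.groupby('pclass')[['age', 'fare']].agg(['max', 'median'])
    print(aggregated.loc[:, ('age', 'max')])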
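
A sketch of agg versus apply on a tiny made-up table. The definition of `data_range` below is an assumption (max minus min); the notes use the name without defining it.

```python
import pandas as pd

# Hypothetical passenger data.
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2],
    "age": [30.0, 50.0, 20.0, 40.0],
    "fare": [80.0, 100.0, 10.0, 30.0],
})

def data_range(series):
    # Assumed definition: spread between largest and smallest value.
    return series.max() - series.min()

grouped = df.groupby("pclass")[["age", "fare"]]

# A list of functions gives a two-level column index: (column, function).
aggregated = grouped.agg(["max", "median"])
age_max = aggregated.loc[:, ("age", "max")]

# A dict picks a different function per column.
custom = grouped.agg({"age": "max", "fare": data_range})

# apply receives each group's DataFrame, so it can combine several columns.
fare_per_age = grouped.apply(lambda g: g["fare"].sum() / g["age"].sum())
```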
3.groupby and transformation
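transform maps each input element to an output element and works on only one Series at a time;
it returns the same rows (the same index) as the data passed in.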
df.groupby(___).transform(___)
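
A sketch showing that transform keeps the original row count and index, unlike agg, which returns one row per group. The data is made up.

```python
import pandas as pd

# Hypothetical scores for two teams.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 10.0, 30.0],
})

# Each row receives its own group's mean.
group_mean = df.groupby("team")["score"].transform("mean")

# A custom function receives one group's Series and returns a same-length Series.
centered = df.groupby("team")["score"].transform(lambda s: s - s.mean())
```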
4.groupby and filtering
    df.groupby(___).filter(___)
#sample
    # Filter 'Units' where the sum is > 35: by_com_filt
    by_com_filt = df.groupby('Company').filter(lambda g: g['Units'].sum() > 35)
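
A sketch of group filtering on invented data: the function is called once per group and returns True or False, and all rows of the groups that pass are kept with their original index.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "Company": ["Acme", "Acme", "Hooli", "Hooli"],
    "Units": [18, 3, 19, 17],
})

# Acme sells 21 units in total and is dropped; Hooli sells 36 and is kept.
by_com_filt = sales.groupby("Company").filter(lambda g: g["Units"].sum() > 35)
```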
Manipulating DataFrames with pandas (DataCamp)