Manipulating DataFrames with pandas (DataCamp)

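This article shows how to use the pandas library to extract and transform data in DataFrames. It covers indexing with iloc and loc, filtering data, and the apply, applymap, and map transformation methods. It also covers advanced indexing, hierarchical indexing, reshaping data (stacking, unstacking, and melting DataFrames), and groupby operations such as grouped aggregation and transformation.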
  • Extracting and transforming data

    1. Indexing DataFrames

    iloc ("integer location") selects by integer position,
    loc selects by row (index) labels and column labels.

    df.loc[rowname,colname]
    df.iloc[num,num]
    df[colname]			#Series
    df[[colname]]		#DataFrame
    
    #sample
    # Print the boolean equivalence
    print(election.iloc[4, 4] == election.loc['Bedford', 'winner'])
    # Slice the columns from the starting column to 'Obama': left_columns
    left_columns = election.loc[:, :'Obama']
    
    # Create a separate dataframe with the columns ['winner', 'total', 'voters']: results
    results = election[['winner', 'total', 'voters']]
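The sample above relies on an `election` DataFrame that is not shown. As a minimal sketch, assuming a small hypothetical stand-in with invented county rows and values, the same lookups can be run end to end:

```python
import pandas as pd

# Hypothetical stand-in for the `election` data (values are invented)
election = pd.DataFrame(
    {
        "total": [100, 200, 150],
        "Obama": [60.0, 45.0, 52.0],
        "Romney": [40.0, 55.0, 48.0],
        "winner": ["Obama", "Romney", "Obama"],
    },
    index=["Adams", "Bedford", "Centre"],
)

# The same cell, addressed by label and by integer position
by_label = election.loc["Bedford", "winner"]
by_position = election.iloc[1, 3]
print(by_label == by_position)

# Label slices include the end label, so 'Obama' is part of the result
left_columns = election.loc[:, :"Obama"]
print(list(left_columns.columns))

# Single brackets give a Series, double brackets give a DataFrame
winner_series = election["winner"]
winner_frame = election[["winner"]]
print(type(winner_series).__name__, type(winner_frame).__name__)
```

Note that `loc` slices are inclusive of both endpoints, while `iloc` slices follow the usual Python convention and exclude the end position.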
    
    
    

    2. Filtering

    df[condition]
    
    #sample
    # Create the boolean array: 
    condition = df['a'] > 70
    
    # Filter the df DataFrame with the condition array: 
    df_con = df[condition]
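A short runnable sketch of boolean filtering, assuming a small made-up `df` with columns `a` and `b` (the column `b` is not in the original notes):

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({"a": [50, 80, 75, 60], "b": [1, 2, 3, 4]})

# A boolean Series, one True/False per row
condition = df["a"] > 70

# Rows where the condition is True are kept
df_con = df[condition]
print(df_con)

# Conditions combine with & (and) and | (or); each one needs parentheses
both = df[(df["a"] > 70) & (df["b"] < 3)]
print(both)
```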
    

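    3. Transforming DataFrames

    *apply: a DataFrame method that applies a function to each row or column;

    *applymap: a DataFrame method that works element by element (deprecated since pandas 2.1 in favor of DataFrame.map);

    *map: a Series method (not Python's built-in map) that works element by element.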

    #sample
    # Write a function to convert degrees Fahrenheit to degrees Celsius: 
    def to_celsius(F):
        return 5/9*(F - 32)
    
    # Apply the function over 'Mean TemperatureF' and 'Mean Dew PointF': 
    df_celsius = weather[['Mean TemperatureF','Mean Dew PointF']].apply(to_celsius)
    
    # Reassign the column labels of df_celsius
    df_celsius.columns = ['Mean TemperatureC', 'Mean Dew PointC']
    
    # Print the output of df_celsius.head()
    print(df_celsius.head())
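To contrast the three methods side by side, here is a sketch that assumes a tiny invented `weather` table. The `labels`, `spread`, and `as_text` names are hypothetical and only serve the illustration:

```python
import pandas as pd

# Invented readings in degrees Fahrenheit
weather = pd.DataFrame(
    {
        "Mean TemperatureF": [32.0, 50.0, 212.0],
        "Mean Dew PointF": [14.0, 41.0, 68.0],
    }
)

def to_celsius(F):
    return 5 / 9 * (F - 32)

# apply: the function receives one column (a Series) at a time
df_celsius = weather.apply(to_celsius)

# apply with axis=1: the function receives one row at a time
spread = weather.apply(
    lambda row: row["Mean TemperatureF"] - row["Mean Dew PointF"], axis=1
)

# Series.map: element by element on a single column
labels = weather["Mean TemperatureF"].map(
    lambda f: "freezing" if f <= 32 else "above freezing"
)

# Element-wise over a whole DataFrame: DataFrame.map in pandas >= 2.1,
# applymap in older versions
elementwise = weather.map if hasattr(weather, "map") else weather.applymap
as_text = elementwise(lambda x: f"{x:.0f}F")

print(df_celsius)
print(spread)
print(labels)
print(as_text)
```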
    
  • Advanced indexing

    1. Index objects

    Both Series and DataFrames have an index.

    Index entries are immutable (like dictionary keys).

    An index is homogeneous in data type.

    pd.read_csv(filename, index_col=___)
    df.index = ___
    df.index.name=___
    df.columns.name=___
    
    #sample
    # Generate the list of months: months
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
    
    # Assign months to sales.index
    sales.index = months
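The immutability point can be checked directly. A minimal sketch, assuming a small invented `sales` table: changing a single index entry fails, while replacing the whole index works.

```python
import pandas as pd

# Invented sales figures
sales = pd.DataFrame({"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0]})

# Assign a whole new index, then name the index and the columns
sales.index = ["Jan", "Feb", "Mar"]
sales.index.name = "month"
sales.columns.name = "products"

# Individual index entries cannot be modified in place
try:
    sales.index[0] = "JAN"
    modified = True
except TypeError:
    modified = False
print(modified)

# Replacing the entire index is allowed; .str.upper() keeps the index name
sales.index = sales.index.str.upper()
print(sales)
```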
    

    2. Hierarchical indexing

    df = df.set_index([___,___,...])
    df = df.sort_index()
    df.index.names
    
    #sample
    print(sales.loc[['CA', 'TX']])
    
    print(sales.loc['CA':'TX'])
    
    # Access the inner month index and look up data for all states in month 2: 
    all_month2 = sales.loc[(slice(None),2),:]
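A runnable sketch of these lookups, assuming an invented `sales` table indexed by `state` and `month`. `pd.IndexSlice` is shown as an alternative to `slice(None)`; it is not used in the original notes.

```python
import pandas as pd

# Invented data: two months of sales for three states
sales = pd.DataFrame(
    {
        "state": ["CA", "CA", "NY", "NY", "TX", "TX"],
        "month": [1, 2, 1, 2, 1, 2],
        "eggs": [47, 110, 221, 77, 132, 205],
    }
)

# Build the hierarchical index and sort it (slicing needs a sorted index)
sales = sales.set_index(["state", "month"]).sort_index()
print(list(sales.index.names))

# A tuple addresses one (state, month) pair
ny_month1 = sales.loc[("NY", 1), "eggs"]

# A list picks specific outer labels; a slice picks an inclusive range
two_states = sales.loc[["CA", "TX"]]
state_range = sales.loc["CA":"TX"]

# slice(None) means "everything" on the outer level
all_month2 = sales.loc[(slice(None), 2), :]

# The same selection written with pd.IndexSlice
idx = pd.IndexSlice
also_month2 = sales.loc[idx[:, 2], :]
print(all_month2)
```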
    
  • Rearranging and reshaping data

    1. Pivoting DataFrames

    df_pivot = df.pivot(index='___', columns='___', values='___')
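As an illustration of the template above, here is a sketch that assumes a small invented `users` table in long format and pivots it so that each city becomes a column:

```python
import pandas as pd

# Invented long-format data: one row per (weekday, city) pair
users = pd.DataFrame(
    {
        "weekday": ["Sun", "Sun", "Mon", "Mon"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [139, 237, 326, 456],
    }
)

# Rows come from 'weekday', columns from 'city', cells from 'visitors'
visitors_pivot = users.pivot(index="weekday", columns="city", values="visitors")
print(visitors_pivot)
```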
    

    2. Stacking & unstacking

    df_pivot.unstack(level=___)				# move an index level into the columns
    df_piv_uns.stack(level=___)				# move a column level into the index
    df_sw = df_piv_uns.swaplevel(___)		# swap two index levels
    df_sorted = df_sw.sort_index()
    
    #sample
    byweekday = df.unstack(level='weekday')
    print(byweekday.stack(level='weekday'))
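A minimal round-trip sketch, assuming an invented `df` with a two-level (`city`, `weekday`) index. Recent pandas versions may emit a FutureWarning from `stack` about its new implementation; the values used here are unaffected.

```python
import pandas as pd

# Invented data with a hierarchical row index
df = pd.DataFrame(
    {
        "city": ["Austin", "Austin", "Dallas", "Dallas"],
        "weekday": ["Mon", "Sun", "Mon", "Sun"],
        "visitors": [326, 139, 456, 237],
    }
).set_index(["city", "weekday"])

# unstack: 'weekday' moves from the row index into the columns
byweekday = df.unstack(level="weekday")
print(byweekday)

# stack: 'weekday' moves back from the columns into the row index
restacked = byweekday.stack(level="weekday")
print(restacked)

# swaplevel reorders the index levels; sort_index tidies the result
swapped = restacked.swaplevel(0, 1).sort_index()
print(swapped)
```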
    

    3. Melting DataFrames

    pd.melt(df, id_vars=[], value_vars=[], var_name=___, value_name=___)
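Melting is roughly the reverse of pivoting. A sketch assuming an invented wide table with one column per city:

```python
import pandas as pd

# Invented wide-format data: one column per city
wide = pd.DataFrame(
    {
        "weekday": ["Mon", "Sun"],
        "Austin": [326, 139],
        "Dallas": [456, 237],
    }
)

# id_vars stay as columns; value_vars are folded into name/value pairs
long = pd.melt(
    wide,
    id_vars=["weekday"],
    value_vars=["Austin", "Dallas"],
    var_name="city",
    value_name="visitors",
)
print(long)
```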
    
    4. Pivot tables

      Use pivot_table when the index/column pairs contain duplicate entries; aggfunc decides how the duplicates are combined.

      df_pt = df.pivot_table(index=___, columns=___, values=___, aggfunc=___)
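The difference from a plain pivot shows up as soon as a pair repeats. A sketch assuming an invented `visits` table in which (Mon, Austin) appears twice:

```python
import pandas as pd

# Invented data; the (Mon, Austin) pair occurs twice
visits = pd.DataFrame(
    {
        "weekday": ["Mon", "Mon", "Mon", "Sun"],
        "city": ["Austin", "Austin", "Dallas", "Austin"],
        "visitors": [100, 50, 30, 20],
    }
)

# pivot cannot decide which duplicate to keep, so it raises ValueError
try:
    visits.pivot(index="weekday", columns="city", values="visitors")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)

# pivot_table combines the duplicates with the given aggregation
totals = visits.pivot_table(
    index="weekday", columns="city", values="visitors", aggfunc="sum"
)
print(totals)
```

Pairs that never occur in the data, such as (Sun, Dallas) here, come out as NaN.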
      
  • Grouping data

    1. Categoricals and groupby

    df.groupby(___)
    

    2. groupby and aggregation

    With agg you usually select the columns to aggregate first; apply receives each group's entire DataFrame by default.

    df.groupby(colname).agg([___,___,...])
    
    #sample
    # data_range is a user-defined function (not shown here)
    df.groupby('a')[['b', 'c']].agg({'b':'sum','c':data_range})
    
    aggregated = titanic.groupby('pclass')[['age','fare']].agg(['max', 'median'])
    print(aggregated.loc[:, ('age','max')])
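The notes do not define `data_range`. The sketch below assumes it means "maximum minus minimum" and uses a tiny invented `titanic` table, so both the helper and the numbers are illustrative only:

```python
import pandas as pd

# Invented passenger data
titanic = pd.DataFrame(
    {
        "pclass": [1, 1, 2, 2, 3],
        "age": [30.0, 50.0, 20.0, 40.0, 25.0],
        "fare": [100.0, 80.0, 30.0, 20.0, 8.0],
    }
)

# Assumed definition: spread between the largest and smallest value
def data_range(series):
    return series.max() - series.min()

# A list of functions yields a two-level column index: (column, function)
aggregated = titanic.groupby("pclass")[["age", "fare"]].agg(["max", "median"])
print(aggregated.loc[:, ("age", "max")])

# A dict maps each column to its own aggregation, including custom functions
custom = titanic.groupby("pclass")[["age", "fare"]].agg(
    {"age": "mean", "fare": data_range}
)
print(custom)
```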
    

    3. groupby and transformation

    transform applies a function within each group, working on one Series (column) at a time.

    The result has the same rows (index) as the input.

    df.groupby(___).transform(___)
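To show that the output lines up row by row with the input, here is a sketch with an invented `df` and a hypothetical `demean` helper that subtracts each group's mean:

```python
import pandas as pd

# Invented scores for two teams
df = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "score": [1.0, 3.0, 10.0, 20.0]}
)

# Hypothetical helper: center a group's values around the group mean
def demean(series):
    return series - series.mean()

# One output value per input row, aligned on the original index
centered = df.groupby("team")["score"].transform(demean)
print(centered)

# A function name as a string broadcasts the group result to every row
group_mean = df.groupby("team")["score"].transform("mean")
print(group_mean)
```

Because the result shares the input's index, it can be assigned straight back as a new column, for example `df["centered"] = centered`.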
    

    4. groupby and filtering

    df.groupby(___).filter(___)
    
    #sample
    # Keep only the companies whose total 'Units' exceed 35: by_com_filt
    by_com_filt = df.groupby('Company').filter(lambda g:g['Units'].sum() > 35)
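A runnable version of the same idea, assuming an invented `sales` table. `filter` keeps or drops whole groups, so every row of a qualifying company survives:

```python
import pandas as pd

# Invented data: Acme totals 38 units, Hooli 10, Initech 40
sales = pd.DataFrame(
    {
        "Company": ["Acme", "Acme", "Hooli", "Initech"],
        "Units": [20, 18, 10, 40],
    }
)

# The lambda receives each group's DataFrame and returns True to keep it
by_com_filt = sales.groupby("Company").filter(lambda g: g["Units"].sum() > 35)
print(by_com_filt)
```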
    