A Simple pandas Tutorial

Pandas Tools

  1. xlrd
  2. openpyxl
  3. tabulate
  4. pandas_profiling

File reading and writing functions

Read function

pandas can read many kinds of files, such as CSV, Excel, and plain-text files.

import pandas as pd
df_csv = pd.read_csv('file.csv')
df_table = pd.read_table('file.txt') 
df_excel = pd.read_excel('file.xlsx')
Shared parameters
  1. header=None indicates that the first line is not a row of column names. For read_csv and read_table the default, header="infer", takes the column names from the first line.
  2. index_col selects the column(s) to use as the index.
  3. usecols is the subset of columns to read.
  4. parse_dates lists the columns to parse as dates.
  5. nrows is the number of rows to read.
Function-specific parameters
pd.read_table()

sep is the separator; it may be a regular expression.
When the separator is a regular expression, pass engine='python' as well.
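
The parameters above can be combined in one call. The following is a minimal sketch, assuming a small in-memory text (the column names and values are invented for illustration) instead of a real file:

```python
import io
import pandas as pd

# Hypothetical data: fields separated by '|' with surrounding spaces.
raw = (
    "id | day | score | note\n"
    "1 | 2021-01-01 | 3.5 | a\n"
    "2 | 2021-01-02 | 4.0 | b\n"
    "3 | 2021-01-03 | 5.5 | c\n"
)

df = pd.read_table(
    io.StringIO(raw),
    sep=r"\s*\|\s*",                 # regular-expression separator
    engine="python",                 # regex separators use the Python engine
    usecols=["id", "day", "score"],  # skip the 'note' column
    index_col="id",                  # use 'id' as the index
    parse_dates=["day"],             # parse 'day' as dates
    nrows=2,                         # read only the first two data rows
)
print(df)
```

Reading from io.StringIO keeps the example self-contained; with a real file, pass its path as the first argument.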

Writing function

df.to_csv('file.csv', sep='\t')
df.to_excel('file.xlsx')
df.to_markdown('file.md')
df.to_latex()

index=False omits the index from the output.
The function df.to_csv() has a sep parameter that selects the separator.
The function df.to_markdown() requires the tabulate package. In pandas 2.0 and later, df.to_latex() relies on jinja2 rather than tabulate.
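
To see the effect of sep and index=False without touching the disk, one option is to call df.to_csv() with no path, in which case it returns the text as a string. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()                          # default: index is the first column
tab_no_index = df.to_csv(sep="\t", index=False)   # tab-separated, no index column

print(with_index)
print(tab_no_index)
```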

Series and DataFrame

Series

pd.Series is a data structure consisting of four parts:

  1. data, the values of the Series
  2. index, the index (with or without a name)
  3. dtype, the data type
  4. name, the name of the Series
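
The four parts map directly onto the constructor arguments. A short sketch with arbitrary values:

```python
import pandas as pd

s = pd.Series(
    data=[1.5, 2.5, 3.5],                         # the values
    index=pd.Index(["a", "b", "c"], name="key"),  # a named index
    dtype="float64",                              # the data type
    name="score",                                 # the name of the Series
)

print(s.values, s.index, s.dtype, s.name)
```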

DataFrame

A DataFrame is a two-dimensional table.
On top of the row index that a Series has, a DataFrame adds column labels, i.e. columns.

Common attributes

df.values # all values as a NumPy array
df.index # the row index
df.columns # the column labels
df.dtypes # the dtype of each column
df.shape # (number of rows, number of columns)
df.T # the transpose of df
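
A quick way to get a feel for these attributes is to inspect a tiny DataFrame. The data below is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]},
    index=["a", "b", "c"],
)

print(df.values)   # 3x2 NumPy array
print(df.index)    # row labels a, b, c
print(df.columns)  # column labels x, y
print(df.dtypes)   # one dtype per column
print(df.shape)    # (3, 2)
print(df.T)        # rows and columns swapped
```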

Common functions

df.set_index(cols) # use the given column(s) as the index
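
As an illustration, set_index accepts either one column name or a list of names; by default it returns a new DataFrame and leaves the original unchanged. The column names here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B"],
    "year": [2020, 2021],
    "pop": [10, 20],
})

by_city = df.set_index("city")            # single-level index
by_both = df.set_index(["city", "year"])  # two-level MultiIndex

print(by_city)
print(by_both)
```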

Aggregation function

Sampling

df.head(n) # take the first n rows
df.tail(n) # take the last n rows

Global features

df.info() # information overview
df.describe() # main statistics
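
The sampling and overview calls can be tried on a small frame. This sketch assumes a single numeric column holding 0 to 9:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

first = df.head(3)       # rows 0, 1, 2
last = df.tail(2)        # rows 8, 9
summary = df.describe()  # count, mean, std, min, quartiles, max per column

df.info()                # prints index range, column dtypes, memory usage
print(summary)
```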

For a more detailed report, use the pandas-profiling package.

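Statistics functions

  1. sum, mean
  2. median, quantile
  3. var, std
  4. max, min, idxmax, idxmin
  5. count

On a DataFrame, the axis parameter selects the direction: axis=0 (the default) aggregates each column, axis=1 aggregates each row.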
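
The effect of axis is easiest to see side by side. A minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col_sum = df.sum(axis=0)      # one value per column
row_sum = df.sum(axis=1)      # one value per row
col_median = df.quantile(0.5) # the 0.5 quantile of each column (the median)
top_row = df.idxmax()         # index label of the maximum in each column

print(col_sum, row_sum, col_median, top_row, sep="\n")
```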

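Unique functions

df[col].unique() # unique values of a single column (Series only)
df[cols].nunique() # number of unique values
df[cols].value_counts() # how often each value occurs
df[cols].drop_duplicates(keep='first') # remove duplicates; keep is 'first', 'last', or False
df[cols].duplicated(keep='first') # boolean Series marking duplicates; same keep options

drop_duplicates removes exactly the rows that duplicated marks as True. With keep=False, every occurrence of a repeated value is marked and removed.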
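
The relationship between duplicated and drop_duplicates, and the three keep options, can be checked on a short Series (values chosen for illustration):

```python
import pandas as pd

s = pd.Series(["x", "y", "x", "z", "y", "x"])

uniques = s.unique()                 # distinct values in order of appearance
n_unique = s.nunique()               # how many distinct values
counts = s.value_counts()            # occurrences of each value
dup_mask = s.duplicated(keep="first")        # True for repeats after the first occurrence
keep_first = s.drop_duplicates(keep="first")
keep_last = s.drop_duplicates(keep="last")
keep_none = s.drop_duplicates(keep=False)    # drop every value that repeats

# drop_duplicates is equivalent to filtering with the inverted mask.
same = s[~dup_mask].equals(keep_first)
print(uniques, n_unique, same)
```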

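Replace functions

s.replace(to_replace, value) # two lists of equal length: old values and new values
s.replace(d) # d is a dict mapping old values to new values

s.where(cond, other) # keep values where cond is True, replace the rest with other
s.mask(cond, other) # replace values where cond is True with other

s.round() # round to the nearest integer, or to a given number of decimals
s.abs() # absolute values
s.clip(lower, upper) # values below lower become lower, values above upper become upper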
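
The calls above differ mainly in how the values to replace are selected: by value (replace), by condition (where, mask), or by numeric rule (round, abs, clip). A small sketch with arbitrary data:

```python
import pandas as pd

t = pd.Series(["a", "b", "c"])
by_lists = t.replace(["a", "b"], ["x", "y"])  # a -> x, b -> y
by_dict = t.replace({"c": "z"})               # c -> z

s = pd.Series([-2.4, 0.4, 3.6])
kept_positive = s.where(s > 0, 0.0)   # non-positive values become 0.0
zeroed_positive = s.mask(s > 0, 0.0)  # positive values become 0.0

rounded = s.round()                   # nearest integer, still float dtype
absolute = s.abs()
clipped = s.clip(-1, 1)               # limit values to the range [-1, 1]

print(by_lists.tolist(), by_dict.tolist(), clipped.tolist())
```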

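Sort functions

df.sort_index(ascending=True) # sort by the index; pass level=... to choose levels of a MultiIndex
df.sort_values(cols, ascending=bools) # sort by columns; ascending is a bool or a list of bools, one per column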
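
As an example, sorting by two columns with a different direction for each takes a list of booleans for ascending. The data is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"g": ["b", "a", "b", "a"], "v": [1, 2, 3, 4]},
    index=[3, 1, 2, 0],
)

by_index = df.sort_index()  # rows ordered by index label: 0, 1, 2, 3

# 'g' ascending, then 'v' descending within each group of equal 'g'.
by_values = df.sort_values(["g", "v"], ascending=[True, False])

print(by_index)
print(by_values)
```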

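apply function

df.apply(func) # func is a named function
df.apply(lambda col: ...) # or an anonymous one

Input: each column in turn, passed as a pd.Series (axis=0 is the default).
Output: the per-column results combined; a Series if func returns a scalar, a DataFrame if func returns a Series.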
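
The shape of the result depends on what the function returns for each column. A minimal sketch showing both cases, with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Scalar per column -> the result is a Series indexed by column name.
ranges = df.apply(lambda col: col.max() - col.min())

# Series per column -> the result is a DataFrame of the same shape.
centered = df.apply(lambda col: col - col.mean())

print(ranges)
print(centered)
```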

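Window functions

s.rolling(window=n) # sliding window of n observations
s.expanding() # window that grows from the first observation to the current one
s.ewm(alpha=a) # exponentially weighted window with smoothing factor 0 < a <= 1

Each call returns a window object; apply an aggregation such as .mean() or .sum() to it to obtain a result.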
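
The three window types can be compared on the same short Series. In this sketch, adjust=False is passed to ewm (an assumption made here so that the recursive form y[t] = (1 - alpha) * y[t-1] + alpha * x[t] applies; the pandas default is adjust=True):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

rolling_mean = s.rolling(window=2).mean()       # NaN until the window holds 2 values
running_sum = s.expanding().sum()               # cumulative sum
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()  # recursive exponential smoothing

print(rolling_mean.tolist())
print(running_sum.tolist())
print(ewm_mean.tolist())
```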