pandas Basics

I. Reading and Writing Files

1. Reading files

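pd.read_csv(' ')
pd.read_excel(' ')
pd.read_table(' ')
Note: header=None means the first row is not used as column names; index_col sets one or more columns as the index (indexes are covered in detail in Chapter 3); usecols gives the set of columns to read, and all columns are read by default; parse_dates lists the columns to convert to datetimes (time series are covered in Chapter 10); nrows is the number of data rows to read. All of these parameters are accepted by each of the three functions above.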
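The parameters in the note above can be tried without any file on disk. The following is a minimal sketch that assumes a small in-memory CSV (the text, column names and variable names are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "id,date,value\n1,2020-01-01,10\n2,2020-01-02,20\n3,2020-01-03,30\n"

df_demo = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",                   # use the id column as the index
    usecols=["id", "date", "value"],  # columns to read
    parse_dates=["date"],             # convert this column to datetime
    nrows=2,                          # read only the first two data rows
)

# header=None treats the first line as data, so the columns are labelled 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(df_demo)
print(df_no_header)
```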

2. Writing data

df_csv.to_csv('…/data/my_csv_saved.csv', index=False)
df_excel.to_excel('…/data/my_excel_saved.xlsx', index=False)
df_txt.to_csv('…/data/my_txt_saved.txt', sep='\t', index=False)
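To see what index=False and sep='\t' do, a write followed by a read is a quick check. This sketch assumes a throwaway temporary directory and a made-up two-column table, not the files used above:

```python
import os
import tempfile
import pandas as pd

# Hypothetical table used only for this round trip.
df_out = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")
    txt_path = os.path.join(tmp_dir, "demo.txt")

    # index=False keeps the row index out of the saved file.
    df_out.to_csv(csv_path, index=False)
    # A tab separator produces a tab-delimited text file.
    df_out.to_csv(txt_path, sep="\t", index=False)

    csv_back = pd.read_csv(csv_path)
    txt_back = pd.read_table(txt_path)  # read_table splits on tabs by default

print(csv_back)
print(txt_back)
```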

II. Basic Data Structures

Series

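A Series generally consists of four parts: the values (data), the index (index), the storage type (dtype), and the name of the series (name). The index can also be given its own name, which is empty by default.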
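The four parts can be set explicitly when a Series is built. A minimal sketch, with made-up values and names:

```python
import pandas as pd

s = pd.Series(
    data=[100, 200, 300],
    index=pd.Index(["id1", "id2", "id3"], name="my_idx"),  # the index carries its own name
    dtype="int64",
    name="my_name",
)

print(s.values)      # the data
print(s.index)       # the index, including its name
print(s.dtype)       # the storage type
print(s.name)        # the name of the series
```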

DataFrame

A DataFrame adds a column index on top of a Series. A data frame can be constructed from two-dimensional data together with row and column indexes.
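As a small illustration of that construction, the sketch below builds a frame from a nested list plus row and column labels (all values and labels are made up):

```python
import pandas as pd

df_small = pd.DataFrame(
    data=[[1, "a", 1.2], [2, "b", 2.2], [3, "c", 3.2]],
    index=["row_0", "row_1", "row_2"],
    columns=["col_0", "col_1", "col_2"],
)

# Selecting a single column gives back a Series that shares the row index.
col = df_small["col_0"]

print(df_small)
print(type(col))
```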

III. Common Basic Functions

1. Summary functions

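head and tail return the first n and last n rows of a table or series, where n defaults to 5.
info returns an overview of the table, and describe returns the main statistics of its numeric columns.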
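A short sketch of these four functions on a made-up ten-row table:

```python
import pandas as pd

df_nums = pd.DataFrame({"x": range(10), "label": list("abcdefghij")})

first_five = df_nums.head()    # n defaults to 5
last_three = df_nums.tail(3)   # last 3 rows
stats = df_nums.describe()     # count, mean, std, min, quartiles, max of numeric columns

df_nums.info()                 # prints column names, non-null counts and dtypes
print(stats)
```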

2. Descriptive statistics functions

Many statistical functions are defined on Series and DataFrame. The most common are sum, mean, median, var, std, max and min.
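The following sketch applies a few of these functions to a made-up two-column table. By default a DataFrame aggregates each column; axis=1 aggregates each row instead:

```python
import pandas as pd

df_stats = pd.DataFrame({"h": [1.0, 2.0, 3.0], "w": [10.0, 20.0, 60.0]})

col_means = df_stats.mean()       # one value per column (axis=0 is the default)
row_sums = df_stats.sum(axis=1)   # one value per row
h_var = df_stats["h"].var()       # sample variance (ddof=1 by default)
w_median = df_stats["w"].median()

print(col_means)
print(row_sums)
```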

3. Unique-value functions

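Calling unique and nunique on a Series returns an array of its unique values and the number of unique values, respectively.
value_counts returns the unique values together with how often each one occurs.
To look at the unique combinations across several columns, use drop_duplicates. Its key parameter is keep: the default first keeps the row where each combination first appears, last keeps the row of its last appearance, and False drops every row whose combination is duplicated. duplicated works similarly to drop_duplicates and takes the same keep parameter, but it returns a boolean Series that is True for rows flagged as duplicates and False otherwise. drop_duplicates is equivalent to dropping the rows where duplicated is True.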
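The behaviour of keep is easiest to see on a tiny table. This sketch uses made-up columns g and t:

```python
import pandas as pd

df_dup = pd.DataFrame({
    "g": ["F", "M", "F", "M", "F"],
    "t": ["A", "A", "A", "B", "A"],
})

unique_vals = df_dup["g"].unique()      # array of distinct values
n_unique = df_dup["g"].nunique()        # how many distinct values
counts = df_dup["g"].value_counts()     # frequency of each value

# The combination (F, A) appears in rows 0, 2 and 4.
first_kept = df_dup.drop_duplicates(["g", "t"])             # keep='first' is the default
none_kept = df_dup.drop_duplicates(["g", "t"], keep=False)  # drop every repeated combination
dup_flags = df_dup.duplicated(["g", "t"])                   # True marks the rows to drop

print(first_kept)
print(none_kept)
print(dup_flags)
```

Filtering with df_dup[~dup_flags] gives the same rows as first_kept, which is the equivalence described above.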

4. Replacement functions

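Replacement functions in pandas fall into three groups: mapping replacement, logical replacement and numeric replacement. With replace, the mapping can be given as a dictionary or as two lists (old values and new values). replace also supports a special directional replacement: setting the method parameter to ffill replaces a value with the nearest preceding value that was not replaced, and bfill uses the nearest following one. Recent pandas releases deprecate this method parameter.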
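A minimal sketch of mapping replacement with replace, using made-up data. The last two lines assume that "logical replacement" refers to condition-based functions such as mask, which is my reading rather than something this section spells out:

```python
import pandas as pd

s_gender = pd.Series(["Female", "Male", "Female"])

# Mapping replacement, once with a dictionary and once with two lists.
by_dict = s_gender.replace({"Female": "F", "Male": "M"})
by_lists = s_gender.replace(["Female", "Male"], ["F", "M"])

# Assumed example of logical replacement: mask replaces values where the condition holds.
s_num = pd.Series([-1, 2, -3])
non_negative = s_num.mask(s_num < 0, 0)

print(by_dict)
print(non_negative)
```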

5. Sorting functions

There are two ways to sort: by value and by index. The corresponding functions are sort_values and sort_index.
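Both functions are shown in the sketch below on a made-up table whose index is deliberately out of order:

```python
import pandas as pd

df_sort = pd.DataFrame(
    {"height": [160, 180, 170], "weight": [50, 70, 60]},
    index=["b", "c", "a"],
)

by_value = df_sort.sort_values("height", ascending=False)  # tallest first
by_index = df_sort.sort_index()                            # index labels in ascending order

print(by_value)
print(by_index)
```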

6. The apply method

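The apply method is commonly used to iterate over the rows or columns of a DataFrame. Its axis argument has the same meaning as in the statistical aggregation functions of subsection 2, and the argument passed to apply is usually a function that takes a Series as input.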
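A small sketch of column-wise and row-wise apply on made-up data. In both cases the lambda receives a Series:

```python
import pandas as pd

df_apply = pd.DataFrame({"h": [1.0, 2.0, 3.0], "w": [10.0, 20.0, 60.0]})

# axis=0 (default): the function receives each column as a Series.
col_range = df_apply.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_mean = df_apply.apply(lambda row: row.mean(), axis=1)

print(col_range)
print(row_mean)
```

For simple aggregations like these, the built-in methods (for example df_apply.mean(axis=1)) are usually faster than apply; apply is most useful when no built-in function exists.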

IV. Window Objects

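pandas has three kinds of windows: the sliding window rolling, the expanding window expanding, and the exponentially weighted window ewm.
1. Rolling window objects
2. Expanding windows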
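The three window types can be compared on one short series. This is a minimal sketch with made-up values; the window size of 3 and alpha of 0.5 are arbitrary choices for illustration:

```python
import pandas as pd

s_win = pd.Series([1, 2, 3, 4, 5])

# Rolling: each result uses the current value and the two before it.
# The first two entries are NaN because the window is not yet full.
roll_mean = s_win.rolling(window=3).mean()

# Expanding: each result uses every value from the start up to the current one.
expand_sum = s_win.expanding().sum()

# Exponentially weighted: more recent values get larger weights.
ewm_mean = s_win.ewm(alpha=0.5).mean()

print(roll_mean)
print(expand_sum)
print(ewm_mean)
```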

Exercises

Ex1: Pokémon dataset

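We have a Pokémon dataset. Some background:
1) # is the National Pokédex number; rows that share the same number are different forms of the same Pokémon
2) A Pokémon has either one type or two; for single-type Pokémon, Type 2 is missing
3) Total, HP, Attack, Defense, Sp. Atk, Sp. Def and Speed stand for the base stat total, hit points, attack, defense, special attack, special defense and speed; the total is the sum of the other six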

1. Sum HP, Attack, Defense, Sp. Atk, Sp. Def and Speed, and verify that the result equals Total

import pandas as pd

df = pd.read_csv('pokemon.csv')
df.head(3)
list_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
# Sum the six stats row by row and compare with the Total column
stat_sum = df[list_cols].sum(axis=1)
mismatch = df[stat_sum != df['Total']]
if mismatch.shape[0] > 0:
    print('Mismatches found!', mismatch)
else:
    print('No mismatches found.')

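2. For Pokémon with a repeated #, keep only the first record, then answer the following:
a) Find the number of distinct primary types and the three most frequent ones
b) Find the number of distinct combinations of primary and secondary type
c) Find the type combinations that have not appeared yet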

print(df.columns)
df = df.drop_duplicates('#', keep='first')

Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed'], dtype='object')

print('Number of distinct primary types:', df['Type 1'].nunique())
print(df['Type 1'].value_counts().head(3))

Number of distinct primary types: 18
Water 105
Normal 93
Grass 66

df_2 = df.drop_duplicates(['Type 1', 'Type 2'])
print('Number of primary/secondary type combinations:', df_2.shape[0])

Number of primary/secondary type combinations: 143

type1_name_set = set(df['Type 1'])
# Type 2 contains missing values for single-type Pokémon, so NaN is part of this set
type2_name_set = set(df['Type 2'])
all_type_pair, now_type_pair = set(), set()
# Every possible (Type 1, Type 2) pair
for x in type1_name_set:
    for y in type2_name_set:
        all_type_pair.add((x, y))
# Pairs that actually occur in the data
for x, y in zip(df['Type 1'], df['Type 2']):
    now_type_pair.add((x, y))
name_diff = all_type_pair.difference(now_type_pair)
print(len(name_diff), '\n', name_diff)

199
{(‘Fighting’, ‘Fighting’), (‘Ghost’, ‘Ground’), (‘Dragon’, ‘Ghost’), (‘Grass’, ‘Dragon’), (‘Flying’, ‘Electric’), (‘Ice’, ‘Electric’), (‘Dark’, ‘Fairy’), (‘Grass’, ‘Grass’), (‘Electric’, ‘Psychic’), (‘Grass’, ‘Water’), (‘Electric’, ‘Poison’), (‘Bug’, ‘Psychic’), (‘Grass’, ‘Bug’), (‘Psychic’, ‘Psychic’), (‘Normal’, ‘Poison’), (‘Psychic’, ‘Normal’), (‘Grass’, ‘Fire’), (‘Steel’, ‘Dark’), (‘Poison’, ‘Psychic’), (‘Bug’, ‘Dragon’), (‘Grass’, ‘Ghost’), (‘Fairy’, ‘Fighting’), (‘Psychic’, ‘Steel’), (‘Poison’, ‘Normal’), (‘Bug’, ‘Ice’), (‘Ground’, ‘Normal’), (‘Ghost’, ‘Fairy’), (‘Dragon’, ‘Normal’), (‘Poison’, ‘Steel’), (‘Electric’, ‘Grass’), (‘Electric’, ‘Water’), (‘Psychic’, ‘Water’), (‘Flying’, ‘Fighting’), (‘Ice’, ‘Fighting’), (‘Steel’, ‘Electric’), (‘Bug’, ‘Bug’), (‘Dragon’, ‘Steel’), (‘Normal’, ‘Fighting’), (‘Poison’, ‘Grass’), (‘Ground’, ‘Grass’), (‘Rock’, ‘Electric’), (‘Ground’, ‘Water’), (‘Fire’, ‘Grass’), (‘Dragon’, ‘Grass’), (‘Poison’, ‘Fire’), (‘Dark’, ‘Electric’), (‘Fighting’, ‘Rock’), (‘Dragon’, ‘Water’), (‘Ground’, ‘Fire’), (‘Poison’, ‘Ghost’), (‘Electric’, ‘Dark’), (‘Psychic’, ‘Dark’), (‘Fire’, ‘Fire’), (‘Grass’, ‘Normal’), (‘Fairy’, ‘Ice’), (‘Fire’, ‘Ghost’), (‘Electric’, ‘Ground’), (‘Psychic’, ‘Ground’), (‘Rock’, ‘Rock’), (‘Fire’, ‘Dark’), (‘Ghost’, ‘Electric’), (‘Dragon’, ‘Dark’), (‘Dark’, ‘Rock’), (‘Flying’, ‘Ice’), (‘Ice’, ‘Ice’), (‘Ground’, ‘Ground’), (‘Normal’, ‘Electric’), (‘Fairy’, ‘Rock’), (‘Psychic’, ‘Poison’), (‘Bug’, ‘Normal’), (‘Flying’, ‘Bug’), (‘Poison’, ‘Poison’), (‘Ground’, ‘Poison’), (‘Fire’, ‘Poison’), (‘Flying’, ‘Rock’), (‘Ice’, ‘Rock’), (‘Fighting’, ‘Dragon’), (‘Dragon’, ‘Poison’), (‘Fighting’, ‘Ice’), (‘Normal’, ‘Rock’), (‘Bug’, ‘Fairy’), (‘Ghost’, ‘Fighting’), (‘Electric’, ‘Fighting’), (‘Steel’, ‘Ice’), (‘Poison’, ‘Fairy’), (‘Fighting’, ‘Grass’), (‘Ground’, ‘Fairy’), (‘Fighting’, ‘Bug’), (‘Fairy’, ‘Psychic’), (‘Fire’, ‘Fairy’), (‘Dragon’, ‘Fairy’), (‘Grass’, ‘Electric’), (‘Fighting’, ‘Fire’), (‘Fighting’, ‘Ghost’), (‘Dragon’, 
‘Fighting’), (‘Bug’, ‘Dark’), (‘Fairy’, ‘Dragon’), (‘Fairy’, ‘Steel’), (‘Flying’, ‘Flying’), (‘Flying’, ‘Psychic’), (‘Flying’, ‘Normal’), (‘Ice’, ‘Normal’), (‘Dark’, ‘Bug’), (‘Fairy’, ‘Grass’), (‘Ghost’, ‘Ice’), (‘Ice’, ‘Dragon’), (‘Fairy’, ‘Water’), (‘Fairy’, ‘Bug’), (‘Flying’, ‘Steel’), (‘Electric’, ‘Electric’), (‘Ice’, ‘Steel’), (‘Psychic’, ‘Electric’), (‘Normal’, ‘Dragon’), (‘Normal’, ‘Ice’), (‘Fairy’, ‘Fire’), (‘Fairy’, ‘Ghost’), (‘Poison’, ‘Electric’), (‘Flying’, ‘Grass’), (‘Ice’, ‘Grass’), (‘Ghost’, ‘Bug’), (‘Flying’, ‘Water’), (‘Ice’, ‘Bug’), (‘Fire’, ‘Electric’), (‘Fighting’, ‘Poison’), (‘Fighting’, ‘Normal’), (‘Normal’, ‘Bug’), (‘Ghost’, ‘Rock’), (‘Flying’, ‘Fire’), (‘Ice’, ‘Fire’), (‘Water’, ‘Bug’), (‘Electric’, ‘Rock’), (‘Flying’, ‘Ghost’), (‘Water’, ‘Fire’), (‘Fairy’, ‘Ground’), (‘Rock’, ‘Normal’), (‘Flying’, ‘Dark’), (‘Steel’, ‘Steel’), (‘Fighting’, ‘Fairy’), (‘Dark’, ‘Poison’), (‘Dark’, ‘Normal’), (‘Fighting’, ‘Water’), (‘Dragon’, ‘Rock’), (‘Fairy’, ‘Poison’), (‘Flying’, ‘Ground’), (‘Ground’, ‘Fighting’), (‘Steel’, ‘Grass’), (‘Fairy’, ‘Normal’), (‘Steel’, ‘Water’), (‘Steel’, ‘Bug’), (‘Steel’, ‘Fire’), (‘Ghost’, ‘Psychic’), (‘Dark’, ‘Grass’), (‘Flying’, ‘Poison’), (‘Ice’, ‘Poison’), (‘Dark’, ‘Water’), (‘Ghost’, ‘Normal’), (‘Rock’, ‘Fire’), (‘Fairy’, ‘Fairy’), (‘Normal’, ‘Normal’), (‘Rock’, ‘Ghost’), (‘Water’, ‘Normal’), (‘Ghost’, ‘Steel’), (‘Grass’, ‘Rock’), (‘Electric’, ‘Dragon’), (‘Fighting’, ‘Ground’), (‘Psychic’, ‘Dragon’), (‘Electric’, ‘Ice’), (‘Psychic’, ‘Ice’), (‘Normal’, ‘Steel’), (‘Fighting’, ‘Electric’), (‘Poison’, ‘Ice’), (‘Flying’, ‘Fairy’), (‘Ghost’, ‘Water’), (‘Dark’, ‘Dark’), (‘Ice’, ‘Fairy’), (‘Ground’, ‘Ice’), (‘Fire’, ‘Dragon’), (‘Electric’, ‘Bug’), (‘Fire’, ‘Ice’), (‘Psychic’, ‘Bug’), (‘Dragon’, ‘Dragon’), (‘Fairy’, ‘Dark’), (‘Water’, ‘Water’), (‘Dark’, ‘Ground’), (‘Electric’, ‘Fire’), (‘Ghost’, ‘Ghost’), (‘Steel’, ‘Poison’), (‘Psychic’, ‘Rock’), (‘Normal’, ‘Fire’), (‘Steel’, ‘Normal’), (‘Normal’, ‘Ghost’), (‘Ground’, ‘Bug’), 
(‘Rock’, ‘Poison’), (‘Fire’, ‘Bug’), (‘Poison’, ‘Rock’), (‘Ice’, ‘Dark’), (‘Dragon’, ‘Bug’), (‘Normal’, ‘Dark’), (‘Fairy’, ‘Electric’)}

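3. Construct Series according to the following requirements:
a) Take Attack; replace values above 120 with high, values below 50 with low, and everything else with mid
b) Take the primary type and convert all letters to upper case, once with replace and once with apply
c) For each Pokémon, compute the deviation of its six stats, that is, the largest absolute distance of any stat from their median; add it to df and sort from largest to smallest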

ret_a = df['Attack'].mask(df['Attack']>120, 'high').mask(df['Attack']<50, 'low').mask((50<=df['Attack'])&(df['Attack']<=120), 'mid')

df['Type 1'].replace({i:str.upper(i) for i in df['Type 1'].unique()})
df['Type 1'].apply(lambda x : str(x).upper())

import numpy as np

list_cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
# Largest absolute deviation from the row median across the six stats
df['max_devision'] = df[list_cols].apply(lambda x: np.max((x - x.median()).abs()), axis=1)
df.sort_values('max_devision', ascending=False).head()