Pandas DataFrame Official Doc
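This is just a quick note for my own reference.
The pandas package is mainly for processing data efficiently and flexibly. Going by the description of DataFrame, it builds on dict-like types (bunch, dict, object), so I simply treat it as a dictionary: each key is a column name and each value is that column's data. Functions such as min and max operate on the value of each key, that is, column by column, and apply works the same way by default. For more operations, see the official documentation. Here I tried normalization, which turned out to be very convenient; it just takes some practice to get familiar with it.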
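To make the "DataFrame as a dictionary" idea concrete, here is a minimal sketch. The toy data and the column names `a` and `b` are made up for illustration and are not part of the original note.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this sketch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 60]})

# Dict-like access: keys are column names, values are the columns (Series).
print(list(df.keys()))
print(df["b"].tolist())

# Reductions such as min and max run once per column by default (axis=0).
col_min = df.min()
col_max = df.max()

# apply also hands the function one column at a time by default.
col_range = df.apply(lambda col: col.max() - col.min())
print(col_range)
```

Passing `axis=1` to `apply` would switch to row-by-row processing instead.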
Data normalization
import pandas as pd
import math
orders = [["David","3","Ceviche"],["Corina","10","Beef Burrito"],["David","3","Fried Chicken"],["Carla","5","Water"],["Carla","5","Ceviche"],["Rous","3","Ceviche"]]  # each order: [customer, table, food]
foods = sorted({order[2] for order in orders})  # unique foods, alphabetical (column labels)
tables = sorted({order[1] for order in orders}, key=int)  # unique tables, numeric order (row labels)
df = pd.DataFrame(0.0, index=tables, columns=foods)  # all-zero float table of size tables x foods
for _, t, f in orders:
    df.loc[t, f] += 1  # count one order of food f at table t
df
Table | Beef Burrito | Ceviche | Fried Chicken | Water |
---|---|---|---|---|
3 | 0.0 | 2.0 | 1.0 | 0.0 |
5 | 0.0 | 1.0 | 0.0 | 1.0 |
10 | 1.0 | 0.0 | 0.0 | 0.0 |
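As an alternative to filling the table in a loop, the same counts can probably be obtained with `pd.crosstab`. This is only a sketch: the column names `customer`, `table`, and `food` are my own labels, and unlike the loop version it yields integer counts with an integer index.

```python
import pandas as pd

orders = [["David", "3", "Ceviche"], ["Corina", "10", "Beef Burrito"],
          ["David", "3", "Fried Chicken"], ["Carla", "5", "Water"],
          ["Carla", "5", "Ceviche"], ["Rous", "3", "Ceviche"]]

# Column names are illustrative labels, not taken from the original note.
raw = pd.DataFrame(orders, columns=["customer", "table", "food"])
raw["table"] = raw["table"].astype(int)  # so tables sort numerically, not as text

# crosstab counts how often each (table, food) pair occurs.
counts = pd.crosstab(raw["table"], raw["food"])
print(counts)
```

Converting the table numbers to integers first avoids the string ordering in which "10" would sort before "3".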
df_Zscore = df.apply(lambda x: (x - x.mean())/math.sqrt(sum((x - x.mean())**2 / len(x))))  # divide by the population std, same as x.std(ddof=0)
print("\t\tZ-Score Normalization\n", df_Zscore)
df_minmax = df.apply(lambda x: (x - x.min())/(x.max()-x.min()))  # rescale each column to [0, 1]
print("\t\tMinMax Normalization\n ", df_minmax)
Z-Score Normalization
Table | Beef Burrito | Ceviche | Fried Chicken | Water |
---|---|---|---|---|
3 | -0.707107 | 1.224745 | 1.414214 | -0.707107 |
5 | -0.707107 | 0.000000 | -0.707107 | 1.414214 |
10 | 1.414214 | -1.224745 | -0.707107 | -0.707107 |
MinMax Normalization
Table | Beef Burrito | Ceviche | Fried Chicken | Water |
---|---|---|---|---|
3 | 0.0 | 1.0 | 1.0 | 0.0 |
5 | 0.0 | 0.5 | 0.0 | 1.0 |
10 | 1.0 | 0.0 | 0.0 | 0.0 |
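Both normalizations can be wrapped as small reusable helpers. The sketch below is one possible way to do it, not the only one: the function names `zscore` and `minmax`, the demo data, and the choice to map a constant column to zeros (instead of dividing by zero) are all my own assumptions. The z-score uses the population standard deviation (`ddof=0`).

```python
import pandas as pd

def zscore(col: pd.Series) -> pd.Series:
    """Center on the mean and divide by the population std (ddof=0)."""
    std = col.std(ddof=0)
    if std == 0:  # constant column: no spread to scale by (assumed convention)
        return col * 0.0
    return (col - col.mean()) / std

def minmax(col: pd.Series) -> pd.Series:
    """Rescale to [0, 1]; a constant column maps to all zeros (assumed convention)."""
    span = col.max() - col.min()
    if span == 0:
        return col * 0.0
    return (col - col.min()) / span

# Hypothetical data: the column "flat" is constant on purpose.
demo = pd.DataFrame({"x": [0.0, 1.0, 2.0], "flat": [5.0, 5.0, 5.0]})
z = demo.apply(zscore)
mm = demo.apply(minmax)
print(z)
print(mm)
```

After z-scoring, a non-constant column has mean 0 and population standard deviation 1, which is a handy sanity check.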