import pandas as pd

df = pd.read_csv("http://www.mosaic-web.org/go/datasets/galton.csv")
print(df.head())
   family  father  mother sex  height  nkids
0       1    78.5    67.0   M    73.2      4
1       1    78.5    67.0   F    69.2      4
2       1    78.5    67.0   F    69.0      4
3       1    78.5    67.0   F    69.0      4
4       2    75.5    66.5   M    73.5      4
outliers_iqr(df['height'])
(array([288]),)
print("Outliers using outliers_iqr()")
print("=============================")
for i in outliers_iqr(df.height)[0]:
    print(df[i:i+1])
Outliers using outliers_iqr()
=============================
    family  father  mother sex  height  nkids
288     72    70.0    65.0   M    79.0      7
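
The calls to outliers_iqr() above rely on a function whose definition does not appear in this part of the text. As a rough guide to what such a function might look like, here is a minimal sketch that assumes the common rule of flagging values more than 1.5 times the interquartile range beyond the first or third quartile. The name outliers_iqr_sketch and the sample list are hypothetical and used only for illustration.

```python
import numpy as np

def outliers_iqr_sketch(data):
    """Return indices of values outside the 1.5 * IQR fences (assumed rule)."""
    values = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])   # first and third quartiles
    iqr = q3 - q1                              # interquartile range
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # np.where returns a tuple of index arrays, matching the (array([...]),) shape seen above
    return np.where((values < lower_bound) | (values > upper_bound))

# Hypothetical sample data: only the last value lies far from the rest
sample = [10, 11, 12, 13, 14, 15, 100]
print(outliers_iqr_sketch(sample))
```

For this sample the quartiles are 11.5 and 14.5, so the fences sit at 7 and 19 and only the value at index 6 is flagged.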
Z-Score
import numpy as np

def outliers_z_score(data):
    threshold = 3    # flag values more than 3 standard deviations from the mean
    mean = np.mean(data)
    std = np.std(data)
    z_scores = [(y - mean) / std for y in data]
    return np.where(np.abs(z_scores) > threshold)
print("Outliers using outliers_z_score()")
print("=================================")
for i in outliers_z_score(df.height)[0]:
    print(df[i:i+1])
    print()
Outliers using outliers_z_score()
=================================
    family  father  mother sex  height  nkids
125     35    71.0    69.0   M    78.0      5

    family  father  mother sex  height  nkids
288     72    70.0    65.0   M    79.0      7

    family  father  mother sex  height  nkids
672    155    68.0    60.0   F    56.0      7
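
The same Z-score test can also be written without a Python-level loop. The following is a minimal sketch, not the version used above: the function name outliers_z_score_vectorized, the threshold parameter, and the small demo DataFrame are all assumptions made for illustration. It computes every Z-score in one NumPy expression and then selects the flagged rows in a single step with iloc instead of printing them one slice at a time.

```python
import numpy as np
import pandas as pd

def outliers_z_score_vectorized(data, threshold=3):
    """Return indices whose absolute Z-score exceeds the threshold."""
    values = np.asarray(data, dtype=float)
    # ndarray.std() uses the population standard deviation (ddof=0), like np.std()
    z_scores = (values - values.mean()) / values.std()
    return np.where(np.abs(z_scores) > threshold)

# Hypothetical demo data: 29 identical heights and one extreme value
demo = pd.DataFrame({"height": [68.0] * 29 + [100.0]})
idx = outliers_z_score_vectorized(demo["height"])[0]

# Select all flagged rows at once rather than looping over the indices
flagged = demo.iloc[idx]
print(flagged)
```

Note that pandas' own Series.std() defaults to the sample standard deviation (ddof=1), so computing the Z-scores with it would give slightly different values from np.std().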
Data Cleansing
Cleaning Rows with NaNs
import pandas as pd

df = pd.read_csv('NaNDataset.csv')
df.isnull().sum()
A    0
B    2
C    0
dtype: int64
print(df)
   A    B  C
0  1  2.0  3
1  4 ...