Python Machine Learning Basics, Notes 4: Handling Numerical Data (Cookbook)

Handling Numerical Data

Rescaling

Rescaling features to a common range (here 0 to 1) often helps machine learning algorithms perform better. Use MinMaxScaler:

# Load libraries
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Create feature
feature = np.array([[-500.5],
					[-100.1],
					[0],
					[100.1],
					[900.9]])

# Create scaler
minmax_scale = MinMaxScaler(feature_range=(0, 1))

# Scale feature
scaled_feature = minmax_scale.fit_transform(feature)

# Show feature
scaled_feature
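Min-max scaling applies x' = (x - min) / (max - min) to each column. The short sketch below recomputes that by hand with NumPy and checks it against MinMaxScaler; the variable names (col_min, manual_scaled, and so on) are illustrative, not part of the original notes.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

feature = np.array([[-500.5], [-100.1], [0], [100.1], [900.9]])

# Min-max formula applied column by column: (x - min) / (max - min)
col_min = feature.min(axis=0)
col_max = feature.max(axis=0)
manual_scaled = (feature - col_min) / (col_max - col_min)

# scikit-learn's result with feature_range=(0, 1)
sklearn_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(feature)

print(np.allclose(manual_scaled, sklearn_scaled))  # True
```

The smallest value maps to 0 and the largest to 1, so a single extreme value compresses everything else into a narrow band.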


Standardizing

Use StandardScaler to transform the feature so it has a mean of 0 and a standard deviation of 1:

import numpy as np
from sklearn import preprocessing
# Create feature
x = np.array([[-1000.1],
              [-200.2],
              [500.5],
              [600.6],
              [9000.9]])

# Create scaler
scaler = preprocessing.StandardScaler()

# Transform the feature
standardized = scaler.fit_transform(x)

# Show feature
standardized
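Standardization is the z-score z = (x - mean) / std, computed per column. This sketch reproduces it with NumPy; note that StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default for `std`. The name `manual` is illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[-1000.1], [-200.2], [500.5], [600.6], [9000.9]])

# z-score: subtract the column mean, divide by the column standard deviation (ddof=0)
manual = (x - x.mean(axis=0)) / x.std(axis=0)

standardized = StandardScaler().fit_transform(x)

print(np.allclose(manual, standardized))  # True
```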

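If the data contains significant outliers, use RobustScaler instead: it centers on the median and scales by the interquartile range, so extreme values have much less influence.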
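A minimal sketch of RobustScaler on the same feature used for StandardScaler. With default settings it subtracts the median and divides by the interquartile range (the 25th to 75th percentile, `quantile_range=(25.0, 75.0)`), which the NumPy recomputation below mirrors; `manual` and the quartile names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[-1000.1], [-200.2], [500.5], [600.6], [9000.9]])

# Default: center on the median, scale by the IQR
robust = RobustScaler().fit_transform(x)

# The same computation by hand with NumPy percentiles
q1, median, q3 = np.percentile(x, [25, 50, 75])
manual = (x - median) / (q3 - q1)

print(np.allclose(robust, manual))  # True
```

Because the median and quartiles ignore how far the extreme value 9000.9 lies from the rest, the other four values keep a sensible spread after scaling.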

Normalizing

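Use Normalizer to rescale the feature values of each observation (each row, not each column) so that the row has unit norm, i.e., a length of 1.
Note: norm="l2" selects the L2 (Euclidean) norm, the square root of the sum of squares.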

# Load libraries
import numpy as np
from sklearn.preprocessing import Normalizer
# Create feature matrix
features = np.array([[0.5, 0.5],
					 [1.1, 3.4],
					 [1.5, 20.2],
					 [1.63, 34.4],
					 [10.9, 3.3]])
					 
# Create normalizer
normalizer = Normalizer(norm="l2")

# Transform feature matrix
normalizer.transform(features)
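To make the row-wise behavior concrete, this sketch divides each row by its own L2 norm with NumPy and compares the result with Normalizer. It also shows `norm="l1"`, which divides each row by the sum of its absolute values instead; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

features = np.array([[0.5, 0.5],
                     [1.1, 3.4],
                     [1.5, 20.2],
                     [1.63, 34.4],
                     [10.9, 3.3]])

# L2: divide each row by the square root of that row's sum of squares
l2_manual = features / np.sqrt((features ** 2).sum(axis=1, keepdims=True))
l2_sklearn = Normalizer(norm="l2").fit_transform(features)

# L1: divide each row by the sum of absolute values in that row
l1_sklearn = Normalizer(norm="l1").fit_transform(features)

print(np.allclose(l2_manual, l2_sklearn))  # True
print(np.linalg.norm(l2_sklearn, axis=1))  # every row now has Euclidean length 1
print(l1_sklearn.sum(axis=1))              # every row sums to 1 (all values here are positive)
```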

Generating Polynomial and Interaction Features

# Load libraries
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Create feature matrix
features = np.array([[2, 3],
					[2, 3],
					[2, 3]])
					
# Create PolynomialFeatures object
polynomial_interaction = PolynomialFeatures(degree=2, include_bias=False)

# Create polynomial features
polynomial_interaction.fit_transform(features)
# Output:
# array([[2., 3., 4., 6., 9.],
#        [2., 3., 4., 6., 9.],
#        [2., 3., 4., 6., 9.]])


We can restrict the features created to only interaction features by setting interaction_only to True:

interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
interaction.fit_transform(features)
# Output:
# array([[2., 3., 6.],
#        [2., 3., 6.],
#        [2., 3., 6.]])

Transforming Features

You want to apply your own function to one or more features. Use FunctionTransformer:

import numpy as np
from sklearn.preprocessing import FunctionTransformer
# Create feature matrix
features = np.array([[2, 3],
					[2, 3],
					[2, 3]])
# Define a simple function
def add_ten(x):
	return x + 10
# Create transformer
ten_transformer = FunctionTransformer(add_ten)
# Transform feature matrix
ten_transformer.transform(features)

Alternatively, do the same with pandas (this is the approach worth focusing on):

import pandas as pd

# Create DataFrame
df = pd.DataFrame(features, columns=["feature_1", "feature_2"])

# Apply function
df.apply(add_ten)

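Detecting Outliers

A common approach is to assume the data is roughly normally distributed and fit an ellipse around it (scikit-learn's EllipticEnvelope): observations inside the ellipse are inliers and labeled 1, while observations outside it are outliers and labeled -1.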
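A minimal EllipticEnvelope sketch on simulated data (10 points from `make_blobs`, with the first point overwritten by an extreme value). The `contamination` parameter is our assumption about the share of outliers in the data; 0.1 here is illustrative, not a recommended default.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Simulated data: 10 observations, 2 features, one cluster (illustrative only)
features, _ = make_blobs(n_samples=10, n_features=2, centers=1, random_state=1)

# Replace the first observation with an extreme value
features[0, 0] = 10000
features[0, 1] = 10000

# contamination = assumed proportion of outliers in the data
outlier_detector = EllipticEnvelope(contamination=0.1)
outlier_detector.fit(features)

# 1 = inlier (inside the ellipse), -1 = outlier (outside the ellipse);
# the extreme first observation is expected to be labeled -1
labels = outlier_detector.predict(features)
print(labels)
```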

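Another approach uses the interquartile range (IQR), the difference between the 75th and 25th percentiles: values more than 1.5 × IQR below the first quartile or above the third quartile are conventionally flagged as outliers.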
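A sketch of the 1.5 × IQR rule with NumPy. The helper `indices_of_outliers` and the sample values are hypothetical, chosen only to illustrate the rule.

```python
import numpy as np

def indices_of_outliers(x):
    """Hypothetical helper: indices of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    return np.where((x > upper_bound) | (x < lower_bound))[0]

# Illustrative feature with one extreme value
feature = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 100.0])
print(indices_of_outliers(feature))  # [6]
```

Here Q1 = 2.25 and Q3 = 3.75, so the bounds are 0.0 and 6.0 and only 100.0 falls outside them.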

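A third option is the RobustScaler mentioned earlier, which limits the influence of outliers on scaling rather than flagging them.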

Discretizing Features

You have a numerical feature and want to break it up into discrete bins.

Use Binarizer() for a single threshold or np.digitize() for multiple thresholds. First, a single threshold with Binarizer:

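import numpy as np
from sklearn.preprocessing import Binarizer

# Create an example age feature (illustrative values)
age = np.array([[6],
                [12],
                [20],
                [36],
                [65]])

# Create binarizer: values above 18 become 1, values at or below 18 become 0
binarizer = Binarizer(threshold=18)

# Transform feature
binarizer.fit_transform(age)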


Second, we can break up numerical features according to multiple thresholds:


np.digitize(age, bins=[20,30,64])
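`np.digitize` returns, for each value, the index of the bin it falls into. With the default `right=False`, bins are left-inclusive, so a value equal to a threshold lands in the higher bin; `right=True` flips that. A short sketch with illustrative ages:

```python
import numpy as np

age = np.array([[6], [12], [20], [36], [65]])

# Default: 0 -> age < 20, 1 -> 20 <= age < 30, 2 -> 30 <= age < 64, 3 -> age >= 64
print(np.digitize(age, bins=[20, 30, 64]).ravel())  # [0 0 1 2 3]

# right=True: 0 -> age <= 20, 1 -> 20 < age <= 30, 2 -> 30 < age <= 64, 3 -> age > 64
print(np.digitize(age, bins=[20, 30, 64], right=True).ravel())  # [0 0 0 2 3]
```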

Deleting Observations with Missing Values

# Load library
import pandas as pd

# Load data
dataframe = pd.DataFrame(features, columns=["feature_1", "feature_2"])

# Remove observations with missing values
dataframe.dropna()
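The feature matrix used above has no missing values, so `dropna()` would change nothing. This sketch uses an illustrative matrix with one NaN to show the effect, plus a NumPy-only equivalent that keeps rows containing no NaN.

```python
import numpy as np
import pandas as pd

# Illustrative feature matrix with one missing value in the last row
features = np.array([[1.1, 11.1],
                     [2.2, 22.2],
                     [3.3, 33.3],
                     [4.4, 44.4],
                     [np.nan, 55]])

dataframe = pd.DataFrame(features, columns=["feature_1", "feature_2"])

# dropna() returns a new DataFrame without any row that contains a NaN
cleaned = dataframe.dropna()
print(cleaned.shape)  # (4, 2)

# NumPy-only equivalent: keep rows where no value is NaN
print(features[~np.isnan(features).any(axis=1)].shape)  # (4, 2)
```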

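Imputing Missing Values

Use SimpleImputer from sklearn.impute (older versions of scikit-learn provided Imputer in sklearn.preprocessing, which was removed in version 0.22):

# Load library
from sklearn.impute import SimpleImputer

# Create imputer that replaces each missing value with its column's mean
mean_imputer = SimpleImputer(strategy="mean")

# Impute values
features_mean_imputed = mean_imputer.fit_transform(features)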
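To see what mean imputation does, here is a small sketch on an illustrative matrix with two missing values, using SimpleImputer from sklearn.impute. `statistics_` holds the per-column values used for filling.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative matrix: one NaN in each column
features = np.array([[1.0, 10.0],
                     [2.0, np.nan],
                     [np.nan, 30.0],
                     [4.0, 40.0]])

# Each NaN is replaced by the mean of the non-missing values in its column
mean_imputer = SimpleImputer(strategy="mean")
features_mean_imputed = mean_imputer.fit_transform(features)

print(mean_imputer.statistics_)  # column means: 7/3 and 80/3
print(features_mean_imputed)
```

`strategy="median"` works the same way and is less sensitive to outliers in the observed values.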

In short, this chapter focuses on handling numerical data.
