python3.7 Machine Learning - Day 3

Code source

(https://github.com/MLEveryday/100-Days-Of-ML-Code.git)

Note: most of the Python code in this article comes from the GitHub repo above (a small amount was added while testing); the accompanying notes were written while studying.

Day 3: Multiple Linear Regression

Basic steps:

Data preprocessing -> train the model on the training set -> predict the results

Study notes (including test code)

# Day3:Multiple_Linear_Regression
# 2019.2.15
# coding=utf-8

import warnings
warnings.filterwarnings("ignore")

# Importing the libraries
import pandas as pd
import numpy as np

# Importing the dataset
dataset = pd.read_csv('C:/Users/Ymy/Desktop/100-Days-Of-ML-Code/datasets/50_Startups.csv')
# X is columns 0-3 of the dataset (0:-1 runs from column 0 up to, but not including, the last column); Y is column 4
X = dataset.iloc[ : , :-1].values
Y = dataset.iloc[ : ,  4 ].values

print("Loaded data")
print("X")
print(X)
print("Y")
print(Y)
print("----------------")

# Check for missing data; since this dataset has no missing values, the step is skipped

# Encoding Categorical data (see Day 1 for a detailed explanation)

# 1. Import the LabelEncoder and OneHotEncoder classes from sklearn.preprocessing
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# 2. Create a LabelEncoder object and call its fit_transform method
labelencoder = LabelEncoder()
# Apply fit_transform to column 3 of X
X[: , 3] = labelencoder.fit_transform(X[ : , 3])

print("labelEncoder")
print("X")
print(X)

# 3. Create an OneHotEncoder object and call its fit_transform method
# This one-hot encoder encodes the data according to column 3 of the samples
# (the categorical_features keyword is deprecated since sklearn 0.20; see the warning in the output)
onehotencoder = OneHotEncoder(categorical_features = [3])
# One-hot encode column 3 of X, convert the result to an array, and assign it back to X
X = onehotencoder.fit_transform(X).toarray()

print("OneHotEncoder")
print("X")
print(X)
print("----------------")

# Avoiding Dummy Variable Trap (see Note 1 at the bottom for details)
# Drop column 0 of X, i.e. use two dummy variables to represent the three
# categories 0, 1, 2 produced by the label encoding
X = X[: , 1:]

print("Avoiding dummy variable trap")
print("X")
print(X)
print("----------------")

# Splitting the dataset into the Training set and Test set (test set: 20%)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)

print("X_train")
print(X_train)
print("X_test")
print(X_test)
print("Y_train")
print(Y_train)
print("Y_test")
print(Y_test)
print("----------------")

# Fitting Multiple Linear Regression to the Training set,
# same approach as the simple linear regression in Day 2

# 1. Use the LinearRegression class from sklearn.linear_model
from sklearn.linear_model import LinearRegression

# 2. Create a LinearRegression object regressor and call its fit() method
regressor = LinearRegression()
regressor.fit(X_train, Y_train)


# Predicting the Test set results with the fitted model
y_pred = regressor.predict(X_test)

# Regression evaluation

# Use r2_score from sklearn.metrics
from sklearn.metrics import r2_score

'''
def r2_score(y_true, y_pred, sample_weight=None, multioutput="uniform_average"):

    """R^2 (coefficient of determination) regression score function.
    Best possible score is 1.0 and it can be negative (because the
    model can be arbitrarily worse). A constant model that always
    predicts the expected value of y, disregarding the input features,
    would get a R^2 score of 0.0."""

    Parameters
    ----------
    y_true : array-like of shape = (n_samples) or (n_samples, n_outputs)
        Ground truth (correct) target values.
    y_pred : array-like of shape = (n_samples) or (n_samples, n_outputs)
        Estimated target values.
    sample_weight : array-like of shape = (n_samples), optional
        Sample weights.
    multioutput : string in ['raw_values', 'uniform_average', \
'variance_weighted'] or None or array-like of shape (n_outputs)
        Defines aggregating of multiple output scores.
        Array-like value defines weights used to average scores.
        Default is "uniform_average".
        'raw_values' :
            Returns a full set of scores in case of multioutput input.
        'uniform_average' :
            Scores of all outputs are averaged with uniform weight.
        'variance_weighted' :
            Scores of all outputs are averaged, weighted by the variances
            of each individual output.
    Returns
    -------
    z : float or ndarray of floats
'''

# Compute and print the R^2 score of the regression
print("R^2 score:")
print(r2_score(Y_test,y_pred))
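As the deprecation warning in the output shows, the `categorical_features` keyword used above was deprecated in scikit-learn 0.20 and later removed. A minimal sketch of the modern equivalent using `ColumnTransformer` (assuming scikit-learn >= 0.21 for OneHotEncoder's `drop` parameter; the three toy rows are illustrative, not the full dataset):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows shaped like 50_Startups.csv: three numeric columns plus a State string.
X_raw = np.array([
    [165349.2, 136897.8, 471784.1, 'New York'],
    [162597.7, 151377.6, 443898.5, 'California'],
    [153441.5, 101145.6, 407934.5, 'Florida'],
], dtype=object)

# One-hot encode column 3 and pass the numeric columns through unchanged.
# drop='first' removes one dummy column, which also avoids the dummy
# variable trap without the manual X = X[:, 1:] step used above.
ct = ColumnTransformer(
    [('state', OneHotEncoder(drop='first'), [3])],
    remainder='passthrough')
X_enc = ct.fit_transform(X_raw)
print(X_enc.shape)  # (3, 5): 2 dummy columns + 3 passthrough columns
```

Note that `OneHotEncoder` inside `ColumnTransformer` handles string columns directly, so the separate `LabelEncoder` step becomes unnecessary.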

Output

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> 
 RESTART: C:\Users\Ymy\Desktop\100-Days-Of-ML-Code\study_code\Day3\Day 3.py 
Loaded data
X
[[165349.2 136897.8 471784.1 'New York']
 [162597.7 151377.59 443898.53 'California']
 [153441.51 101145.55 407934.54 'Florida']
 [144372.41 118671.85 383199.62 'New York']
 [142107.34 91391.77 366168.42 'Florida']
 [131876.9 99814.71 362861.36 'New York']
 [134615.46 147198.87 127716.82 'California']
 [130298.13 145530.06 323876.68 'Florida']
 [120542.52 148718.95 311613.29 'New York']
 [123334.88 108679.17 304981.62 'California']
 [101913.08 110594.11 229160.95 'Florida']
 [100671.96 91790.61 249744.55 'California']
 [93863.75 127320.38 249839.44 'Florida']
 [91992.39 135495.07 252664.93 'California']
 [119943.24 156547.42 256512.92 'Florida']
 [114523.61 122616.84 261776.23 'New York']
 [78013.11 121597.55 264346.06 'California']
 [94657.16 145077.58 282574.31 'New York']
 [91749.16 114175.79 294919.57 'Florida']
 [86419.7 153514.11 0.0 'New York']
 [76253.86 113867.3 298664.47 'California']
 [78389.47 153773.43 299737.29 'New York']
 [73994.56 122782.75 303319.26 'Florida']
 [67532.53 105751.03 304768.73 'Florida']
 [77044.01 99281.34 140574.81 'New York']
 [64664.71 139553.16 137962.62 'California']
 [75328.87 144135.98 134050.07 'Florida']
 [72107.6 127864.55 353183.81 'New York']
 [66051.52 182645.56 118148.2 'Florida']
 [65605.48 153032.06 107138.38 'New York']
 [61994.48 115641.28 91131.24 'Florida']
 [61136.38 152701.92 88218.23 'New York']
 [63408.86 129219.61 46085.25 'California']
 [55493.95 103057.49 214634.81 'Florida']
 [46426.07 157693.92 210797.67 'California']
 [46014.02 85047.44 205517.64 'New York']
 [28663.76 127056.21 201126.82 'Florida']
 [44069.95 51283.14 197029.42 'California']
 [20229.59 65947.93 185265.1 'New York']
 [38558.51 82982.09 174999.3 'California']
 [28754.33 118546.05 172795.67 'California']
 [27892.92 84710.77 164470.71 'Florida']
 [23640.93 96189.63 148001.11 'California']
 [15505.73 127382.3 35534.17 'New York']
 [22177.74 154806.14 28334.72 'California']
 [1000.23 124153.04 1903.93 'New York']
 [1315.46 115816.21 297114.46 'Florida']
 [0.0 135426.92 0.0 'California']
 [542.05 51743.15 0.0 'New York']
 [0.0 116983.8 45173.06 'California']]
Y
[192261.83 191792.06 191050.39 182901.99 166187.94 156991.12 156122.51
 155752.6  152211.77 149759.96 146121.95 144259.4  141585.52 134307.35
 132602.65 129917.04 126992.93 125370.37 124266.9  122776.86 118474.03
 111313.02 110352.25 108733.99 108552.04 107404.34 105733.54 105008.31
 103282.38 101004.64  99937.59  97483.56  97427.84  96778.92  96712.8
  96479.51  90708.19  89949.14  81229.06  81005.76  78239.91  77798.83
  71498.49  69758.98  65200.33  64926.08  49490.75  42559.73  35673.41
  14681.4 ]
----------------
labelEncoder
X
[[165349.2 136897.8 471784.1 2]
 [162597.7 151377.59 443898.53 0]
 [153441.51 101145.55 407934.54 1]
 [144372.41 118671.85 383199.62 2]
 [142107.34 91391.77 366168.42 1]
 [131876.9 99814.71 362861.36 2]
 [134615.46 147198.87 127716.82 0]
 [130298.13 145530.06 323876.68 1]
 [120542.52 148718.95 311613.29 2]
 [123334.88 108679.17 304981.62 0]
 [101913.08 110594.11 229160.95 1]
 [100671.96 91790.61 249744.55 0]
 [93863.75 127320.38 249839.44 1]
 [91992.39 135495.07 252664.93 0]
 [119943.24 156547.42 256512.92 1]
 [114523.61 122616.84 261776.23 2]
 [78013.11 121597.55 264346.06 0]
 [94657.16 145077.58 282574.31 2]
 [91749.16 114175.79 294919.57 1]
 [86419.7 153514.11 0.0 2]
 [76253.86 113867.3 298664.47 0]
 [78389.47 153773.43 299737.29 2]
 [73994.56 122782.75 303319.26 1]
 [67532.53 105751.03 304768.73 1]
 [77044.01 99281.34 140574.81 2]
 [64664.71 139553.16 137962.62 0]
 [75328.87 144135.98 134050.07 1]
 [72107.6 127864.55 353183.81 2]
 [66051.52 182645.56 118148.2 1]
 [65605.48 153032.06 107138.38 2]
 [61994.48 115641.28 91131.24 1]
 [61136.38 152701.92 88218.23 2]
 [63408.86 129219.61 46085.25 0]
 [55493.95 103057.49 214634.81 1]
 [46426.07 157693.92 210797.67 0]
 [46014.02 85047.44 205517.64 2]
 [28663.76 127056.21 201126.82 1]
 [44069.95 51283.14 197029.42 0]
 [20229.59 65947.93 185265.1 2]
 [38558.51 82982.09 174999.3 0]
 [28754.33 118546.05 172795.67 0]
 [27892.92 84710.77 164470.71 1]
 [23640.93 96189.63 148001.11 0]
 [15505.73 127382.3 35534.17 2]
 [22177.74 154806.14 28334.72 0]
 [1000.23 124153.04 1903.93 2]
 [1315.46 115816.21 297114.46 1]
 [0.0 135426.92 0.0 0]
 [542.05 51743.15 0.0 2]
 [0.0 116983.8 45173.06 0]]

Warning (from warnings module):
  File "D:\python\lib\site-packages\sklearn\preprocessing\_encoders.py", line 390
    "use the ColumnTransformer instead.", DeprecationWarning)
DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
OneHotEncoder
X
[[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05
  4.7178410e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05
  4.4389853e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05
  4.0793454e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05
  3.8319962e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04
  3.6616842e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04
  3.6286136e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05
  1.2771682e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05
  3.2387668e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05
  3.1161329e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05
  3.0498162e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05
  2.2916095e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04
  2.4974455e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05
  2.4983944e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05
  2.5266493e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05
  2.5651292e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05
  2.6177623e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05
  2.6434606e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05
  2.8257431e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05
  2.9491957e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05
  0.0000000e+00]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05
  2.9866447e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05
  2.9973729e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05
  3.0331926e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05
  3.0476873e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04
  1.4057481e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05
  1.3796262e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05
  1.3405007e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05
  3.5318381e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05
  1.1814820e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05
  1.0713838e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05
  9.1131240e+04]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05
  8.8218230e+04]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05
  4.6085250e+04]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05
  2.1463481e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05
  2.1079767e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04
  2.0551764e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05
  2.0112682e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04
  1.9702942e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04
  1.8526510e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04
  1.7499930e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05
  1.7279567e+05]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04
  1.6447071e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04
  1.4800111e+05]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05
  3.5534170e+04]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05
  2.8334720e+04]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05
  1.9039300e+03]
 [0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05
  2.9711446e+05]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05
  0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04
  0.0000000e+00]
 [1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05
  4.5173060e+04]]
----------------
Avoiding dummy variable trap
X
[[0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
 [0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
 [1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
 [0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
 [1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]
 [0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
 [0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
 [1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
 [0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
 [0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
 [1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
 [0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
 [1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
 [0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
 [1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
 [0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
 [0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
 [0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
 [1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
 [0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
 [0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
 [1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
 [1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
 [0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
 [0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
 [1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
 [0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
 [1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
 [0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
 [1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
 [0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
 [0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
 [1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
 [0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
 [0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
 [1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
 [0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
 [0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
 [0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
 [0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
 [1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
 [0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
 [0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
 [0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]
 [0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
 [1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]]
----------------
X_train
[[1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
 [0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
 [1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
 [0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
 [1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
 [1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
 [1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
 [0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
 [0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
 [0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
 [0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
 [1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
 [0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
 [0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
 [0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
 [0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
 [0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
 [0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
 [0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
 [0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
 [0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
 [0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
 [0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]
 [0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
 [1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
 [0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
 [0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
 [0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
 [1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
 [1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
 [0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
 [0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
 [0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
 [1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
 [0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
 [0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
 [0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
 [0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]]
X_test
[[1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
 [0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
 [1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
 [1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
 [1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
 [0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
 [0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
 [0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
 [1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
 [1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]]
Y_train
[ 96778.92  96479.51 105733.54  96712.8  124266.9  155752.6  132602.65
  64926.08  35673.41 101004.64 129917.04  99937.59  97427.84 126992.93
  71498.49 118474.03  69758.98 152211.77 134307.35 107404.34 156991.12
 125370.37  78239.91  14681.4  191792.06 141585.52  89949.14 108552.04
 156122.51 108733.99  90708.19 111313.02 122776.86 149759.96  81005.76
  49490.75 182901.99 192261.83  42559.73  65200.33]
Y_test
[103282.38 144259.4  146121.95  77798.83 191050.39 105008.31  81229.06
  97483.56 110352.25 166187.94]
----------------
R^2 score:
0.9347068473282989
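The score above can be cross-checked against the definition R^2 = 1 - SS_res/SS_tot. A minimal sketch on toy values (the numbers are illustrative, not taken from this dataset):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# SS_res: residual sum of squares; SS_tot: total sum of squares around the mean.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(round(r2_manual, 4))                 # 0.9486
print(round(r2_score(y_true, y_pred), 4))  # 0.9486, matches the manual value
```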

Notes and References

Note 1:

Dummy variables and the dummy variable trap

A dummy variable (also called an indicator or nominal variable) is an artificial variable used to reflect a qualitative attribute; it is a quantified qualitative variable, usually taking the value 0 or 1.

Introducing dummy variables makes a linear regression model more complex, but it describes the problem more concisely: one equation does the work of two, and the model is closer to reality.

For example, a dummy variable reflecting education level could take the value 1 for a bachelor's degree and 0 otherwise.

In general, when setting up dummy variables, the base or affirmative category takes the value 1, and the comparison or negative category takes the value 0.

The dummy variable trap: when a qualitative variable has m categories, only m-1 dummy variables should be introduced into the model. If m dummy variables are introduced, the explanatory variables become perfectly collinear.

The situation where the model cannot be estimated because the number of dummy variables equals the number of categories is called the "dummy variable trap".

Applying this definition to the test above: column 3 of X (State) was first label-encoded and then one-hot encoded. The label encoding mapped New York, California, and Florida to 2, 0, and 1; the one-hot encoding then mapped 2, 0, and 1 to 001, 100, and 010. At that point the three categories are represented by three dummy variables, which is exactly the dummy variable trap. Therefore, in "Avoiding Dummy Variable Trap", the first column of the one-hot encoding is dropped, so that 01, 00, and 10 represent New York, California, and Florida respectively, avoiding the trap.
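The m-1 encoding described above can also be obtained directly in pandas: `pd.get_dummies(..., drop_first=True)` drops one dummy column in the same way (a small State column for illustration, not the full dataset):

```python
import pandas as pd

states = pd.Series(['New York', 'California', 'Florida', 'New York'])

full = pd.get_dummies(states)                      # 3 dummy columns: the trap
reduced = pd.get_dummies(states, drop_first=True)  # 2 dummy columns: trap avoided

print(list(full.columns))     # ['California', 'Florida', 'New York']
print(list(reduced.columns))  # ['Florida', 'New York']
```

In the reduced encoding, a row of all zeros represents the dropped category (California), mirroring how 00 represents California after `X = X[:, 1:]`.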

Note 2: since this is the beginning of my studies, the tests and their outputs are quite detailed (data spanning many rows has been expanded in full).

References:
sklearn-metrics-r2-score
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn-metrics-r2-score
