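Project scenario:
A problem that came up while running a robustness check on a regression in Python.
Problem description:
With recent versions of sklearn, the feature data passed to a regression must be a 2-D array. That is exactly where I am stuck: I don't know at which point the data should be converted to 2-D. I tried many things without success, so I'm asking the programmers here for help.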
import pandas as pd
import numpy as np
from sklearn import model_selection
data = pd.read_excel('Eletrical length data set.xlsx')
Y = data['Electrical length']
X = data.copy().drop(['Electrical length'],axis=1)
train_x,test_x,train_x,train_y = model_selection.train_test_split(X,Y,test_size=0.2)
data.head()
from sklearn.linear_model import LinearRegression
from sklearn import metrics
lr = LinearRegression()
lr.fit(train_x,train_y)
pred_y_test = lr.predict(test_x)
pred_y_train = lr.predict(train_x)
print('Training set R^2:',round(metrics.r2_score(train_y,pred_y_train),4))
print('Test set R^2:',round(metrics.r2_score(test_y,pred_y_test),4))
The output is as follows:
   Inhabitants    Distance  Electrical length
0           15  605.000000               2146
1           13  696.669983               2148
2           25  443.329987               2178
3           22  373.329987               1322
4           19  340.000000               1075
ValueError Traceback (most recent call last)
<ipython-input-3-88e45c504957> in <module>
2 from sklearn import metrics
3 lr = LinearRegression()
----> 4 lr.fit(train_x,train_y)
5 pred_y_test = lr.predict(test_x)
6 pred_y_train = lr.predict(train_x)
D:\python\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
516 accept_sparse = False if self.positive else ['csr', 'csc', 'coo']
517
--> 518 X, y = self._validate_data(X, y, accept_sparse=accept_sparse,
519 y_numeric=True, multi_output=True)
520
D:\python\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
431 y = check_array(y, **check_y_params)
432 else:
--> 433 X, y = check_X_y(X, y, **check_params)
434 out = X, y
435
D:\python\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
D:\python\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
812 raise ValueError("y cannot be None")
813
--> 814 X = check_array(X, accept_sparse=accept_sparse,
815 accept_large_sparse=accept_large_sparse,
816 dtype=dtype, order=order, copy=copy,
D:\python\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
D:\python\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
635 # If input is 1D raise error
636 if array.ndim == 1:
--> 637 raise ValueError(
638 "Expected 2D array, got 1D array instead:\narray={}.\n"
639 "Reshape your data either using array.reshape(-1, 1) if "
ValueError: Expected 2D array, got 1D array instead:
array=[ 374 856 897 2560 1002 2199 2300 1500 1676 733 592 2470 4115 2594
4712 1764 1141 2082 2110 616 1535 1607 160 2765 2629 1000 542 516
660 755 1230 623 2764 1726 1044 5009 3330 1285 466 2090 2426 2515
413 96 2253 1196 1354 622 1190 1946 764 1520 123 1232 3215 1146
1210 456 1590 744 1601 1003 1675 952 921 5384 1064 764 2289 4880
1531 2608 1100 2064 1404 4820 883 656 1723 1181 1948 970 1277 1417
1491 2745 2976 2748 1345 231 1176 1362 1391 1880 2754 2178 432 1987
1143 1855 1290 3362 673 638 864 3235 2146 1704 1396 1064 1460 1698
2651 840 353 657 1047 2136 496 915 2039 132 3605 500 2810 517
2437 1075 570 1818 1176 4673 4845 1092 1336 3898 1349 346 1575 2857
2797 1144 4210 1466 844 1046 827 2800 1753 1599 2610 4268 1671 605
3433 643 1010 271 773 1019 1325 1471 1660 1287 150 2684 2134 1478
176 3729 2686 1409 1296 469 80 468 574 1295 1084 1670 599 2452
2257 3117 657 481 2145 1733 739 1140 1697 3357 1571 1773 2891 1979
1229 2160 1360 1641 668 1538 1284 365 2362 862 621 1588 839 1942
1835 675 1998 2260 3310 3200 1330 1370 2628 903 828 1614 5401 2470
445 1453 844 533 1868 3731 2275 1422 498 3064 670 1478 1206 1025
621 1036 2274 1736 358 375 212 1207 1032 1130 2225 993 2312 1224
3380 1535 1605 2124 890 1956 1364 1241 2435 1898 3096 2000 987 2389
737 1207 723 942 384 2557 353 1500 3327 567 2684 4750 1885 1723
2291 1663 1894 513 3221 1671 642 835 2478 2216 1972 1895 3302 1110
774 528 1167 2702 692 823 1776 3570 3157 1441 1840 2522 2142 1953
1370 826 6465 1421 80 1003 1187 2891 1011 126 1328 1799 742 1805
4046 4437 1276 1982 3985 686 804 1023 118 4016 913 1430 1617 1532
2521 1739 1698 1296 2123 1357 860 2139 1368 1412 625 2305 1192 2974
1623 404 484 2959 1113 1083 4610 1264 2809 5990 1694 814 1528 2737
905 909 1565 1534 910 895 2148 1285 1261 1936 1147 1559 534 1691
3000 1051 482 2082 811 4879 1987 2097 1660 1679 357 1108 1803 1130
1015 1663 2898 2087].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
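Cause analysis:
Most likely I simply haven't run into enough problems yet and my debugging skills are still weak. As a beginner I'm at a loss here and really don't know how to fix this. Please point me in the right direction!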
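One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the array printed in the error is a flat list of integers in the range of the Electrical length column, not a two-column feature table, which suggests that train_x held target values when lr.fit was called. train_test_split returns its pieces in the order X_train, X_test, y_train, y_test, and the call in the post names train_x twice, so the third value (the training targets) would overwrite the training features, while test_y is never assigned. The sketch below reproduces that with synthetic data; the values are made up and only the column names are taken from the post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Excel file (values are invented for illustration).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Inhabitants": rng.integers(5, 40, size=50),
    "Distance": rng.uniform(100.0, 800.0, size=50),
})
data["Electrical length"] = (
    40 * data["Inhabitants"] + 2 * data["Distance"] + rng.normal(0, 10, size=50)
)

X = data.drop(columns=["Electrical length"])  # 2-D: (n_samples, 2)
Y = data["Electrical length"]                 # 1-D: (n_samples,)

# Unpacking as written in the post: train_x is bound twice,
# so it ends up holding the third return value (y_train).
train_x, test_x, train_x, train_y = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
buggy_train_x_ndim = train_x.ndim  # 1-D, which is what check_array rejects

# Unpacking in the order train_test_split actually returns its values.
train_x, test_x, train_y, test_y = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

lr = LinearRegression().fit(train_x, train_y)
r2_train = r2_score(train_y, lr.predict(train_x))
r2_test = r2_score(test_y, lr.predict(test_x))
print("train_x shape:", train_x.shape, "train_y shape:", train_y.shape)
```

If this is indeed the cause, no reshape is needed at all: X already has two columns, and checking train_x.shape right after the split is a quick way to confirm.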
Solution:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
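(This is the fix suggested at the end of the error message, but after many attempts it still does not work. I really don't know where the .reshape(-1, 1) call is supposed to go. I'd be very grateful if someone familiar with this could explain.)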
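On where .reshape(-1, 1) would go in general: it applies to the feature array passed as X, not to the target, and it only matters when X is a single column that has been pulled out as a 1-D object. The sketch below assumes a hypothetical single-feature model that uses only Distance, which is not what the post's code does. The five rows are the ones shown in the data.head() output, with Distance rounded to two decimals.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows from the post's data.head() output.
data = pd.DataFrame({
    "Distance": [605.0, 696.67, 443.33, 373.33, 340.0],
    "Electrical length": [2146, 2148, 2178, 1322, 1075],
})
y = data["Electrical length"]

# Single brackets select one column as a 1-D Series ...
x_1d = data["Distance"]
# ... and reshape(-1, 1) turns it into a 2-D column of shape (n_samples, 1).
x_2d = x_1d.to_numpy().reshape(-1, 1)

# Double brackets keep a 2-D DataFrame, so no reshape is needed.
x_frame = data[["Distance"]]

model = LinearRegression().fit(x_2d, y)
print(x_1d.shape, x_2d.shape, x_frame.shape)
```

With two or more feature columns, as in the post's X, the DataFrame is already 2-D and reshape should not be needed.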
The data file is still uploading; I'll add it here once it passes review.