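I am trying to predict y values from X values. I have an Excel file that records how many siblings and spouses each person has. The file also contains a survival outcome y (1 = survived, 0 = died).
The following snippet shows how I do this:
import numpy as np
import pandas as pd

dataSet = pd.read_excel("TitanicData.xlsx", sheet_name="TitanicData")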
dataSet.head()
dataSet.columns
SibSp = dataSet.iloc[:, 6]
Parch = dataSet.iloc[:, 7]
Stack = np.column_stack((SibSp, Parch, SibSp + Parch))
Family = pd.DataFrame(Stack, columns=['SibSp', 'Parch', 'Family'])
X = Family.iloc[:, 2]
y = dataSet.iloc[:, 1]
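This gives me the values I expect: y is a column of 1s and 0s indicating whether each person survived, and X contains the sum of the SibSp and Parch columns.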
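To make the shapes involved concrete, here is a minimal sketch that uses a made-up four-row table in place of the spreadsheet. The rows and the `Survived` column name are assumptions for illustration, not taken from the file. It shows that summing two columns yields a one-dimensional Series, and that `to_frame()` turns it into a two-dimensional, single-column DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the real file is not reproduced here.
dataSet = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "SibSp": [1, 1, 0, 3],
    "Parch": [0, 0, 0, 1],
})

# Selecting by column name avoids depending on column positions.
family = dataSet["SibSp"] + dataSet["Parch"]
family.name = "Family"

X_1d = family              # pandas Series: one dimension
X_2d = family.to_frame()   # pandas DataFrame: rows x one column
y = dataSet["Survived"]

print(X_1d.shape, X_2d.shape, y.shape)
```

Selecting by name rather than with `iloc` is a design choice here: it keeps working if the column order in the sheet changes.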
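I then split the data into training and test sets as follows (updated to show where X_train and X_test come from):
[code snippet missing from the source]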
However, when I try to use sklearn.linear_model.LinearRegression, I start getting an error:
from sklearn.linear_model import LinearRegression

classifier = LinearRegression()
classifier.fit(X_train, y_train)
classifier.predict(X_test)

ValueError: Expected 2D array, got 1D array instead:
array=[ 1 2 0 1 0 0 0 0 4 ...]
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The error is raised by this line:
classifier.fit(X_train, y_train)
How do I get my training values into the classifier?
Update:
print(X_train.shape, y_train.values.reshape(-1, 1).shape)
gives me
(534,) (534, 1)
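The shapes above suggest that the feature array, not the target, is the one-dimensional input the error message complains about. Below is a minimal sketch of reshaping the feature before fitting. It uses synthetic data in place of the Titanic columns, and the `test_size` and `random_state` values are illustrative assumptions rather than the ones used in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the Titanic columns (values are made up).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=20)   # 1D feature array
y = rng.integers(0, 2, size=20)   # 1D target array

# test_size and random_state are illustrative choices, not taken from the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# scikit-learn expects X with shape (n_samples, n_features); y may stay 1D.
X_train_2d = X_train.reshape(-1, 1)
X_test_2d = X_test.reshape(-1, 1)

model = LinearRegression()
model.fit(X_train_2d, y_train)
predictions = model.predict(X_test_2d)

print(X_train_2d.shape, predictions.shape)
```

If X_train is a pandas Series rather than a NumPy array, `X_train.to_frame()` or `X_train.values.reshape(-1, 1)` should produce the same two-dimensional layout.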
Update showing the full traceback:
  File "", line 1, in
    train()
  File "C:/Users/user/Desktop/dantitanic/AnotherTest.py", line 41, in train
    classifier.fit(X_train, y_train)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\linear_model\base.py", line 458, in fit
    y_numeric=True, multi_output=True)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 756, in check_X_y
    estimator=estimator)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 552, in check_array
    "if it contains a single sample.".format(array))