Logistic Regression in Python - Splitting Data

We have a little over forty-one thousand records. If we used all of them to build the model, no data would be left for testing it. So we generally split the data set into two parts, for example in a 70/30 ratio: 70% of the data is used to build the model and the remaining 30% to test how accurately the model predicts. You may use a different split ratio to suit your requirements.

Creating Features Array

Before we split the data, we separate it into two arrays, X and Y. X contains all the features (data columns) that we want to analyze, and Y is a one-dimensional array of binary values (0 or 1) holding the outcome that the model should predict. To understand this, let us run some code.

Firstly, execute the following Python statement to create the X array −


In [17]: X = data.iloc[:,1:]

To examine the contents of X, use head to print the first few records. The following screen shows the contents of the X array.


In [18]: X.head()

The output shows the first five rows (the default for head) and 23 columns.

Next, we will create the output array containing the “y” values.

Creating Output Array

To create an array for the predicted value column, use the following Python statement −


In [19]: Y = data.iloc[:,0]

Examine its contents by calling head. The screen output below shows the result −


In [20]: Y.head()
Out[20]:
0    0
1    0
2    1
3    0
4    1
Name: y, dtype: int64
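The two iloc statements above can be tried on a small stand-in table. The following is a minimal sketch, assuming the target sits in the first column as it does in this tutorial. The column names age, duration and campaign and all the values are hypothetical, made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data: target "y" in column 0, feature columns after it.
data = pd.DataFrame({
    "y": [0, 0, 1, 0, 1],
    "age": [30, 41, 25, 52, 37],
    "duration": [120, 300, 95, 410, 220],
    "campaign": [1, 2, 1, 3, 1],
})

X = data.iloc[:, 1:]  # every column except the first -> DataFrame of features
Y = data.iloc[:, 0]   # the first column only -> one-dimensional Series

# Selecting by name gives the same features and does not depend on column order.
X_by_name = data.drop(columns="y")

print(X.shape)  # (rows, feature columns)
print(Y.shape)  # (rows,)
print(X_by_name.equals(X))
```

Positional slicing works only while the target stays in column 0. If the column order may change, dropping the target by name is arguably the safer choice.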

Now, split the data using the following command −


In [21]: X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

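This creates four arrays: X_train, X_test, Y_train, and Y_test. As before, you can examine their contents with the head method. Because no test_size is given, train_test_split uses its default and holds out 25% of the rows for testing; pass test_size=0.3 to get the 70/30 split mentioned earlier. Setting random_state=0 makes the shuffle reproducible. We will use X_train and Y_train to train the model, and X_test and Y_test to test and validate it.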
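To see how the split ratio is controlled, here is a minimal sketch on synthetic data. It assumes scikit-learn is installed; the feature names f1 to f4 and the random values are placeholders, not the tutorial's data set. It compares the default call with an explicit 70/30 split. The stratify argument is optional and not used in the tutorial. It keeps the share of 0s and 1s about the same in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 1000 rows, 4 features, binary target.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["f1", "f2", "f3", "f4"])
Y = pd.Series(rng.integers(0, 2, size=n), name="y")

# Default call, as in the tutorial: 25% of the rows are held out for testing.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
default_sizes = (len(X_train), len(X_test))

# Explicit 70/30 split; stratify=Y preserves the class proportions.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0, stratify=Y
)
explicit_sizes = (len(X_train), len(X_test))

print(default_sizes)
print(explicit_sizes)
print(Y_train.mean(), Y_test.mean())  # share of 1s in each part
```

The features and labels are shuffled together, so each row of X_train still lines up with the same row of Y_train.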

Now, we are ready to build our classifier. We will look into it in the next chapter.
