Randomly Select Elements from an Array in Python
In machine learning, we usually want a large dataset to train our model. However, in some cases an adequately large dataset is hard to find. When that happens, we can use K-fold cross validation, which rotates the held-out part across K splits of the data, to select a better model from a small dataset.
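To make the K-fold idea concrete, here is a minimal sketch that splits sample indices into K folds using NumPy only. The helper name k_fold_indices and its seed parameter are assumptions made for illustration, not part of any library, and the sketch expects k to be at least 2.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_index, test_index) pairs, one per fold (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle all indices once, then cut them into k nearly equal folds
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        # Fold i is held out for testing; the other folds are used for training
        test_index = folds[i]
        train_index = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_index, test_index

# Example: 10 samples split into 5 folds of 2 test samples each
splits = list(k_fold_indices(10, 5))
```

Each sample lands in the testing fold exactly once, so every data point is used for both training and evaluation across the K rounds.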
There is also a lazier way to train our model: we randomly select several samples from the dataset as the testing set and use the rest of the dataset as the training set.
Suppose we are given a 2D input dataset x_data and a 1D output dataset y_data, both stored as NumPy arrays with the same number of samples. For example, if we choose 5 points as the testing set, the Python code is:
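import numpy as np
# Pick 5 distinct sample indices for the testing set (replace=False prevents duplicates)
test_index = np.random.choice(len(x_data), 5, replace=False)
x_test = x_data[test_index]
y_test = y_data[test_index]
# Every index that was not picked goes to the training set
train_index = np.arange(len(x_data))
train_index = np.delete(train_index, test_index)
x_train = x_data[train_index]
y_train = y_data[train_index]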
We can now use the training set to train our model and the testing set to evaluate it.
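The same split can be wrapped in a small reusable function. The sketch below is one possible variant, not the only way to do it: it shuffles all indices once with a permutation instead of combining np.random.choice and np.delete, and it takes a seed so the split can be reproduced. The function name random_split and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def random_split(x_data, y_data, n_test, seed=None):
    """Randomly hold out n_test samples for testing (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # A random ordering of all sample indices, each appearing exactly once
    order = rng.permutation(len(x_data))
    test_index, train_index = order[:n_test], order[n_test:]
    return (x_data[train_index], y_data[train_index],
            x_data[test_index], y_data[test_index])

# Toy data: 10 samples with 2 features each, and one target per sample
x_data = np.arange(20).reshape(10, 2)
y_data = np.arange(10)
x_train, y_train, x_test, y_test = random_split(x_data, y_data, n_test=5, seed=42)
```

Unlike the np.delete version, the training samples here come out in shuffled order rather than their original order, which usually does not matter for training but is worth knowing if the order carries meaning.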