在机器学习或其他项目中,通常的做法是保存一定百分比的数据用于测试,其余(通常较大的数据块)用于训练。此函数可以实现相同的功能。用户可以提供为训练数据保留的百分比。
During machine learning or other projects, it is a usual practice to save some percentage of data for testing and the rest (usually larger chunk) is used for training. This function accomplishes the same. The user has the freedom to provide the percentage to be reserved for training.
Inputs ->
i. the complete dataset (grouped by one type of label only)(IMP: get rid of the col titles’ row)
ii. percentage of the dataset to be dedicate towards ** testing** dataset (usually smaller portion of the whole dataset)
Outputs ->
i. first is the larger chunk of the dataset (training dataset)
ii. smaller chunk of the dataset (testing dataset)
更多精彩文章请关注公众号: