Recommender Systems: the Surprise Model Selection Module (model_selection)

In the Surprise library, the model_selection package provides cross-validation and parameter-selection tools for algorithms.

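1: Cross-validation iterators (similar to scikit-learn)

KFold: basic k-fold cross-validation.

RepeatedKFold: k-fold cross-validation repeated several times.

ShuffleSplit: basic cross-validation with randomly drawn trainsets and testsets.

LeaveOneOut: cross-validation where each user has exactly one rating in the testset.

PredefinedKFold: cross-validation for datasets loaded with the load_from_folds method.

In addition, the module provides train_test_split for splitting a dataset into a trainset and a testset.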

  • surprise.model_selection.split.KFold(n_splits=5, random_state=None, shuffle=True)

The class provides the method split(data), which yields (trainset, testset) tuples.

In each iteration, one fold is held out as test data and the remaining k-1 folds are used for training:

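Parameters: n_splits (int) – The number of folds.

          random_state (int, RandomState instance, or None) – Determines the RNG used to assign ratings to folds:

                  1: int: random_state is used as the seed of a new RNG. This guarantees that repeated calls to split() produce the same splits.

                  2: RandomState instance: this same instance is used as the RNG (random number generator).

                  3: None: numpy's current (global) RNG is used.

                  Note: random_state is only used when shuffle=True. Default is None.

      shuffle (bool) – Whether to shuffle the ratings before splitting. Shuffling is not done in place, so the original dataset is left untouched. Default is True.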
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import KFold

# Load the movielens-100k dataset
data = Dataset.load_builtin('ml-100k')

# define a cross-validation iterator
kf = KFold(n_splits=3)

algo = SVD()

for trainset, testset in kf.split(data):

    # train and test algorithm.
    algo.fit(trainset)
    predictions = algo.test(testset)

    # Compute and print Root Mean Squared Error
    accuracy.rmse(predictions, verbose=True)

Output:

RMSE: 0.9374
RMSE: 0.9476
RMSE: 0.9478
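To make the random_state and shuffle rules above concrete, here is a simplified sketch of k-fold splitting using only numpy. It works on rating positions 0 to n-1 instead of a Surprise Dataset, and resolve_rng and kfold_indices are hypothetical helpers written for this illustration, not part of Surprise's API; the fold-sizing rule (the first n % k folds get one extra rating) is also an assumption of the sketch.

```python
import numbers

import numpy as np


def resolve_rng(random_state):
    """Turn a random_state value into an RNG, following the three cases listed above."""
    if random_state is None:
        # None: use numpy's global RNG (the module-level np.random functions).
        return np.random
    if isinstance(random_state, (numbers.Integral, np.integer)):
        # int: seed a brand-new RandomState, so repeated calls shuffle identically.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # RandomState instance: reuse it; its internal state advances between calls.
        return random_state
    raise ValueError(f"invalid random_state: {random_state!r}")


def kfold_indices(n_ratings, n_splits=5, random_state=None, shuffle=True):
    """Yield (train_idx, test_idx) pairs; every rating lands in exactly one testset."""
    if not 2 <= n_splits <= n_ratings:
        raise ValueError("n_splits must be between 2 and the number of ratings")
    # Shuffle an array of positions rather than the ratings themselves (not in place).
    indices = np.arange(n_ratings)
    if shuffle:  # random_state is ignored when shuffle=False
        resolve_rng(random_state).shuffle(indices)
    stop = 0
    for fold in range(n_splits):
        start = stop
        # The first n_ratings % n_splits folds receive one extra rating.
        stop += n_ratings // n_splits + (1 if fold < n_ratings % n_splits else 0)
        test_idx = indices[start:stop]
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, test_idx


# Usage: with an int seed, two independent calls give identical folds.
first = [test.tolist() for _, test in kfold_indices(10, n_splits=3, random_state=42)]
second = [test.tolist() for _, test in kfold_indices(10, n_splits=3, random_state=42)]
print(first == second)          # True
print([len(t) for t in first])  # [4, 3, 3]
```

Passing the same RandomState instance to two calls instead gives different folds each time, because the instance's state advances; with shuffle=False the folds are simply consecutive blocks of ratings.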

  • surprise.model_selection.split.LeaveOneOut(n_splits=5, random_state=None)

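    Cross-validation iterator where each user has exactly one rating in the testset. Unlike other cross-validation strategies, LeaveOneOut does not guarantee that all folds are different, although for reasonably large datasets they very likely will be. The parameters are similar to those of KFold above.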
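A minimal sketch of the LeaveOneOut idea, assuming ratings are plain (user, item, rating) tuples rather than a Surprise Dataset. leave_one_out is a hypothetical helper, and Python's random.Random stands in for numpy's RNG, so it accepts an int seed or None but not a RandomState instance. Because every split draws its held-out ratings independently, nothing prevents two splits from coinciding, which is why the folds are not guaranteed to differ.

```python
import random
from collections import defaultdict


def leave_one_out(raw_ratings, n_splits=5, random_state=None):
    """Yield (trainset, testset); each user contributes exactly one testset rating."""
    by_user = defaultdict(list)
    for uid, iid, rating in raw_ratings:
        by_user[uid].append((uid, iid, rating))
    # random.Random stands in for numpy's RNG: an int seed is reproducible,
    # None seeds the generator from system entropy.
    rng = random.Random(random_state)
    for _ in range(n_splits):
        trainset, testset = [], []
        for ratings in by_user.values():
            held_out = rng.randrange(len(ratings))  # draw one rating for this user
            testset.append(ratings[held_out])
            trainset.extend(r for j, r in enumerate(ratings) if j != held_out)
        # Each split draws independently, so two splits may be identical by chance.
        yield trainset, testset


ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u1", "i3", 5.0),
           ("u2", "i1", 2.0), ("u2", "i3", 4.0)]
for train, test in leave_one_out(ratings, n_splits=2, random_state=0):
    print(sorted(uid for uid, _, _ in test))  # ['u1', 'u2'] for every split
```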

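  • surprise.model_selection.split.PredefinedKFold

    A cross-validation iterator for datasets loaded with Dataset.load_from_folds: each fold is one of the predefined (train file, test file) pairs.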
import os

from surprise import SVD
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import PredefinedKFold

# path to dataset folder
files_dir = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/')

# This time, we'll use the built-in reader.
reader = Reader('ml-100k')

# folds_files is a list of tuples containing file paths:
# [(u1.base, u1.test), (u2.base, u2.test), ... (u5.base, u5.test)]
train_file = files_dir + 'u%d.base'
test_file = files_dir + 'u%d.test'
folds_files = [(train_file % i, test_file % i) for i in (1, 2, 3, 4, 5)]

data = Dataset.load_from_folds(folds_files, reader=reader)
pkf = PredefinedKFold()

algo = SVD()

for trainset, testset in pkf.split(data):

    # train and test algorithm.
    algo.fit(trainset)
    predictions = algo.test(testset)

    # Compute and print Root Mean Squared Error
    accuracy.rmse(predictions, verbose=True)
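PredefinedKFold involves no random splitting: the folds are exactly the (train file, test file) pairs passed to load_from_folds. The stdlib-only sketch below mimics that with hypothetical read_ratings and predefined_folds helpers. It assumes the ml-100k layout (user, item, rating, timestamp, tab-separated) and writes two tiny made-up fold files to a temporary directory instead of using the real u1.base to u5.test files.

```python
import os
import tempfile


def read_ratings(path, sep="\t"):
    """Parse 'user<TAB>item<TAB>rating<TAB>timestamp' lines into (user, item, rating)."""
    ratings = []
    with open(path) as f:
        for line in f:
            if line.strip():
                user, item, rating, _timestamp = line.rstrip("\n").split(sep)
                ratings.append((user, item, float(rating)))
    return ratings


def predefined_folds(folds_files):
    """Yield one (trainset, testset) per file pair, in order; no shuffling, no RNG."""
    for train_path, test_path in folds_files:
        yield read_ratings(train_path), read_ratings(test_path)


# Usage with two made-up folds written to a temporary directory.
with tempfile.TemporaryDirectory() as files_dir:
    contents = {
        "u1.base": "1\t10\t4\t0\n2\t10\t3\t0\n",
        "u1.test": "1\t20\t5\t0\n",
        "u2.base": "1\t20\t5\t0\n2\t10\t3\t0\n",
        "u2.test": "1\t10\t4\t0\n",
    }
    for name, text in contents.items():
        with open(os.path.join(files_dir, name), "w") as f:
            f.write(text)
    # Same naming pattern as the example above: u%d.base / u%d.test pairs.
    train_file = os.path.join(files_dir, "u%d.base")
    test_file = os.path.join(files_dir, "u%d.test")
    folds_files = [(train_file % i, test_file % i) for i in (1, 2)]
    folds = list(predefined_folds(folds_files))  # read before the directory is deleted

print([len(test) for _, test in folds])  # [1, 1]
```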

  • surprise.model_selection.split.RepeatedKFold(n_splits=5, n_repeats=10, random_state=None)

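        Repeated k-fold cross-validation: k-fold splitting is repeated n_repeats times, with a different random split in each repetition.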
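RepeatedKFold can be pictured as re-running a shuffled k-fold split n_repeats times from one shared RNG. The numpy sketch below works on rating positions rather than a Dataset; repeated_kfold_indices is a hypothetical name, and unlike Surprise it accepts only an int seed or None as random_state.

```python
import numpy as np


def repeated_kfold_indices(n_ratings, n_splits=5, n_repeats=10, random_state=None):
    """Yield n_splits * n_repeats (train_idx, test_idx) pairs."""
    # A single RNG drives every repeat: each repeat gets a different shuffle,
    # yet an int seed makes the whole sequence reproducible.
    rng = np.random.RandomState(random_state)
    for _ in range(n_repeats):
        indices = rng.permutation(n_ratings)
        # array_split gives the first n_ratings % n_splits folds one extra item.
        for test_idx in np.array_split(indices, n_splits):
            train_idx = np.setdiff1d(indices, test_idx)
            yield train_idx, test_idx


splits = list(repeated_kfold_indices(20, n_splits=5, n_repeats=3, random_state=0))
print(len(splits))  # 15
```

With the default n_splits=5 and n_repeats=10, the same pattern yields 50 train/test pairs.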

  • surprise.model_selection.split.ShuffleSplit(n_splits=5,test_size=0.2,train_size=None,random_state=None, shuffle=True)

           A basic cross-validation iterator that draws random trainsets and testsets for each split.

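  • surprise.model_selection.split.train_test_split(data, test_size=0.2, train_size=None, random_state=None, shuffle=True)

    Splits a dataset into one trainset and one testset and returns them as a (trainset, testset) tuple.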
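ShuffleSplit and train_test_split take the same size and randomness parameters, which suggests (and the sketch below assumes) that train_test_split behaves like a single ShuffleSplit split. Both helpers here are hypothetical, operate on rating positions, and use Python's random module instead of numpy. The size handling (floats as proportions, ints as counts, train_size=None meaning "everything not in the testset") and the rounding are assumptions borrowed from scikit-learn's conventions. Unlike KFold, each split is drawn independently, so the testsets of different splits may overlap.

```python
import math
import random


def shuffle_split_indices(n_ratings, n_splits=5, test_size=0.2, train_size=None,
                          random_state=None, shuffle=True):
    """Yield n_splits independent (train_idx, test_idx) pairs."""
    # Assumption: floats are proportions, ints are counts; rounding the testset
    # up and the trainset down follows scikit-learn's convention.
    n_test = math.ceil(test_size * n_ratings) if isinstance(test_size, float) else test_size
    if train_size is None:
        n_train = n_ratings - n_test  # complement of the testset
    elif isinstance(train_size, float):
        n_train = math.floor(train_size * n_ratings)
    else:
        n_train = train_size
    if n_test + n_train > n_ratings:
        raise ValueError("train_size + test_size exceeds the number of ratings")
    rng = random.Random(random_state)
    for _ in range(n_splits):
        perm = list(range(n_ratings))
        if shuffle:
            rng.shuffle(perm)  # a fresh permutation for every split
        yield perm[n_test:n_test + n_train], perm[:n_test]


def train_test_split_indices(n_ratings, test_size=0.2, train_size=None,
                             random_state=None, shuffle=True):
    """A single split: the first (and only) pair from a one-split shuffle split."""
    return next(shuffle_split_indices(n_ratings, 1, test_size, train_size,
                                      random_state, shuffle))


train_idx, test_idx = train_test_split_indices(10, test_size=0.2, random_state=3)
print(len(train_idx), len(test_idx))  # 8 2
```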

2: Cross-validation

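  • surprise.model_selection.validation.cross_validate(algo, data, measures=['rmse', 'mae'], cv=None, return_train_measures=False, n_jobs=-1, pre_dispatch='2*n_jobs', verbose=False)

    Runs cross-validation for an algorithm and reports the requested accuracy measures together with fit and test times.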
from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate


# Load the movielens-100k dataset (download it if needed),
data = Dataset.load_builtin('ml-100k')

# We'll use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
Output:
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

            Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std
RMSE        0.9311  0.9370  0.9320  0.9317  0.9391  0.9342  0.0032
MAE         0.7350  0.7375  0.7341  0.7342  0.7375  0.7357  0.0015
Fit time    6.53    7.11    7.23    7.15    3.99    6.40    1.23
Test time   0.26    0.26    0.25    0.15    0.13    0.21    0.06
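The Mean and Std columns summarize the per-fold scores. Recomputing them from the printed RMSE and MAE rows with Python's statistics module shows which standard deviation the table reports; the scores below are copied from the output above and nothing else is assumed.

```python
from statistics import mean, pstdev

# Per-fold scores copied from the cross_validate output above.
fold_scores = {
    "RMSE": [0.9311, 0.9370, 0.9320, 0.9317, 0.9391],
    "MAE": [0.7350, 0.7375, 0.7341, 0.7342, 0.7375],
}

summary = {}
for measure, scores in fold_scores.items():
    # pstdev is the population standard deviation (divide by n, not n - 1);
    # this is the variant that reproduces the Std column.
    summary[measure] = (round(mean(scores), 4), round(pstdev(scores), 4))
    print(measure, *summary[measure])
# RMSE 0.9342 0.0032
# MAE 0.7357 0.0015
```

With statistics.stdev (dividing by n - 1) the RMSE Std would come out as 0.0036 instead, so the table appears to use the population standard deviation.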

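Parameters:

  • algo (AlgoBase) – The algorithm to evaluate.
  • data (Dataset) – The dataset on which to evaluate the algorithm.
  • measures (list of string) – The accuracy measures to compute, given as names of functions defined in the accuracy module. Default is ['rmse', 'mae'], as shown in the signature above.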
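accuracy.rmse and accuracy.mae compute their scores from the predictions returned by algo.test(). The stdlib sketch below shows the two formulas on made-up numbers; rmse and mae here are hypothetical stand-ins that take (true rating, estimated rating) pairs instead of Surprise prediction objects.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("prediction list is empty")
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("prediction list is empty")
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.0)]  # made-up ratings and estimates
print(round(rmse(pairs), 4), round(mae(pairs), 4))  # 0.866 0.8333
```

Because errors are squared before averaging, RMSE penalizes the two 1-point misses more heavily than MAE does.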