Machine Learning: Beginner to Intermediate (k-Fold Cross-Validation; Grid Search)

I. k-Fold Cross-Validation

  1. Principle

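    Split the dataset into k equal parts (folds). Each fold takes a turn as the test (validation) set, and the model is trained on the remaining k-1 folds. The k test scores are then averaged to give the final estimate.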
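    The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation. The folds are contiguous and unshuffled, like `KFold(shuffle=False)`. The "model" is a hypothetical majority-class predictor (`fit_majority` / `predict_majority`), chosen only so that the arithmetic is easy to follow. The spread of the fold scores is reported as the population standard deviation, which is what NumPy's `std()` computes by default (`ddof=0`).

```python
from collections import Counter
from statistics import mean, pstdev


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of nearly equal size.
    The first n_samples % k folds get one extra sample."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(fit, predict, X, y, k):
    """Return one accuracy per fold: train on k-1 folds, test on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical toy model: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))                 # toy features (ignored by this model)
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # toy labels
scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
print(scores)                                            # [1.0, 1.0, 1.0, 0.5, 0.0]
print(round(mean(scores), 3), round(pstdev(scores), 3))  # 0.7 0.4
```

    The last fold scores 0.0 because, without shuffling, it contains only the minority class. This is why the spread (std) is reported alongside the mean.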

  2. Implementation

    Data:

     User ID  Gender   Age  EstimatedSalary  Purchased
    15624510    Male  19.0          19000.0          0
    15810944    Male  35.0          20000.0          0
    15668575  Female  26.0          43000.0          0
    15603246  Female  27.0          57000.0          0
    ...
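    Each row describes one user. The task is to predict whether that user clicked on the advertisement shown to them (Purchased: 1 = yes, 0 = no).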
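    The code that follows selects columns by position. `iloc[:, [2, 3]]` keeps Age and EstimatedSalary as features, and `iloc[:, 4]` takes Purchased as the label, so User ID and Gender are not used. The plain-Python sketch below mirrors that selection on the four rows shown above. It assumes the CSV file is comma-separated with exactly this header, because only a tabular preview of the file is given here.

```python
import csv
import io

# Header and the first four rows as shown above (assumed comma-separated in the file).
SAMPLE = """User ID,Gender,Age,EstimatedSalary,Purchased
15624510,Male,19.0,19000.0,0
15810944,Male,35.0,20000.0,0
15668575,Female,26.0,43000.0,0
15603246,Female,27.0,57000.0,0
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data = rows[0], rows[1:]

# Same positional selection as dataset.iloc[:, [2, 3]] and dataset.iloc[:, 4]
feature_cols = [2, 3]
target_col = 4
X = [[float(r[c]) for c in feature_cols] for r in data]
y = [int(r[target_col]) for r in data]

print([header[c] for c in feature_cols], header[target_col])  # ['Age', 'EstimatedSalary'] Purchased
print(X[0], y[0])                                             # [19.0, 19000.0] 0
```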
    

    Code:

    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import confusion_matrix
    from sklearn.svm import SVC
    import pandas as pd
    
    
    # Importing the dataset
    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values
    y = dataset.iloc[:, 4].values
    
    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state = 0)
    
    # Feature Scaling
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    # Fitting Kernel SVM to the Training set
    classifier = SVC(kernel='rbf', random_state=0)
    classifier.fit(X_train, y_train)
    
    # Predicting the Test set results
    y_pred = classifier.predict(X_test)
    
    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
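    # Apply k-fold cross-validation
    # cv=10: number of folds; for a classifier, an integer cv means stratified k-fold
    # (X_train was standardized as a whole above; putting the scaler and the SVC in a
    # Pipeline would keep each validation fold out of the scaling step)
    accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
    print(accuracies.mean())
    print(accuracies.std())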
    

    Results:
    accuracies.mean(): 0.9005302187615868
    accuracies.std(): 0.06388957356626285
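    `classifier` is a classifier, so the integer `cv=10` above makes `cross_val_score` use stratified k-fold (`StratifiedKFold`). Stratified k-fold keeps each fold's class proportions close to those of the whole training set. The idea can be sketched as follows. The round-robin assignment is a simplification assumed for illustration and is not scikit-learn's exact algorithm. `stratified_folds` is a hypothetical helper.

```python
from collections import defaultdict


def stratified_folds(y, k):
    """Deal the indices of each class out to k folds in round-robin order,
    so every fold receives (nearly) the same share of each class."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    folds = [[] for _ in range(k)]
    slot = 0  # keeps counting across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[slot % k].append(i)
            slot += 1
    return [sorted(fold) for fold in folds]


y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 60% class 0, 40% class 1
folds = stratified_folds(y, k=2)
print(folds)                                                 # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print([sum(y[i] for i in fold) / len(fold) for fold in folds])  # [0.4, 0.4]
```

    Splitting the same labels into two contiguous halves would give a very different result. The first fold would contain no class-1 samples, and the second would be 80% class 1.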

II. Grid Search

  1. Principle

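    When a model has many hyperparameters, finding the best combination by changing them by hand is tedious. Grid search automates this. It trains and scores the model for every combination in a user-specified grid of candidate values, here using k-fold cross-validation. It then keeps the best-scoring combination, which saves a great deal of manual effort.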
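    The candidate combinations that grid search tries can be enumerated by hand. The sketch below expands a list of parameter dicts the way scikit-learn's `ParameterGrid` does. Each dict is expanded independently, so `gamma` is only combined with the `rbf` kernel. The sketch then picks the best combination under a scoring function. `expand_grid`, `grid_search` and `toy_score` are hypothetical names. In `GridSearchCV`, each candidate's score is its mean cross-validated score rather than a formula.

```python
from itertools import product


def expand_grid(param_grid):
    """Enumerate every combination in a dict (or list of dicts) of candidate values.
    Each dict is expanded on its own, as with sklearn's ParameterGrid."""
    if isinstance(param_grid, dict):
        param_grid = [param_grid]
    candidates = []
    for sub_grid in param_grid:
        keys = sorted(sub_grid)
        for values in product(*(sub_grid[key] for key in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


def grid_search(score_fn, param_grid):
    """Return (best_params, best_score); on ties, the first candidate wins."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(param_grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Same grid as in the implementation below
parameters = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 2, 3, 4], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1]},
]
n_candidates = len(expand_grid(parameters))
print(n_candidates, n_candidates * 10)  # 40 400  (candidates, fits with cv=10)


# Hypothetical score that peaks at C=3, gamma=0.5; stands in for CV accuracy.
def toy_score(params):
    return -abs(params["C"] - 3) - abs(params.get("gamma", 0) - 0.5)


print(grid_search(toy_score, parameters))  # ({'C': 3, 'gamma': 0.5, 'kernel': 'rbf'}, 0.0)
```

    The linear kernel contributes 4 candidates and the RBF kernel 4 × 9 = 36. With `cv=10`, `GridSearchCV` therefore fits 40 × 10 = 400 models. It then refits the best combination once on the whole training set (the default `refit=True`).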
    
  2. Implementation

    Data:
    The data is the same as in the "k-Fold Cross-Validation" section above.
    Code:

    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import confusion_matrix
    from sklearn.svm import SVC
    import pandas as pd
    
    
    # Importing the dataset
    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values
    y = dataset.iloc[:, 4].values
    
    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state = 0)
    
    # Feature Scaling
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    
    # Fitting Kernel SVM to the Training set
    classifier = SVC(kernel='rbf', random_state=0)
    classifier.fit(X_train, y_train)
    
    # Predicting the Test set results
    y_pred = classifier.predict(X_test)
    
    # Making the Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    
    # Apply k-fold cross-validation
    accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)  # cv: number of folds
    print(accuracies.mean())
    print(accuracies.std())
    
    # Grid search
    parameters = [
        {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
        {'C': [1, 2, 3, 4], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1]},
    ]
    
    grid_search = GridSearchCV(
        estimator=classifier,
        param_grid=parameters,
        scoring="accuracy",  # metric used to score each parameter combination
        cv=10,  # number of cross-validation folds
        # n_jobs=-1,  # uncomment to run the fits in parallel on all CPU cores
    )
    
    grid_search = grid_search.fit(X_train, y_train)
    best_accuracy = grid_search.best_score_  # mean CV accuracy of the best combination
    best_parameter = grid_search.best_params_
    print(best_accuracy)
    print(best_parameter)
    

    Output:
    best_accuracy: 0.9033333333333333
    best_parameter: {'C': 1, 'gamma': 0.7, 'kernel': 'rbf'}
