scikit-learn
Douhh_sisy
sklearn in Practice: Kaggle Bike Rental Prediction (Ridge Regression, Support Vector Regression, Random Forest Regression)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_train = pd.read_csv('kaggle_bike_competition_train.csv', header=0)
df_train.head(10)...
Original · 2018-06-05 20:11:05 · 2739 views · 1 comment

sklearn in Practice: The KMeans Algorithm
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=200, n_features=2, cen...
Original · 2018-06-10 14:51:10 · 1132 views · 0 comments

sklearn in Practice: Document Classification (Naive Bayes)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from time import time
from sklearn.datasets import load_files
print("loading train dataset ...")
t = time()
news_train = load_...
Original · 2018-06-09 21:56:11 · 1538 views · 0 comments

sklearn in Practice: SVM (Comparing Linear, Polynomial, and Gaussian Kernels)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
def plot_hyperplane(clf, X, y, h=0.02, draw_sv=True, title='hype...
Original · 2018-06-08 14:25:55 · 23194 views · 0 comments

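Kaggle: Predicting Titanic Survivors (Decision Tree, Grid Search for Hyperparameter Tuning)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def read_dataset(fname):
    # use the first column as the row index
    data = pd.read_csv(fname, index_col=0)
    # column labels come from the first row of the CSV file
    ...
Original · 2018-06-07 22:13:53 · 3361 views · 0 comments
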
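Kaggle: Predicting a Biological Response
Competition page: click here.
# Basic CSV read/write operations
# We need to read the given training data before any further processing (features, etc.)
def read_data(file_name):
    f = open(file_name)
    # ignore header
    f.readline()
    samples = []
    target =...
Reposted · 2018-06-07 15:58:20 · 616 views · 0 comments
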
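Kaggle: San Francisco Crime Classification
Competition page: https://www.kaggle.com/c/sf-crime
This post uses logistic regression to solve the prediction problem.
# Basic CSV read/write operations
# We need to read the given training data before any further processing (features, etc.)
def read_data(file_name):
    f = open(file_name)
    # ignore header
    ...
Reposted · 2018-06-07 15:54:12 · 850 views · 0 comments
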
sklearn in Practice: Breast Cancer Detection (Logistic Regression)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# load the data
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
print('...
Original · 2018-06-07 15:35:31 · 8284 views · 1 comment

sklearn in Practice: House Price Prediction (Linear Regression)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_boston
boston = load_boston()
X = boston.data
y = boston.target
X.shape
(506, 13)
X[0]...
Original · 2018-06-07 10:15:55 · 2131 views · 0 comments

sklearn in Practice: Fitting a Sine Function with Linear Regression
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
n_dots = 200
X = np.linspace(-2 * np.pi, 2 * np.pi, n_dots)
Y = np.sin(X) + 0.2 * np.random.rand(n_dots) - 0.1
X = X.reshape(-...
Original · 2018-06-07 10:14:20 · 5111 views · 0 comments

sklearn in Practice: Regression Fitting with kNN
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# generate training samples
n_dots = 40
X = 5 * np.random.rand(n_dots, 1)
y = np.cos(X).ravel()
# add some noise
y += 0.2 * np.random.rand(n_dots) - 0.1...
Original · 2018-06-06 20:21:22 · 4804 views · 0 comments

sklearn in Practice: Diabetes Prediction (kNN)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# load the data
data = pd.read_csv('datasets/pima-indians-diabetes/diabetes.csv')
print('dataset shape {}'.for...
Original · 2018-06-06 20:19:10 · 7331 views · 4 comments

sklearn in Practice: Classification and Visualization with kNN
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
# generate the data
centers = [[-2, 2], [2, 2], [0, 4]]
X, y = make_blobs(n_sample...
Original · 2018-06-06 18:10:54 · 8853 views · 0 comments

sklearn in Practice: Clustering Documents (KMeans)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from time import time
from sklearn.datasets import load_files
print("loading documents ...")
t = time()
docs = load_files('dat...
Original · 2018-06-10 14:54:22 · 3140 views · 1 comment