scikit-learn Tutorials

An introduction to machine learning with scikit-learn

Machine learning: the problem setting

In ML, when each sample is described by several features (a multi-dimensional entry), the data is called multivariate data, i.e. data with multiple features.

Loading an example dataset

scikit-learn ships with a few small standard datasets such as iris. A dataset is a dictionary-like object that holds the data together with metadata about the data,
e.g. iris.data and iris.target.
Note that the data is always expected as a 2D array of shape (n_samples, n_features); this means that images, for example, are flattened into vectors. To load external datasets, see Section 7.4.
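
For instance, a minimal sketch of loading the bundled iris dataset and inspecting these attributes (assuming only that scikit-learn is installed):

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris.data.shape      # (n_samples, n_features)
(150, 4)
>>> iris.target.shape    # one class label per sample
(150,)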

Learning and predicting

Classification with an estimator: an estimator must implement the fit and predict methods. SVC is one example of a classifier:

>>> from sklearn import svm
>>> clf = svm.SVC(gamma=0.001, C=100.)
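
As a hedged continuation of this example, the classifier can be fitted on all but the last iris sample and asked to predict the held-out one (the split is purely illustrative; class 2, i.e. virginica, is the label we would expect, though the exact output depends on the fitted model):

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> clf.fit(iris.data[:-1], iris.target[:-1])   # learn from all but the last sample
SVC(C=100.0, gamma=0.001)
>>> clf.predict(iris.data[-1:])                 # predict the class of the held-out sample
array([2])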

Conventions

scikit-learn follows certain conventions that make its behavior more predictive:

  1. Type casting: unless otherwise specified, input data is cast to float64 by default.

In the code below the predictions are strings, because the target passed to fit is an array of strings (a fresh, default-constructed SVC is used here, which is why its repr shows SVC()):

>>> clf = svm.SVC()
>>> clf.fit(iris.data, iris.target_names[iris.target])
SVC()

>>> list(clf.predict(iris.data[:3]))
['setosa', 'setosa', 'setosa']
  2. Refitting and updating parameters: hyper-parameters can be changed after construction via the set_params() method, and the estimator can then simply be refitted (see the sketch below).
  3. Multiclass vs. multilabel fitting: the task performed depends on the format of the target array passed to fit; the predictions come back in the same format as the target that was passed in.
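
As a sketch of item 2, assuming nothing beyond the iris data used above, a hyper-parameter such as the SVC kernel can be changed with set_params() and the estimator simply refitted:

>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import SVC
>>> X, y = load_iris(return_X_y=True)
>>> clf = SVC()
>>> clf.set_params(kernel='linear').fit(X, y)   # switch to a linear kernel and refit
SVC(kernel='linear')
>>> clf.set_params(kernel='rbf').fit(X, y)      # back to the default rbf kernel
SVC()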

A tutorial on statistical-learning for scientific data processing

Statistical learning: the setting and the estimator object in scikit-learn

Datasets

All data is represented as 2D arrays: the first axis is the samples axis and the second is the features axis. If the input data does not have this shape, it has to be preprocessed (reshaped) before scikit-learn can use it.
iris.DESCR gives a detailed description of the dataset.
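
As a sketch of that reshaping step (the digits dataset is used here purely as an illustration; its samples are 8x8 images):

>>> from sklearn import datasets
>>> digits = datasets.load_digits()
>>> digits.images.shape                                         # each sample is an 8x8 image
(1797, 8, 8)
>>> data = digits.images.reshape((digits.images.shape[0], -1))  # flatten to (n_samples, n_features)
>>> data.shape
(1797, 64)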

Estimators objects

The estimator is the central object: anything that learns from data, or extracts/transforms features: "the main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data."
Parameters estimated from the data during fit are exposed as attributes whose names end with an underscore:

 estimator.estimated_param_ 
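
A minimal sketch of the two kinds of parameters, using LinearRegression purely as an illustrative estimator (the toy data is made up):

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[0.0], [1.0], [2.0]])
>>> y = np.array([0.0, 1.0, 2.0])
>>> estimator = LinearRegression(fit_intercept=True)
>>> estimator.fit_intercept      # constructor parameter, set at instantiation
True
>>> estimator.fit(X, y)
LinearRegression()
>>> estimator.coef_              # parameter estimated from the data: note the trailing underscore
array([1.])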