Applied Machine Learning Notes




A dataset intended for analysis by a machine learning method should have features (X) and a target value/label (y).

Training and test sets are needed for a given dataset.
Model fitting produces a trained model from the training set. We then evaluate that model on the test set, which it has not seen during fitting.
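
A minimal sketch of such a split in plain Python, assuming a 25% test fraction and a fixed random seed (both are illustrative choices, not taken from these notes). The helper name train_test_split and the toy data are hypothetical.

```python
import random

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then hold out a fraction of the rows as the test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(X) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data: 8 instances with one feature each (values are made up).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y)
```

The model is fit on X_train / y_train only; X_test / y_test are kept aside for evaluation.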

There are two main types of supervised learning problems: classification and regression. Both take a set of training instances and learn a mapping to a target value. Classification returns a discrete value, while regression returns a continuous value.

Overfitting and Underfitting

Generalization ability refers to an algorithm’s ability to give accurate predictions for new, previously unseen data.

Overfit models are too complex for the available training data, so they are not likely to generalize well to new examples.
Underfit models are too simple and do not do well even on the training data.

Supervised Learning

k-Nearest Neighbor

  1. Find the k most similar (closest) instances in X_train to x_test.
  2. Get the corresponding labels (y_train) of those instances.
  3. Predict the label for x_test by combining the labels acquired in the 2nd step (for example, by majority vote).

Beyond classification, kNN can be used for regression: predict the target for x_test by combining (typically averaging) the target values of its closest training instances.
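
The kNN steps can be sketched in plain Python as follows. This assumes Euclidean distance, majority vote for classification, and a plain average for regression; these are common choices but the notes do not fix them. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def nearest_targets(X_train, y_train, x_test, k):
    """Steps 1 and 2: return the targets of the k training instances closest to x_test."""
    by_distance = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], x_test))
    return [target for _, target in by_distance[:k]]

def knn_classify(X_train, y_train, x_test, k=3):
    """Step 3 for classification: majority vote among the k nearest labels."""
    votes = Counter(nearest_targets(X_train, y_train, x_test, k))
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, x_test, k=3):
    """Step 3 for regression: average of the k nearest target values."""
    targets = nearest_targets(X_train, y_train, x_test, k)
    return sum(targets) / len(targets)

# Toy data: two well-separated clusters (values are made up).
X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
y_labels = ["a", "a", "a", "b", "b", "b"]
y_values = [1.0, 1.2, 1.1, 5.0, 5.5, 5.2]

predicted_label = knn_classify(X_train, y_labels, [1.5, 1.5], k=3)
predicted_value = knn_regress(X_train, y_values, [6.5, 6.5], k=3)
```

Small k follows the training data closely (risk of overfitting); larger k gives smoother, simpler predictions.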

Linear Model

It is a sum of weighted variables that predicts a target output value given an input data instance.
(e.g., predicting housing prices: ax + by + cz = target_value)

Input feature vector: x = (x0, x1, x2, …)
Parameters to estimate:
w = (w0, w1, w2, …): feature weights (slopes)
b: constant bias term (intercept)

Least Squares

Finds the w and b that minimize the mean squared error of the model: the mean of the squared differences between predicted and actual target values. Minimizing the mean is equivalent to minimizing the sum of squared differences.

Least squares has no parameters to control model complexity.

Parameters (w, b) are estimated from training data. The learning algorithm seeks to minimize a loss function.

The loss function in this case is the residual sum of squares (RSS): the sum of squared differences over the training data between predicted and actual target values.
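
As an illustration, the sketch below fits w and b for a single input feature using the standard closed-form least-squares solution, then computes the RSS. The single-feature restriction, the function names, and the toy data are assumptions made to keep the example short.

```python
def fit_least_squares(xs, ys):
    """Closed-form least-squares fit of y = w*x + b for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx
    b = y_mean - w * x_mean
    return w, b

def rss(xs, ys, w, b):
    """Sum of squared differences between predicted and actual targets."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

# Toy data (made up): roughly y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
w, b = fit_least_squares(xs, ys)
loss = rss(xs, ys, w, b)
```

Any other choice of (w, b) gives an RSS on this training data that is at least as large as the fitted one.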

Ridge Regression

L2 penalty: in addition to the least-squares criterion, add a penalty term, alpha*sum(w^2), that regularizes w to prevent overfitting. Minimizing the sum of squared w entries together with the error reduces the complexity of the model. Higher alpha means more regularization and simpler models.
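
To see the effect of alpha, here is a minimal single-feature sketch. It assumes the objective RSS + alpha*w^2 with an unpenalized bias b, for which the closed-form weight is s_xy / (s_xx + alpha). The function name and toy data are hypothetical, and libraries may scale the objective differently.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Minimize sum((y - (w*x + b))^2) + alpha * w^2 for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / (s_xx + alpha)  # alpha in the denominator shrinks w toward 0
    b = y_mean - w * x_mean    # the bias is not penalized in this sketch
    return w, b

# Toy data (made up).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
w_no_penalty, _ = fit_ridge_1d(xs, ys, alpha=0.0)  # same as least squares
w_ridge, _ = fit_ridge_1d(xs, ys, alpha=5.0)       # shrunk weight
```

With alpha = 0 the result equals the ordinary least-squares fit; increasing alpha shrinks w but never sets it exactly to zero.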

Regularization: prevents overfitting by restricting the model and reducing its complexity.

Normalization: put all features on the same scale so that the regularization penalty treats each weight fairly. It can also lead to faster convergence in learning. MinMax scaling can do this job: compute the min and max value of a feature on the training data, then transform a given value xi to the scaled version (xi - min) / (max - min).
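
A minimal sketch of MinMax scaling for one feature column. The two helper names and the numbers are hypothetical. The min and max are computed on the training values and then reused for test values, so that both sets go through the same transformation.

```python
def fit_min_max(train_values):
    """Compute the min and max of one feature on the training data."""
    return min(train_values), max(train_values)

def apply_min_max(values, lo, hi):
    """Scale values with (x - min) / (max - min); a constant feature maps to 0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Toy feature (made up): house sizes in square meters.
train_sizes = [50.0, 100.0, 150.0, 200.0]
test_sizes = [125.0]

lo, hi = fit_min_max(train_sizes)
train_scaled = apply_min_max(train_sizes, lo, hi)
test_scaled = apply_min_max(test_sizes, lo, hi)
```

Test values outside the training range fall outside [0, 1], which is expected with this approach.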

Lasso Regression

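L1 penalty: instead of the sum of squares of w, lasso regression penalizes the sum of the absolute values of the coefficients, alpha*sum(|w|). This tends to set the weights of the least influential features exactly to zero.

Lasso vs Ridge: use ridge when there are many variables with small/medium-sized effects. Use lasso when only a few variables have a medium/large effect.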
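
The difference from ridge can be sketched for a single feature. Assuming the objective RSS + alpha*|w| with an unpenalized bias, the minimizing weight is given by soft-thresholding: w = soft_threshold(s_xy, alpha/2) / s_xx. The function names and toy data are hypothetical, and libraries may scale the objective differently.

```python
def soft_threshold(value, threshold):
    """Move value toward 0 by threshold; return exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_1d(xs, ys, alpha):
    """Minimize sum((y - (w*x + b))^2) + alpha * |w| for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = soft_threshold(s_xy, alpha / 2.0) / s_xx
    b = y_mean - w * x_mean
    return w, b

# Toy data (made up).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
w_small_alpha, _ = fit_lasso_1d(xs, ys, alpha=4.0)   # weight is shrunk
w_large_alpha, _ = fit_lasso_1d(xs, ys, alpha=20.0)  # weight becomes exactly 0
```

Unlike the L2 penalty, a large enough L1 penalty removes the feature entirely, which is why lasso acts as a form of feature selection.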

Polynomial Features with Linear Regression

Generate new features consisting of all polynomial combinations, up to degree 2, of the original two features (x0, x1):
(x0, x1, x0^2, x0*x1, x1^2)
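
A short sketch of this expansion in plain Python. The function name is hypothetical, and the sketch omits a constant (bias) column, which some library implementations add by default.

```python
from itertools import combinations_with_replacement

def polynomial_features(x, degree=2):
    """Return all products of the input features of total degree 1..degree."""
    features = []
    for d in range(1, degree + 1):
        # Each combination of indices (with repetition) is one monomial.
        for combo in combinations_with_replacement(range(len(x)), d):
            product = 1.0
            for i in combo:
                product *= x[i]
            features.append(product)
    return features

# (x0, x1) = (2, 3) expands to (x0, x1, x0^2, x0*x1, x1^2).
expanded = polynomial_features([2.0, 3.0])
```

A linear model fit on the expanded features is still linear in its weights, but it can represent curved relationships in the original features. Higher degrees add many features and increase the risk of overfitting.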

Logistic Regression

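Also uses a linear function of the input features.
The logistic function converts the output of that linear function to a value between 0 and 1, which is interpreted as the probability of the positive class. Thresholding that probability (e.g., at 0.5) gives a discrete class label.

y = 1/(1+exp[-(w*x+b)])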
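
A minimal sketch of the formula in Python. It assumes the linear function is w*x + b, class labels 0 and 1, and a 0.5 decision threshold; the function names and example weights are hypothetical.

```python
import math

def logistic(z):
    """1 / (1 + exp(-z)), written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Probability of the positive class for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return logistic(z)

def predict_class(x, w, b, threshold=0.5):
    """Turn the probability into a discrete label (1 or 0)."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Example weights (made up): one feature, w = 1.0, b = -1.0.
p_high = predict_proba([3.0], [1.0], -1.0)
label_high = predict_class([3.0], [1.0], -1.0)
label_low = predict_class([0.0], [1.0], -1.0)
```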

Linear Classifiers: Linear Support Vector Machines

f(x,w,b) = sign(w*x+b)
w (weight vector)
x (input feature vector)
b (bias)

The result is either positive (+1) or negative (-1), which gives the predicted class.
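
The decision rule f(x,w,b) = sign(w*x+b) can be sketched as below. The weights are made up for illustration rather than learned, and assigning a score of exactly 0 to the positive class is an arbitrary convention of this sketch.

```python
def linear_classify(x, w, b):
    """Return +1 or -1 depending on which side of the line w*x + b = 0 the point lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Example (made up): the line x0 - x1 = 0 separates the two classes.
w = [1.0, -1.0]
b = 0.0
class_below_line = linear_classify([3.0, 1.0], w, b)
class_above_line = linear_classify([1.0, 3.0], w, b)
```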

w and b define a line (in general, a hyperplane) that separates the two groups of data.
We want the maximum margin classifier: the line with the maximum distance to the closest points of the two groups. This is the linear support vector machine.

The points used to construct the line are called support vectors.

C controls the degree of regularization. Larger C means less regularization: the model fits the training data more closely.

To predict multiple classes:
sklearn applies binary classification to each category in turn, running each category against all other categories (one-vs-rest: one line separates one category from all the others).

Kernelized Support Vector Machines
Use a kernel to transform the data so that it can be classified afterwards (e.g., the radial basis function kernel).
K(x,x’) = exp[-gamma*||x-x’||^2]
Larger gamma means points have to be very close to be considered similar.
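
A small sketch of the radial basis function kernel, using the squared Euclidean distance between the two points. The function name and example points are hypothetical.

```python
import math

def rbf_kernel(x, x_prime, gamma):
    """exp(-gamma * squared Euclidean distance): 1 for identical points, near 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)

# The same pair of points looks similar with a small gamma and dissimilar with a large one.
point_a, point_b = [0.0, 0.0], [1.0, 1.0]
similarity_small_gamma = rbf_kernel(point_a, point_b, gamma=0.1)
similarity_large_gamma = rbf_kernel(point_a, point_b, gamma=10.0)
```

With a large gamma each training point influences only a small neighborhood, which gives more complex decision boundaries.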

Cross Validation

Train and evaluate the model on several different train/test splits of the data, then average the performance scores.

e.g., 3-fold: split the data into three sets. Use each set as the test set once, with the remaining sets as the training set. So there are in total three folds and three test sets. Normally use a stratified cross-validation strategy to ensure each set reflects the class proportions (types of target value) of the original data.
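
The 3-fold procedure can be sketched as follows. The split here is not stratified, and the "model" is a hypothetical majority-class baseline chosen only to keep the example short; the function names and labels are made up.

```python
from collections import Counter

def k_fold_indices(n_samples, n_folds=3):
    """Return (train_idx, test_idx) pairs; each sample is in the test set exactly once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

def majority_baseline_accuracy(y, train_idx, test_idx):
    """Fit: pick the most common training label. Score: accuracy on the test fold."""
    majority = Counter(y[i] for i in train_idx).most_common(1)[0][0]
    return sum(1 for i in test_idx if y[i] == majority) / len(test_idx)

# Toy labels (made up): 7 samples of class 0 and 2 of class 1.
y = [0, 0, 1, 0, 0, 1, 0, 0, 0]
splits = k_fold_indices(len(y), n_folds=3)
scores = [majority_baseline_accuracy(y, train, test) for train, test in splits]
mean_score = sum(scores) / len(scores)
```

In this toy run both class-1 samples land in the same test fold, so that fold scores much lower than the other two. A stratified split would spread the classes more evenly across the folds.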

Decision Trees

Use a binary tree of yes/no questions to regress or classify data, e.g., classify flower type based on flower length and width.

Yes/No -> Yes/No -> Groups of data

Pure nodes: groups of data that all share one target value
Mixed nodes: groups with more than one target value; might need further decisions

To prevent overfitting, limit the depth of the tree, the number of nodes, or the minimum number of samples required to split a node.
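
To make pure and mixed nodes concrete, the sketch below applies one yes/no question to a toy flower dataset and measures each resulting group with Gini impurity. Gini impurity is an assumption here: it is one common purity measure, and the notes do not name one. The measurements, labels, threshold, and function names are all made up.

```python
from collections import Counter

def gini_impurity(labels):
    """0.0 for a pure node; larger values mean the labels are more mixed."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_node(rows, labels, feature_index, threshold):
    """Yes/no question: is the feature value <= threshold? Returns (yes_labels, no_labels)."""
    yes = [lab for row, lab in zip(rows, labels) if row[feature_index] <= threshold]
    no = [lab for row, lab in zip(rows, labels) if row[feature_index] > threshold]
    return yes, no

# Toy data: (length, width) per flower, with three hypothetical flower types.
rows = [[1.4, 0.2], [1.5, 0.3], [4.5, 1.5], [4.8, 1.8], [5.5, 2.1], [6.0, 2.3]]
labels = ["type_a", "type_a", "type_b", "type_b", "type_c", "type_c"]

yes_labels, no_labels = split_node(rows, labels, feature_index=0, threshold=2.0)
yes_impurity = gini_impurity(yes_labels)  # pure node: only type_a
no_impurity = gini_impurity(no_labels)    # mixed node: type_b and type_c
```

The mixed group would need a further question to separate its two types. Limiting the tree depth stops this process early instead of splitting until every node is pure.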



