Machine Learning Notes: A Briefing

Basic Machine Learning Problems

● Supervised Learning: You have labelled data for the computer to learn from
○ Regression
○ Classification
● Unsupervised Learning: You don’t have labelled data, but you want to find
patterns in the data
○ Clustering / Dimensionality Reduction

Linear Regression

In the simplest case, we can assume that the relationship between the feature and the target is linear:
y = a + bX
In the equation above, y is the target, X is the feature, a is the intercept, and b is the weight of the feature.
Using the ordinary least squares (OLS) method, we can estimate a and b by minimizing the sum of squared differences between the observed and predicted values of y.
With more than one feature, the model generalizes to:
y = b0 + b1x1 + b2x2 + … + bnxn
This is still a linear regression model, sometimes called multiple linear regression.
b0 is called the bias term, while b1 to bn are the weights of the features.
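As a minimal sketch of the single-feature case y = a + bX, the closed-form OLS estimates can be computed directly: b is the covariance of X and y divided by the variance of X, and a follows from the means. The function name `fit_simple_ols` and the toy data are assumptions made for illustration only.

```python
def fit_simple_ols(xs, ys):
    """Estimate intercept a and weight b of y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = (sum of cross-deviations) / (sum of squared deviations of x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Toy data generated from y = 1 + 2x with no noise (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_simple_ols(xs, ys)
print(a, b)
```

Because the toy data lie exactly on a line, the fit recovers the intercept and weight used to generate them. With several features, the same idea is usually solved with a linear algebra routine rather than by hand.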


Classification

● In classification, we are interested in assigning each input sample to one of two (or more) pre-defined classes
● In other words, the target variable y is discrete
● Some common algorithms for classification:
○ Logistic regression
○ Support vector machines
○ Decision Trees
○ K-nearest-neighbour (kNN)
● Some regression tasks can be simplified to classification tasks, for example by grouping a continuous target into a few ranges

Logistic Regression

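We can apply a transformation to the output of linear regression: the logistic function, also called the sigmoid function:
σ(z) = 1 / (1 + e^(−z)), where z = b0 + b1x1 + b2x2 + … + bnxn
Its value tends to 1 as z tends to +∞, and tends to 0 as z tends to −∞, so the output can be read as the probability of the positive class.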
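The limiting behaviour described above can be checked with a small sketch. The helper `predict_proba` and its argument names are hypothetical; it only shows how a linear score z would be passed through the sigmoid.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the range 0..1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Linear score z = b0 + b1*x1 + ... + bn*xn, squashed by the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


print(sigmoid(0.0))
print(sigmoid(50.0), sigmoid(-50.0))
```

A common convention, assumed here rather than stated in these notes, is to predict the positive class when the output is at least 0.5, which corresponds to z ≥ 0.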

Decision Trees

● Decision trees are constructed by finding conditions to split the dataset into smaller subsets
● Decision trees can also be used to perform regression (thus the term CART:
Classification And Regression Trees)
● Decision trees are usually vulnerable to overfitting (more on this later), thus
we usually have to control the depth of a tree
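To make "finding conditions to split the dataset" concrete, here is a rough sketch of choosing one split on one numeric feature. It assumes Gini impurity as the splitting criterion, which these notes do not specify; the function names and toy data are illustrative. A full tree would apply this search recursively to each subset, stopping at a maximum depth.

```python
def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' condition."""
    n = len(values)
    best = (None, gini(labels))  # no split: impurity of the whole set
    candidates = sorted(set(values))
    # Try a threshold halfway between each pair of neighbouring distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy feature that separates the two classes cleanly (illustrative data).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(values, labels)
print(threshold, impurity)
```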

Choosing ML Algorithms

Reference: Choosing ML Algorithms.

Model Complexity

● A complex model can capture complex relationships between X and y, but it is also more likely to pick up noise → overfitting
● A simple model is easy to interpret, but may not be able to capture the true relationship between X and y → underfitting

Splitting Your Dataset

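● It is usually advised that we have three splits of the dataset:

  1. training set: for training your model(s)
  2. validation/development set: for tuning your model’s hyperparameters
  3. test/holdout set: for testing the performance of your model

● Two related techniques:

  1. stratified sampling: for imbalanced datasets, so that each split keeps roughly the same class proportions as the full dataset
  2. K-fold cross-validation: the training data is divided into K folds, and each fold serves once as the validation set while the model is trained on the other K − 1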
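The splits listed above can be sketched in plain Python. The 60/20/20 proportions, the fixed seed, and the function names are assumptions for illustration. Stratified sampling is not shown; it would additionally group samples by class before splitting.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into training, validation, and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


data = list(range(10))
train, val, test = train_val_test_split(data)
folds = list(k_fold_indices(len(data), 5))
print(len(train), len(val), len(test), len(folds))
```

Note that `k_fold_indices` keeps the original order, so the data should be shuffled beforehand if its order carries any structure.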


● When evaluating the performance of a model, we need to have:
○ ground truths: the correct answer / the true labels of the inputs
○ metric: a measure of how good the predictions are compared to the ground truths

Metrics for Regression - MAE (Mean Absolute Error)
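MAE is the average of the absolute differences between the ground truths and the predictions: MAE = (1/n) Σ |y_i − ŷ_i|. A minimal sketch, with made-up values:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between ground truths and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values; the absolute errors are 1, 0, 2 and 1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)
print(mae)
```

MAE is expressed in the same unit as the target, and every error contributes in proportion to its size.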


Metrics for Regression - RMSE (Root Mean Squared Error)
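RMSE is the square root of the average squared difference between the ground truths and the predictions: RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). A minimal sketch, with made-up values:

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between truths and predictions."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Illustrative values; the squared errors are 1, 0, 4 and 1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)
```

Because the errors are squared before averaging, a few large errors raise RMSE more than they raise MAE.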


Metrics for Classification

○ Accuracy
○ True/False positives/negatives
○ Precision and Recall
○ Area Under the ROC Curve
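The first three items can be sketched for a binary problem as follows. The label encoding (1 for the positive class) and the toy predictions are assumptions for illustration. Area under the ROC curve is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Toy ground truths and predictions (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # of predicted positives, share that are right
recall = tp / (tp + fn)                     # of actual positives, share that are found
print(accuracy, precision, recall)
```

This sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall would divide by zero and need a convention of its own.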

Top 20 Python libraries for data science
