Supervised Learning vs. Unsupervised Machine Learning

Supervised Learning

Regression

In regression tasks, the output variable is a continuous value. For example, predicting the price of a house from features such as its size, location, and age. The goal of a regression problem is to predict a quantity based on the input variables.

Common regression algorithms include:

  • Linear Regression
  • Polynomial Regression
  • Ridge Regression
  • LASSO Regression (Least Absolute Shrinkage and Selection Operator)
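
As a minimal sketch of the regression idea (not from the original post; the house sizes and prices are made-up toy numbers), a straight line can be fit to size-vs-price data by least squares with NumPy:

```python
import numpy as np

# Toy dataset: house size in square meters vs. price (made-up numbers).
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150.0, 240.0, 300.0, 360.0, 450.0])

# Fit y = w * x + b by least squares; the column of ones carries the intercept b.
X = np.column_stack([sizes, np.ones_like(sizes)])
(w, b), *_ = np.linalg.lstsq(X, prices, rcond=None)

predicted = w * 90.0 + b   # predicted price for a 90 m^2 house, ~270 here
```

The same pattern extends to multiple features by adding more columns to `X`; ridge and LASSO differ only in adding a penalty on the size of `w`.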

Classification

In classification tasks, the output variable is a category label: the model's goal is to assign input data to one of a set of predefined categories. For example, determining whether an email is spam based on its content. Classification problems typically involve sorting inputs into two or more classes.

Common classification algorithms include:

  • Logistic Regression
  • Support Vector Machines (SVM)
  • Decision Trees
  • Random Forests
  • K-Nearest Neighbors (KNN)
  • Neural Networks
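Of the algorithms above, K-Nearest Neighbors is simple enough to sketch in a few lines. This is an illustrative toy (the two point clouds and labels are invented for the example), not a production classifier:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]              # majority vote

# Two toy classes: label 0 near the origin, label 1 near (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

pred = knn_predict(X_train, y_train, np.array([0.5, 0.5]))  # falls in class 0
```

Note that unlike regression, the output here is a discrete label, not a quantity.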

Unsupervised Learning

Clustering

Group similar data points together.

Clustering is one of the most common tasks in unsupervised learning, aiming to divide a dataset into multiple groups or “clusters” composed of similar objects. This method attempts to make data points within a cluster as similar as possible while keeping data points from different clusters as distinct as possible. Clustering is widely applied in market segmentation, social network analysis, grouping search results, and more.

Common clustering algorithms include:

  • K-Means Clustering
  • Hierarchical Clustering
  • Density-Based Clustering (e.g., DBSCAN)
  • Gaussian Mixture Models
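
To make the "alternate between assigning points and updating cluster centers" idea concrete, here is a minimal K-Means sketch on two invented, well-separated blobs (a toy illustration, not a robust implementation):

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Minimal K-Means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # start at k data points
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else centroids[i] for i in range(k)])
    return labels, centroids

# Two well-separated toy blobs; K-Means should recover them as the two clusters.
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [8.0, 8.0], [8.5, 8.0], [8.0, 8.5]])
labels, centroids = kmeans(X, k=2)
```

No labels were provided anywhere: the grouping emerges purely from distances between points, which is what makes this unsupervised.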

Dimensionality Reduction

Compress data using fewer numbers.

Dimensionality reduction is another unsupervised learning task aimed at reducing the number of features in data while retaining as much of the original data's important information as possible. Dimensionality reduction is particularly useful for dealing with high-dimensional datasets (i.e., the "curse of dimensionality"), and can be used for data visualization, improving the efficiency of learning algorithms, and more.

Common dimensionality reduction algorithms include:

  • Principal Component Analysis (PCA)
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • Linear Discriminant Analysis (LDA)
  • Autoencoders
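
PCA, the first method above, can be sketched directly with an SVD. The synthetic data here is an assumption chosen so that three features really do compress to one number per point with almost no information lost:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (directions of greatest variance)."""
    Xc = X - X.mean(axis=0)                       # center the data first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # coordinates in the reduced space

# Toy data: 100 points in 3-D that lie almost exactly on a 1-D line,
# so a single component should preserve nearly all of the variance.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(100, 3))
Z = pca(X, n_components=1)    # shape (100, 1): 3 features compressed to 1
```

Each row of `Z` is the "fewer numbers" version of the corresponding row of `X`; keeping more components trades compression for fidelity.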

Anomaly Detection

Find unusual data points.

The goal of anomaly detection is to identify outliers, events, or observations in a dataset that significantly differ from the majority of the data. Since anomalies (such as fraudulent activities, network intrusions, system faults, etc.) often constitute a small portion of the data and may not be clearly defined during data collection, traditional supervised learning methods (which require a large amount of labeled data) might not be applicable. Therefore, anomaly detection typically employs unsupervised learning approaches that do not require pre-labeled data to identify anomalies.
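The simplest unsupervised flavor of this idea is a z-score test: flag any value far from the mean in units of standard deviations. This is a minimal sketch with invented sensor readings, not a substitute for dedicated methods like Isolation Forest or One-Class SVM:

```python
import numpy as np

def zscore_outliers(x, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

# Six ordinary readings plus one obvious outlier at 50.0 (toy numbers).
x = np.array([10.0, 11.0, 9.5, 10.2, 10.8, 9.9, 50.0])
mask = zscore_outliers(x)   # True marks the anomalous reading
```

Note that no example was ever labeled "anomaly": the method defines unusual purely relative to the bulk of the data, matching the unsupervised setting described above.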
