Machine Learning Glossary

penguin027

Last modified: 2023-03-13 21:34:51

Tags: machine learning, artificial intelligence, algorithms

First published: 2023-02-23 22:39:50

Article link: https://blog.csdn.net/penguin027/article/details/129189684

my

No.	Term (Chinese)	Term (English)	Explanation
1	损失函数	Loss Function	A loss function measures the difference between the predicted value and the true value; it can also serve as an evaluation metric for a machine learning model.
2	激活函数	Activation Function	An activation function introduces nonlinearity, which increases the expressive power of a neural network and lets it solve problems that a linear model cannot. Examples: ReLU, Sigmoid, tanh.
3	ROC	Receiver Operating Characteristic	In classification problems, the ROC curve plots the true positive rate against the false positive rate as the decision threshold varies.
4	AUC	Area Under ROC Curve	The area under the ROC curve, that is, the area enclosed by the curve and the coordinate axes.
5	反向传播	Back Propagation	Back propagation is a common method for training artificial neural networks. It computes the gradient of the loss function with respect to every weight in the network; an optimization method then uses these gradients to update the parameters.
6	基学习器	Base Learner	Ensemble learning completes a learning task by constructing and combining multiple learners. If the ensemble contains only individual learners of the same type (for example, a decision-tree ensemble made up entirely of decision trees), it is a homogeneous ensemble. The individual learners in a homogeneous ensemble are called base learners, and the corresponding learning algorithm is called the base learning algorithm.
7	基准	Baseline	A baseline is a reference point for improving an algorithm. It is a basic model against which later changes are compared to check whether they actually improve the model.
8	偏差	Bias	Bias measures how far the expected prediction of a learning algorithm deviates from the true result; that is, it characterizes the fitting ability of the learning algorithm itself.
9	自助采样	Bootstrap Sampling	A training set is obtained by drawing k samples from the original data set with replacement. The data that were never drawn are used as the validation set.
10	类别不平衡	Class Imbalance	The class imbalance problem refers to a classification task in which the numbers of training samples in different classes differ greatly.
11	聚类	Clustering	The data in a data set are divided into several clusters so that samples within a cluster are as similar as possible and samples in different clusters are as different as possible.
12	收敛	Convergence
13	卷积	Convolution	A linear operation in which a convolution kernel slides over an image and, at each position, computes a weighted sum of the pixels it covers. It is used to extract image features.
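
To make the ROC and AUC entries (rows 3 and 4) more concrete, here is a minimal sketch in plain Python. The function names `roc_points` and `auc` and the six labels and scores are made up for illustration; labels are assumed to be 0 or 1, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points of the ROC curve.

    The threshold is swept from the highest score to the lowest;
    samples whose score is at or above the threshold count as positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Rank samples by descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels (1 = positive) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(points)
print(auc(points))  # about 0.889, i.e. 8/9
```

Eight of the nine (positive, negative) pairs are ranked in the right order here, which is why the area comes out at 8/9.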

1. We implemented a power load analysis system. It uses Spark for data preprocessing and applies variance-threshold selection and PCA for feature extraction. We then built regression models with linear regression and random forest regression to predict the data. Finally, we built a Web application with Flask, whose interface shows the data set and the results of the algorithms.
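
The variance-threshold step can be sketched in a few lines. This is plain Python, not the Spark code used in the project; the helper `variance_threshold`, the threshold value and the small table of readings are all invented for illustration. A column is kept only if its population variance is strictly greater than the threshold.

```python
from statistics import pvariance


def variance_threshold(rows, threshold):
    """Drop near-constant feature columns.

    Returns the indices of the kept columns and the reduced rows.
    A column is kept when its population variance exceeds `threshold`.
    """
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return keep, reduced


# Hypothetical readings: a constant flag, a nearly constant voltage, a load value.
rows = [
    [1.0, 220.0, 10.5],
    [1.0, 221.0, 35.0],
    [1.0, 219.0, 22.0],
    [1.0, 220.0, 48.5],
]

keep, reduced = variance_threshold(rows, threshold=1.0)
print(keep)     # [2]
print(reduced)  # [[10.5], [35.0], [22.0], [48.5]]
```

The first column has variance 0 and the second 0.5, so both fall under the threshold and only the load column survives. Features on very different scales would normally be rescaled first, since a raw variance threshold depends on units.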

2. I used a ResNeXt network to train a facial beauty prediction model on the SCUT-FBP5500 dataset; the correlation of the model's predictions reached 0.88. In the process I learned to use the Paddle platform and gained some understanding of the ResNeXt network.
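
The text above does not say which correlation measure was used. Assuming it is the Pearson correlation coefficient between predicted and annotated scores, the metric could be computed roughly as below. The function name `pearson` and the two score lists are invented for illustration and are not taken from the actual experiment.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical annotated scores and model predictions.
true_scores = [2.1, 3.4, 4.0, 2.8, 3.9]
pred_scores = [2.4, 3.1, 3.8, 3.0, 3.5]

r = pearson(true_scores, pred_scores)
print(round(r, 3))
```

A value close to 1 means the predictions rise and fall together with the annotated scores, even if their absolute values differ.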

3. I took part in the CCF Big Data and Computational Intelligence competition, where the topic was opinion extraction from product reviews. Our task was to semantically annotate bank review texts and classify their sentiment; our result had an accuracy of about 63.5%. In the process I gained some understanding of BiLSTM and Naive Bayes.
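
For readers new to Naive Bayes, here is a toy sentiment classifier as a rough sketch. It is not the competition code: the class `NaiveBayes`, the four training reviews and the test sentences are all made up, and it assumes a multinomial model with add-one (Laplace) smoothing over pre-tokenized text.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior of the class.
            score = log(n_docs / total_docs)
            for word in tokens:
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the class.
                score += log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already tokenized bank reviews.
train_docs = [
    ["service", "is", "great"],
    ["fast", "and", "friendly", "service"],
    ["fees", "are", "too", "high"],
    ["slow", "and", "rude", "staff"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["friendly", "staff", "great", "service"]))  # positive
print(model.predict(["fees", "too", "high"]))                    # negative
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that is missing from one class from forcing that class's probability to zero.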