Data Science Terminology: English-Chinese Glossary, v1.0

悬臂梁断了

Last modified: 2022-07-12 11:10:43

First published: 2019-11-13 14:57:16

Article link: https://blog.csdn.net/weixin_45865188/article/details/103049239

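This article provides a comprehensive glossary of information technology terms, covering specialist vocabulary from data science, artificial intelligence, natural language processing, machine learning, and related fields, to help readers understand the core concepts of the field.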

Stop words	停用词
Substantive information	实质性信息（单词携带的信息量）
Relational information	关系性信息
Normalization	标准化处理
Performance	表现
Case folding	大小写折叠
Stemming	词干提取（从单词的复数、所有格等变形中提取词干）
Stem	词干
Named entity recognition, NER	命名实体识别
Cross-lingual named entity recognition	跨语言命名实体识别
Named entity extraction	命名实体提取
Term frequency, TF	词频
Inverse document frequency, IDF	逆文档频率
TFIDF, TF-IDF	词频-逆文档频率（一种用于信息检索与数据挖掘的常用加权技术）
Smooth	平滑
Feature learning, aka representation learning	特征学习（又称表示学习）
Predictive	预测性的
Categorical	类别型的（取离散值）
Real-valued	实值的
Visible neurons	可见神经元（输入层、输出层）
Invisible neurons	不可见神经元（隐层）
Retina layer	视网膜层
Speech recognition	语音识别
Record	记录
Data set	数据集
Instance	示例
Sample	样本
Attribute	属性
Feature	特征
Attribute value	属性值
Attribute space	属性空间
Sample space	样本空间
Input space	输入空间
Feature vector	特征向量
Dimensionality	维数
Learning	学习
Training	训练
Training data	训练数据
Training sample	训练样本
Training set	训练集
Hypothesis	假设
Ground-truth	真值（正确标注的数据）
Learner	学习器
Label	标记
Label space	标记空间
Output space	输出空间
Classification	分类
Regression	回归
Binary classification	二分类任务
Positive class	正类
Negative class	反类
Multi-class classification	多分类任务
Testing	测试
Testing sample	测试样本
Clustering	聚类
Cluster	簇
Supervised learning	监督学习
Unsupervised learning	无监督学习
Generalization	泛化
Version space	版本空间
Scale-invariant feature transform, SIFT	尺度不变特征转换
Lemmatization	词形还原
Tokenization	分词
Conversion rate optimization, CRO	转化率优化
Garbage in, garbage out, GIGO	无用输入，无用输出
Tuple unpacking	元组拆包
Descriptor	描述符
Metaprogramming	元编程
Sequence type	序列类型
Abstract base class	抽象基类
Collection type	集合类型
Data model	数据模型
Sequence	序列
Mapping	映射
Set	集合
Str	字符串
Bytes	字节序列
Locale	区域设置
First-class object	一等对象
Closure	闭包
Function decorator	函数装饰器
Callable	可调用
Function attribute	函数属性
Introspection	内省
Parameter annotation	参数注解
Class declaration	类声明
Reference	引用
Subroutine	子程序
Generator	生成器
Context manager	上下文管理器
Coroutine	协程
Property	特性
Class decorator	类装饰器
Metaclass	元类
Read-eval-print loop, REPL	读取-求值-输出循环（交互式解释器）
Test-driven development, TDD	测试驱动开发
Constant width	等宽字体
Constant width bold	加粗等宽字体
Constant width italic	等宽斜体
Pythonic	Python风格
Magic method	魔术方法
Dunder-getitem	双下-getitem
Dunder method	双下方法
Slicing	切片
Datification	数据化
Datified	数据化的
Data-oriented	面向数据的
Integrated development environment, IDE	集成开发环境
Web integrated development environment, WIDE	网络集成开发环境
Data frame	数据框
Comparable	可比
Slice of rows	行的切片
Filter	过滤器
Mask	掩码
In-line function	内联函数
Rank data	对数据进行排名
Pivoted table	数据透视表
Population	总体
Item	个体
Unit	单位
Sample	样本
Concept	概念
Measure	度量
Summary	样本汇总
Data preparation	数据准备
Descriptive statistics	描述统计学
Obtaining the data	获取数据
Parsing the data	解析数据
Cleaning the data	清理数据
Building data structures	构建数据结构
Mean	均值
Deviation	偏差
Histogram	直方图
Outlier	离群点/异常值
Sample statistics	样本统计量
Average	算术平均
Variance	方差
Standard deviation	标准差
Median	中位数
Probability mass function, PMF	概率质量函数
Cumulative distribution function, CDF	累积分布函数
Skewness	偏度
Skew left	左偏
Pearson’s median skewness coefficient	皮尔逊中位数偏度系数
Empirical distribution	经验分布
Continuous distribution	连续分布
Normal distribution	正态分布
Gaussian distribution	高斯分布
Closed-form	闭形式
Estimation	估计
Estimator	估计量
Mean squared error, MSE	均方误差
Dimensionless	无量纲
Covariance	协方差
Pearson’s correlation	皮尔逊相关
Spearman’s rank correlation	斯皮尔曼秩相关
Frequentist approach	频率论方法
Bayesian approach	贝叶斯方法
Point estimate	点估计
Confidence interval	置信区间
Set estimate	集合估计
Central limit theorem	中心极限定理
Empirical sample distribution	经验样本分布
t-test	t检验
Truth	事实
Sample mean	样本均值
Sampling distribution of the sample mean	样本均值抽样分布
Mean sampling distribution	均值抽样分布
Standard error	标准误（标准误差）
Bootstrapping, bootstrap	自助法
A plausible range of values	合理的取值范围
Statistically significant	统计显著的
Period	时期
One sided	单侧检验
Two sided	双侧检验
Bayesian reasoning	贝叶斯推理
Machine learning	机器学习
Supervised learning	监督学习
Unsupervised learning	无监督学习
Reinforcement learning	强化学习
Classification	分类
Regression	回归
Binary	二元
Multiclass	多类
AdaBoost	自适应增强
Confusion matrix	混淆矩阵
Ground truth	真值（正确标注的数据）
Gold standard	黄金标准
True positives, TP	真正例
False positives, FP	假正例
True negatives, TN	真负例
False negatives, FN	假负例
Accuracy	准确率
Sensitivity	灵敏度
Specificity	特异度
Precision	查准率
Positive predictive value, PPV	正预测值
Negative predictive value, NPV	负预测值
Training set	训练集
Test set	测试集
In-sample error	样本内误差
Training error	训练误差
Out-of-sample error	样本外误差
Generalization error	泛化误差
Indicator function	指示函数
Model selection	模型选择
Cross validation	交叉验证
Leave-one-out	留一法
K-fold cross validation	K折交叉验证法
Test error	测试误差
Bias	偏差
Overfitting	过拟合
Regularization	正则化
Validation	验证
Nested cross-validation	嵌套交叉验证
Model class/hypothesis space	模型类/假设空间
Problem model	问题模型
Minimization of an error function	误差函数的最小化
0-1 loss	0-1损失
Learning algorithm	学习算法
Support vector machine, SVM	支持向量机
Support vector	支持向量
Soft-margin SVM	软间隔SVM
Linear kernel	线性核
Polynomial kernel	多项式核
Radial basis function kernel	径向基函数核
Random forest, RF	随机森林
Response	响应
Covariate	协变量
Predictor	预测变量
Simple linear regression	简单线性回归
Multiple linear regression	多元线性回归
Logistic regression	逻辑（斯蒂）回归
Intercept	截距
Constant term	常数项
Ordinary least squares, OLS	普通最小二乘法
Sum of squared errors of prediction, SSE	预测误差平方和
Positive-definite Hessian	正定海森矩阵
Computational complexity	计算复杂度
Sparse method	稀疏方法
Least absolute shrinkage and selection operator, LASSO	最小绝对收缩和选择算子
Heat map	热图
Matricial image	矩阵图
Scatter plot	散点图
Linear regression score	线性回归分数
Clustering	聚类
Dimensionality reduction	降维
Principal component analysis, PCA	主成分分析
Novelty detection	新颖性检测
Similarity-based clustering	基于相似度的聚类
Dissimilarity matrix	相异度矩阵
Distance matrix	距离矩阵
Feature-based clustering	基于特征的聚类
Feature matrix	特征矩阵
Design matrix	设计矩阵
Minkowski distance	闵可夫斯基距离
Euclidean distance	欧几里得距离
Manhattan distance	曼哈顿距离
Max-distance	最大距离
Groundtruth	真值
Rand index	兰德指数
Rand measure	兰德度量
Coincidence	一致性
Silhouette coefficient	轮廓系数
Adjusted rand index	调整兰德指数
Homogeneity	同质性
Completeness	完整性
Bounded score	有界分数
Soft partition	软划分
Hard partition	硬划分
EM clustering	EM聚类
Partitional algorithm	划分算法
Spectral clustering	谱聚类
Hierarchical algorithm	层次算法
Agglomerative clustering	凝聚聚类
Inertia	惯性
Within-cluster sum-of-squares	簇内平方和
Simple Laplacian	简单拉普拉斯算子
Normalized Laplacian	归一化拉普拉斯算子
Generalized Laplacian	广义拉普拉斯算子
Dendrogram	树状图
Top-down	自上而下
Bottom-up	自下而上
Ward	沃德法（Ward 连接）
Connectivity matrix	连通矩阵
Node	节点
Edge	边
Directed	有向的
Undirected	无向的
Directed graph	有向图
Undirected graph	无向图
In-degree	入度
Out-degree	出度
Strength	强度
Weight	权重
Path	路径
Connected	连通的
Fully connected	全连通的
Connected component	连通分量
Subgraph	子图
Small-world	小世界
Scale-free	无标度
Small-world phenomenon	小世界现象
Hub	枢纽
Degree centrality	度中心性
Betweenness centrality	中介中心性
Closeness centrality	接近中心性
Eigenvector centrality	特征向量中心性
Betweenness	中介性
Current flow betweenness centrality	电流中介中心性
Random walk betweenness centrality	随机游走中介中心性
Venn	维恩图
Euler	欧拉图
Louvain	鲁汶方法
Content-based filtering, CBF	基于内容的过滤
Collaborative filtering, CF	协同过滤
Cold-start	冷启动
User-based	基于用户的
Item-based	基于物品的
Item-item	物品-物品
Root mean square error, RMSE	均方根误差
Offline	离线
Online	在线
A/B testing	A/B测试
Natural language toolkit, NLTK	自然语言工具包
Regular expression, RE	正则表达式
Sparse feature space	稀疏特征空间
Posterior high-dimensional	后验高维
Bag of words, BoW	词袋
Vector space model	向量空间模型
Term frequency-inverse document frequency, TF-IDF	词频-逆文档频率
Uni-gram	一元组
Digram	二元组（同 bi-gram）
Tri-gram	三元组
Bi-gram	二元组
Naïve Bayes	朴素贝叶斯
Support vector machine, SVM	支持向量机
Cluster	集群（并行计算语境）/簇
Direct view	直接视图
Load-balanced view	负载均衡视图
Confirmatory data analysis, CDA	验证性数据分析
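
Several entries in the list (confusion matrix, TP/FP/TN/FN, accuracy, sensitivity, specificity, precision, negative predictive value) are easier to tell apart with a small worked example. The following is a minimal sketch in plain Python, not taken from the original post: the function name `binary_metrics` and the toy labels are assumptions made purely for illustration, with 1 standing for the positive class and 0 for the negative class.

```python
def binary_metrics(y_true, y_pred):
    """Count confusion-matrix cells for 0/1 labels and derive common ratios.

    Hypothetical helper for illustration; 1 = positive class, 0 = negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Toy data (made up for this sketch): 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the counts are TP = 3, FP = 1, TN = 5, FN = 1, so accuracy is 8/10 while sensitivity and precision are both 3/4. Libraries such as scikit-learn provide equivalent functionality; the hand-written version is used here only to keep each term visible.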