Series on Theories: High-Dimensional Data 101

Origins

“The purpose of this work is to provide an introduction to mathematical theory of multi-stage decision processes. Since these constitute a somewhat formidable set of terms we have coined the term ‘dynamic programming’ to describe the subject matter. […] Each decision may be thought of as a choice of a certain number of variables which determine the transformation to be employed. Each sequence of choices […] is a choice of a larger set of variables. By lumping all these choices together, we ‘reduce’ the problem to a classical problem of determining the maximum of a given function. […] The determination of this maximum is quite definitely not routine when the number of variables is large. All this may be subsumed under the heading ‘the curse of dimensionality.’ ”

Richard Bellman, 1956 [1]

Many texts on statistics, machine learning, and computer science reference the aforementioned “curse of dimensionality,” a phrase most frequently attributed to applied mathematician Richard Bellman. Bellman was a prolific writer who created the field of dynamic programming while researching numerical solutions to systems of partial differential equations. He authored at least 621 papers and 41 books.[2] It is not clear whether Bellman developed the phrase independently or borrowed it from another source, but the first literature reference to the curse of dimensionality appears to be in the preface to his 1956 book Dynamic Programming.[1] As can be deduced from the excerpt above, a full description of the problem was not originally delineated.

In 1997, Rust described the curse of dimensionality as the observation that increasing the dimension of Markovian decision problems exponentially increases the time and space required to compute approximate solutions. Formal study of this ‘curse’ also appears as the study of problem tractability in information-based complexity theory and computational complexity theory.[3]

Generally, the phrase has come to refer to the phenomenon that increasing the dimension of the data exponentially increases the volume of the space the data can occupy. Accordingly, in high-dimensional spaces, the available data becomes sparse. In statistics and machine learning, we might also refer to these dimensions as features or attributes.

The curse of dimensionality example most often cited is the hyper-sphere inscribed in a hyper-cube. In two dimensions, the area of the unit circle inside a unit square comprises 78.5% of the space. In three dimensions, the volume of a unit sphere comprises 52.4% of the space in a unit cube. In four dimensions, this shared space decreases to 30.8%. The shared space can be computed in n dimensions with the formula below. As the dimension count increases, this shared volume converges to 0; in essence, the “corners” or “edges” of the hyper-cube grow at a rate much faster than that of the hyper-sphere.

Fraction of the unit hyper-cube occupied by its inscribed hyper-sphere in n dimensions: π^(n/2) / (2^n · Γ(n/2 + 1)), which decays rapidly toward 0 as n grows.
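
As a quick check on those percentages, here is a minimal Python sketch (my own illustration, not part of the original post) that evaluates the ratio for a few values of n:

```python
# Fraction of a unit hyper-cube occupied by its inscribed hyper-sphere:
# ratio(n) = pi^(n/2) / (2^n * Gamma(n/2 + 1))
import math

def inscribed_sphere_fraction(n: int) -> float:
    return math.pi ** (n / 2) / (2 ** n * math.gamma(n / 2 + 1))

for n in (2, 3, 4, 10, 20):
    print(f"n={n:>2}: {inscribed_sphere_fraction(n):.4f}")
# n= 2: 0.7854, n= 3: 0.5236, n= 4: 0.3084, and the fraction
# rapidly approaches 0 as n grows.
```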

If the volume in the center of the hyper-cube approaches 0 as n increases without bound, then the volume must be in the corners. In fact, if we draw each coordinate of a point p independently from a normal distribution with standard deviation 𝜎, the squared magnitude of the vector from the origin to p (scaled by 𝜎²) follows a chi-square distribution with n degrees of freedom.[15] It then follows that the probability mass concentrates near the surface of the sphere with radius 𝜎√n.

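A small simulation (again my own illustration) makes this concentration visible: for points whose coordinates are iid standard normal draws, the distance to the origin clusters tightly around √n, and the relative spread shrinks as n grows.

```python
# Squared distances of standard-normal points follow a chi-square
# distribution with n degrees of freedom, so distances concentrate
# near sqrt(n) as the dimension grows.
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100, 1000):
    points = rng.standard_normal((10_000, n))
    dists = np.linalg.norm(points, axis=1)
    print(f"n={n:>4}: mean distance {dists.mean():7.2f}  "
          f"(sqrt(n)={np.sqrt(n):7.2f})  relative spread "
          f"{dists.std() / dists.mean():.3f}")
# The relative spread shrinks toward 0, i.e. nearly all points sit
# on a thin shell of radius ~ sqrt(n).
```
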
The wide-ranging implication of these geometric explorations is that it becomes increasingly difficult to adequately sample the n-dimensional space for large values of n due to ever-increasing sparsity. So how can we deal with sparsity in high-dimensional data?

Euclidean Distance Shortcomings in High-Dimensional Space

The classic Euclidean distance we are familiar with is the magnitude of the vector between points p and q in n-dimensional space. We can represent it as the Euclidean norm, or the L2 norm, of the displacement vector between the two points.

d(p, q) = ‖p − q‖₂ = √( Σᵢ (pᵢ − qᵢ)² )

Using this definition, consider the classic nearest neighbor problem: “given a collection of data points in m-dimensional space, find the data point p in the collection closest to the query point q.” In 1998, Beyer, Goldstein, Ramakrishnan, and Shaft demonstrated that under broad conditions (not requiring iid data), as m increases, the distance to the nearest neighbor approaches the distance to the farthest neighbor. In other words, the contrast between the near and far points becomes indistinct. This finding extends to the k-nearest neighbor version of the problem.[16]

This result holds whenever the following condition is met: the variance of the distance, divided by its expected value, approaches zero as the dimension grows.

lim (m→∞) Var( ‖Xₘ‖ / E[‖Xₘ‖] ) = 0
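
This shrinking contrast is easy to reproduce. The sketch below (illustrative only; it uses iid uniform data rather than the workloads studied in the paper) compares the nearest and farthest Euclidean distances from a random query point as the dimension m grows:

```python
# Relative contrast (DMAX - DMIN) / DMIN between the farthest and
# nearest neighbor of a random query point, for iid uniform data.
import numpy as np

rng = np.random.default_rng(0)
n_points = 1_000
for m in (2, 10, 100, 1_000):
    data = rng.random((n_points, m))
    query = rng.random(m)
    d = np.linalg.norm(data - query, axis=1)
    print(f"m={m:>4}: relative contrast {(d.max() - d.min()) / d.min():.3f}")
# The contrast collapses toward 0 with growing m: the nearest and
# farthest points become nearly equidistant from the query.
```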

One might conclude that the Euclidean distance between points becomes meaningless with increasing dimension; however, that would not adequately represent the authors’ simulation findings. They find that when highly correlated clusters are present in the high-dimensional space, and the query point lies in such a cluster, the contrast does not shrink to zero. Correlations between features give us material to work with.

If high-dimensional data has an underlying low-dimensional structure, that lower-dimensional structure can be extracted and explored. This is the concept behind traditional feature extraction and feature engineering methods. However, please note this is NOT the same thing as feature selection. Feature selection is concerned with choosing a subset of the original set of available features for use in modeling. Examples of feature selection include random forest importance methods, backwards/forwards/stepwise/purposeful selection procedures, best subset selection, etc. Meanwhile, feature extraction may employ linear combinations, projections, and transformations to extract new, lower-dimensional sets of features with which to model.

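To make the distinction concrete, here is a small scikit-learn sketch (my own example, with an arbitrary choice of selector and dataset): feature selection keeps a subset of the original columns, while feature extraction builds new columns from combinations of all of them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep the 2 original features most associated with y.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new features as linear combinations of all 4.
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # (150, 2) (150, 2)
```
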
Dimensionality reduction is the goal of linear methods such as principal component analysis, factor analysis, linear discriminant analysis, and independent component analysis, as well as non-linear methods such as autoencoders, isometric mapping (IsoMap), locally linear embedding (LLE), Laplacian eigenmaps, t-distributed stochastic neighbor embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The linear methods tend to take into account global variation of the data points, while the non-linear methods tend to map local behavior to underlying manifolds that describe non-linear behavior. It is my hope the following brief notes on these methods help organize the concepts behind different forms of dimensionality reduction. High-level overviews often elucidate the mechanics of implementation.

Dimension Reduction Methods (Linear, Global Information)

Principal Component Analysis (PCA) (1901) [4]

  • GOAL: Minimize information loss with a data-reduction technique (see the sketch after this list)
  • Step 1: Eigenvalue decomposition of the variance-covariance matrix of centered, scaled observations
  • OR Step 1: Singular value decomposition of the design matrix
  • Defines a new orthogonal coordinate system of uncorrelated dimensions
  • Simplest form of eigenvector-based multivariate analysis
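
A minimal scikit-learn sketch of the workflow above (centering and scaling first, then PCA; the number of components is an illustrative choice):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)            # 64-dimensional inputs
pca = make_pipeline(StandardScaler(), PCA(n_components=10))
X_reduced = pca.fit_transform(X)               # 10 uncorrelated components
print(X_reduced.shape, pca[-1].explained_variance_ratio_.sum())
```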

Factor Analysis (1904) [5]

  • GOAL: Determine underlying, unobserved latent variables among observed, indirect variables (see the sketch after this list)
  • Assumes observed variables are linear combinations of latent variables
  • Builds upon PCA; assumes raw PCA methods yield inflated component loadings with high variance due to error
  • Can be used for exploratory factor analysis or confirmatory factor analysis
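
A brief scikit-learn sketch (illustrative; FactorAnalysis here fits an exploratory, maximum-likelihood factor model):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X, _ = load_iris(return_X_y=True)
fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)       # latent factor scores per observation
print(scores.shape)                # (150, 2)
print(fa.components_)              # loadings of observed variables on factors
```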

Linear Discriminant Analysis (1948) [6]

  • GOAL: Find a linear combination of 2 or more dimensions that characterizes classes of events as one new dimension (see the sketch after this list)
  • Generalized version of the Fisher linear discriminant
  • Closely related to logistic regression
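
A short scikit-learn sketch: a supervised projection of the 4-dimensional iris data onto 2 discriminant axes, the maximum allowed for 3 classes (illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)    # supervised: uses the class labels y
print(X_lda.shape)                 # (150, 2)
```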

Independent Component Analysis (1991) [7]

  • GOAL: Separate multivariate signals into additive subcomponents (see the sketch after this list)
  • Assumes the subcomponents are non-Gaussian signals that are independent of one another
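
A minimal FastICA sketch that unmixes two artificially mixed non-Gaussian signals (a toy example of my own, not from the original post):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.column_stack([np.sin(2 * t),               # smooth signal
                           np.sign(np.sin(3 * t))])     # square wave
mixing = np.array([[1.0, 0.5], [0.4, 1.2]])
observed = sources @ mixing.T                            # what we measure

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observed)   # estimated independent components
print(recovered.shape)                     # (2000, 2)
```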

Dimension Reduction Methods (Non-Linear, Primarily Local Information)

Autoencoders (Earliest reference 1987) [8] (2015) [9]

  • GOAL: Utilize a feed-forward, non-recurrent neural network with an input layer, an output layer, and 1 or more hidden layers in between (see the sketch after this list)
  • Step 1: Input layer = encoder
      ◦ Maps the high-dimensional input to a lower-dimensional latent representation using activation functions such as sigmoid or ReLU
  • Step 2: Output layer = decoder
      ◦ Maps the latent representation back to the original high dimension
  • Trained through backpropagation of the reconstruction error
  • Related methods: sparse autoencoder, denoising autoencoder, contractive autoencoder, variational autoencoder
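
A compact PyTorch sketch of the encoder/decoder structure described above (the architecture, data, and hyperparameters are illustrative assumptions, not from the original post):

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs: int = 64, n_latent: int = 8):
        super().__init__()
        # Encoder: high-dimensional input -> low-dimensional latent code
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 32), nn.ReLU(),
                                     nn.Linear(32, n_latent))
        # Decoder: latent code -> reconstruction in the original dimension
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(256, 64)            # stand-in for real 64-dimensional data
for _ in range(100):               # train by backpropagating the
    optimizer.zero_grad()          # reconstruction error
    loss = loss_fn(model(x), x)
    loss.backward()
    optimizer.step()

codes = model.encoder(x)           # 8-dimensional latent representation
print(codes.shape)                 # torch.Size([256, 8])
```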

IsoMap (2000) [10]

  • GOAL: Determine the underlying manifold (see the sketch after this list)
  • Step 1: Construct a graph of k nearest neighbors
  • Step 2: Compute geodesic distance estimates (the sum of edge weights along the shortest path between two nodes)
  • Step 3: Use multidimensional scaling to produce a mapping to a lower, d-dimensional embedding
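
A scikit-learn sketch on the classic swiss-roll dataset, a 2-dimensional manifold embedded in 3 dimensions (neighbor count and target dimension are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1500, random_state=0)    # 3-d points on a 2-d manifold
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)                                     # (1500, 2)
```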

Locally Linear Embedding (LLE) (2000) [11]

  • GOAL: Compute a low-dimensional, neighborhood-preserving embedding of high-dimensional input (see the sketch after this list)
  • Step 1: Compute distances to the nearest neighbors of each point
  • Step 2: Compute the weights that best reconstruct each point linearly from its neighbors, minimizing constrained least-squares residuals
  • Step 3: Minimize the embedding cost function by finding the smallest eigenmodes of sparse symmetric matrices to ultimately compute the low-dimensional embedding vectors
  • Computes each point as a linear combination of its neighbors
  • Preserves local neighborhood information; only the final embedding step solves a global (sparse) eigenvalue problem
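
The corresponding scikit-learn sketch (illustrative parameters again):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_lle = lle.fit_transform(X)
print(X_lle.shape, lle.reconstruction_error_)
```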

Laplacian Eigenmaps (2003) [12]

  • GOAL: Reduce the number of dimensions while preserving local information (see the sketch after this list)
  • Discrete approximation to a continuous map that naturally arises from the geometry of the underlying manifold
  • Step 1: Construct the adjacency matrix (either by selecting neighborhoods of radius 𝜖 or by selecting the k nearest neighbors)
  • Step 2: Choose weights for the edges (either with a heat kernel or with simple-minded 0/1 weights)
  • Step 3: Construct the graph and assume it is connected; compute the eigenvalues and eigenvectors of the generalized eigenvector problem using the Laplacian matrix, then project into the lower-dimensional space
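
scikit-learn exposes Laplacian eigenmaps as SpectralEmbedding; a minimal sketch with a k-nearest-neighbor affinity graph (illustrative parameters):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=0)
se = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                       n_neighbors=10, random_state=0)
X_se = se.fit_transform(X)
print(X_se.shape)                  # (1500, 2)
```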

t-SNE (2008) [13]

  • GOAL: Visualize high-dimensional data comprised of several different, related low-dimensional manifolds (see the sketch after this list)
  • Step 1: Compute the Stochastic Neighbor Embedding (SNE): a Gaussian probability distribution over pairwise Euclidean distances between input data points in the high dimension
      ◦ Normalize
      ◦ Symmetrize, then re-normalize
  • Step 2: Compute a Student t probability distribution over pairwise Euclidean distances between output data points in the low-dimensional map generated by regular gradient descent
      ◦ Normalize
  • Step 3: Minimize the Kullback-Leibler divergence cost function between the two distributions
  • Related methods: Barnes-Hut t-SNE, LargeVis
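
A typical scikit-learn sketch for visualization (perplexity and initialization are illustrative choices):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                 # 64-dimensional inputs
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)                        # 2-d map for plotting
print(X_2d.shape)                                   # (1797, 2)
```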

UMAP (2018) [14]

  • GOAL: Determine the underlying manifold (see the sketch after this list)
  • Incorporates more global structure than t-SNE
  • No restriction on the embedding dimension
  • Step 1: Compute a stochastic-neighbor-style embedding: an exponential probability distribution over pairwise smoothed distances (any type of distance) between input data points in the high dimension
      ◦ Constructs an undirected, weighted k-neighbor graph
      ◦ Perplexity-based normalization
      ◦ Symmetrizes as a fuzzy set union; no normalization is conducted after symmetrization, reducing computational demand
  • Step 2: Compute an exponential probability distribution over pairwise distances between output data points in the low-dimensional map generated by stochastic gradient descent
  • Step 3: Minimize the cross-entropy cost function between the two distributions
  • Related methods: LargeVis
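
With the umap-learn package (a separate install from scikit-learn), usage mirrors the sketches above; the parameters below are illustrative:

```python
import umap                                   # pip install umap-learn
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                    random_state=0)
X_umap = reducer.fit_transform(X)
print(X_umap.shape)                           # (1797, 2)
```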

Blessing of Dimensionality

While searching in n-dimensional space may be cursed, the silver lining of the phenomenon was highlighted by David Donoho in his 2000 address to the American Mathematical Society.[17] He discussed high dimensionality issues in non-parametric estimation as well as model selection methods. In contrast, he then cited examples of problems actually improved by high dimensions, particularly for simplified asymptotic derivation and continuum theory.

Perhaps the curse is not a curse at all and just another thing. :)

[1] Bellman R. Dynamic Programming. Cameron Station, Alexandria, VA: United States Air Force Project RAND; 1956. https://apps.dtic.mil/dtic/tr/fulltext/u2/144264.pdf. Accessed June 30, 2020.

[2] O’Connor JJ, Robertson EF. Richard Ernest Bellman. Mathematics Biographies. https://mathshistory.st-andrews.ac.uk/Biographies/Bellman/. Published 2020. Accessed June 30, 2020.

[3] Rust J. Using Randomization to Break the Curse of Dimensionality. Econometrica. 1997;65(3):487–516. doi:10.2307/2171751

[4] Pearson K. On Lines and Planes of Closest Fit to Systems of Points in Space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 1901;2(11):559–572.

[5] Spearman C. “General Intelligence,” Objectively Determined and Measured. The American Journal of Psychology. 1904;15(2):201–292.

[6] Rao CR. The Utilization of Multiple Measurements in Problems of Biological Classification. Journal of the Royal Statistical Society, Series B (Methodological). 1948;10(2):159–203. www.jstor.org/stable/2983775. Accessed July 5, 2020.

[7] Jutten C, Herault J. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing. 1991;24(1):1–10. doi:10.1016/0165-1684(91)90079-X

[8] Ballard D. Modular Learning in Neural Networks. In: AAAI; 1987. https://www.aaai.org/Papers/AAAI/1987/AAAI87-050.pdf. Accessed July 5, 2020.

[9] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;61:85–117. doi:10.1016/j.neunet.2014.09.003

[10] Tenenbaum JB. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science. 2000;290(5500):2319–2323. doi:10.1126/science.290.5500.2319

[11] Roweis ST. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science. 2000;290(5500):2323–2326. doi:10.1126/science.290.5500.2323

[12] Belkin M, Niyogi P. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation. 2003;15(6):1373–1396. doi:10.1162/089976603321780317

[13] van der Maaten L, Hinton G. Visualizing Data using t-SNE. Journal of Machine Learning Research. 2008;9:2579–2605. http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf. Accessed January 8, 2020.

[14] McInnes L, Healy J, Saul N, Großberger L. UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software. 2018;3(29):861. doi:10.21105/joss.00861

[15] Carpenter B. Typical Sets and the Curse of Dimensionality. mc-stan.org. https://mc-stan.org/users/documentation/case-studies/curse-dims.html. Published April 11, 2017. Accessed June 30, 2020.

[16] Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When Is Nearest Neighbor Meaningful? 1998. https://minds.wisconsin.edu/handle/1793/60174. Accessed June 30, 2020.

[17] Donoho D. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality. 2000. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.3392. Accessed June 30, 2020.

Translated from: https://medium.com/swlh/series-on-theories-high-dimensional-data-101-81cab8e0bea6
