scikit-learn: 4.5. Random Projection

License: CC 4.0 BY-SA

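scikit-learn's random_projection module provides a dimensionality-reduction method that trades a controlled amount of accuracy for efficiency. It mainly implements Gaussian and sparse random matrices. Based on the Johnson-Lindenstrauss lemma, the method maps high-dimensional data into a lower-dimensional space while keeping pairwise distances between points approximately unchanged. GaussianRandomProjection draws the random matrix from a Gaussian distribution, whereas SparseRandomProjection uses a sparse matrix, which needs less memory and computes faster.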

Reference: http://scikit-learn.org/stable/modules/random_projection.html

The sklearn.random_projection module reduces the dimensionality of the data by trading a controlled amount of accuracy for efficiency. It implements two types of unstructured random matrix: the Gaussian random matrix and the sparse random matrix.
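As a minimal sketch of how the two kinds of random matrix are used, the snippet below applies both transformers to the same data. The synthetic input, the choice of 500 components, and the random seed are assumptions made for illustration and do not come from the original article.

```python
import numpy as np
from scipy import sparse
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)

# Synthetic data (assumption): 100 samples with 10000 features each.
rng = np.random.RandomState(0)
X = rng.rand(100, 10000)

# Dense projection matrix with Gaussian entries.
gaussian_rp = GaussianRandomProjection(n_components=500, random_state=0)
X_gaussian = gaussian_rp.fit_transform(X)

# Sparse projection matrix; most of its entries are zero.
sparse_rp = SparseRandomProjection(n_components=500, random_state=0)
X_sparse = sparse_rp.fit_transform(X)

print(X_gaussian.shape, X_sparse.shape)

# The fitted projection matrix is stored in components_ with shape
# (n_components, n_features).
print(type(gaussian_rp.components_), gaussian_rp.components_.shape)
print(sparse.issparse(sparse_rp.components_), sparse_rp.components_.shape)
```

Both transformers expose the same fit/transform interface, so they can be swapped for one another; the difference lies in how the projection matrix is generated and stored.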

The theoretical foundation is the Johnson-Lindenstrauss lemma. Quoting Wikipedia, the lemma says roughly the following:

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.

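Knowing only the number of samples, sklearn.random_projection.johnson_lindenstrauss_min_dim conservatively estimates the minimal dimension of the random subspace, while guaranteeing that the distortion introduced by projecting randomly into the lower-dimensional space is bounded (estimates conservatively the minimal size of the ran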