scipy.spatial.distance.cdist

Posted by DRACO

https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.spatial.distance.cdist.html

scipy.spatial.distance.cdist(XA, XB, metric='euclidean', p=None, V=None, VI=None, w=None)

Computes the distance between each pair drawn from the two collections of inputs.

See Notes for common calling conventions.


Parameters:

XA : ndarray

An $m_A$ by $n$ array of $m_A$ original observations in an $n$-dimensional space. Inputs are converted to float type.

XB : ndarray

An $m_B$ by $n$ array of $m_B$ original observations in an $n$-dimensional space. Inputs are converted to float type.

metric : str or callable, optional

The distance metric to use. If a string, the distance function can be ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.

p : double, optional

The p-norm to apply. Only used for Minkowski, weighted and unweighted. Default: 2.

w : ndarray, optional

The weight vector. Only used for weighted Minkowski, where it is mandatory.

V : ndarray, optional

The variance vector. Only used for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)

VI : ndarray, optional

The inverse of the covariance matrix. Only used for Mahalanobis. Default: inv(cov(vstack([XA, XB]).T)).T

Returns:

Y : ndarray

An $m_A$ by $m_B$ distance matrix is returned. For each $i$ and $j$, the metric dist(u=XA[i], v=XB[j]) is computed and stored in the $ij$-th entry.

Raises:

ValueError

An exception is thrown if XA and XB do not have the same number of columns.
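
The sketch below makes the shape contract concrete. It assumes NumPy and a recent SciPy 1.x release, and the arrays XA and XB are made-up sample data. It checks that the result has one row per point of XA and one column per point of XB, and that a column mismatch raises ValueError.

```python
import numpy as np
from scipy.spatial.distance import cdist

# m_A = 3 observations in a 2-D space
XA = np.array([[0.0, 0.0],
               [1.0, 0.0],
               [0.0, 2.0]])
# m_B = 2 observations in the same 2-D space
XB = np.array([[3.0, 4.0],
               [1.0, 1.0]])

Y = cdist(XA, XB)    # metric defaults to 'euclidean'
print(Y.shape)       # (3, 2): one row per XA point, one column per XB point
print(Y[0, 0])       # 5.0, the distance from XA[0] to XB[0]

# An XB with 3 columns does not match XA's 2 columns, so cdist raises ValueError
raised = False
try:
    cdist(XA, np.ones((2, 3)))
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```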

Notes

The following are common calling conventions:

Y = cdist(XA, XB, 'euclidean')

Computes the distance between each point in XA and each point in XB, using the Euclidean distance (2-norm) as the metric. The points are arranged as $n$-dimensional row vectors in the matrices XA and XB.
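
One way to see what the 'euclidean' metric computes is to reproduce it with NumPy broadcasting. The sketch below assumes NumPy and SciPy are installed and uses random sample points. It is meant for checking results and does not replace cdist, because the broadcast intermediate holds $m_A \cdot m_B \cdot n$ values in memory.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
XA = rng.random((4, 3))   # m_A = 4 points in 3-D
XB = rng.random((5, 3))   # m_B = 5 points in 3-D

# Broadcast to shape (m_A, m_B, n): diff[i, j] = XA[i] - XB[j]
diff = XA[:, np.newaxis, :] - XB[np.newaxis, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))   # 2-norm over the last axis

Y = cdist(XA, XB, 'euclidean')
print(np.allclose(Y, manual))   # True
```
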
Y = cdist(XA, XB, 'minkowski', p=2.)

Computes the distances using the Minkowski distance $\|u-v\|_p$ ($p$-norm) where $p \geq 1$.
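
The city block and Euclidean distances are special cases of the Minkowski family. The following sketch checks both, for $p = 1$ and $p = 2$, and then compares $p = 3$ against a hand-written p-norm. It assumes a recent SciPy, where metric parameters such as p are passed as keyword arguments. The sample points are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[0.0, 0.0], [1.0, 2.0]])
XB = np.array([[3.0, 4.0]])

# p is passed as a keyword argument
d1 = cdist(XA, XB, 'minkowski', p=1)
d2 = cdist(XA, XB, 'minkowski', p=2)
d3 = cdist(XA, XB, 'minkowski', p=3)

# p = 1 is the city block distance, p = 2 is the Euclidean distance
print(np.allclose(d1, cdist(XA, XB, 'cityblock')))   # True
print(np.allclose(d2, cdist(XA, XB, 'euclidean')))   # True

# Manual p-norm for p = 3: (sum_i |u_i - v_i|^3)^(1/3)
manual3 = (np.abs(XA[:, None, :] - XB[None, :, :]) ** 3).sum(axis=-1) ** (1 / 3)
print(np.allclose(d3, manual3))                      # True
```
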
Y = cdist(XA, XB, 'cityblock')

Computes the city block or Manhattan distance between the points.
Y = cdist(XA, XB, 'seuclidean', V=None)

Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is

$$\sqrt{\sum_i (u_i - v_i)^2 / V[i]}.$$

V is the variance vector; V[i] is the variance computed over all the i’th components of the points. If not passed, it is automatically computed.
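
The sketch below checks that omitting V gives the same result as passing the default listed under the V parameter, which is the per-column sample variance (ddof=1) of the stacked inputs. It also checks that both match the formula above. It assumes NumPy and a recent SciPy, and the data are arbitrary.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
XB = np.array([[3.0, 15.0], [0.0, 25.0]])

# Default V: per-column sample variance (ddof=1) of XA and XB stacked together
V = np.var(np.vstack([XA, XB]), axis=0, ddof=1)

Y_default = cdist(XA, XB, 'seuclidean')
Y_explicit = cdist(XA, XB, 'seuclidean', V=V)

# Manual formula: sqrt(sum_i (u_i - v_i)^2 / V[i]); V broadcasts over the last axis
diff = XA[:, None, :] - XB[None, :, :]
manual = np.sqrt((diff ** 2 / V).sum(axis=-1))

print(np.allclose(Y_default, Y_explicit), np.allclose(Y_default, manual))  # True True
```
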
Y = cdist(XA, XB, 'sqeuclidean')

Computes the squared Euclidean distance $\|u-v\|_2^2$ between the vectors.
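
The squared Euclidean distance skips the square root, so it equals the element-wise square of the 'euclidean' result. Here is a small check with hand-picked points, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[0.0, 0.0], [1.0, 1.0]])
XB = np.array([[3.0, 4.0], [1.0, 2.0]])

sq = cdist(XA, XB, 'sqeuclidean')
eu = cdist(XA, XB, 'euclidean')

# sq[0, 0] is 3**2 + 4**2 = 25.0 and sq[1, 1] is 0**2 + 1**2 = 1.0
print(np.allclose(sq, eu ** 2))    # True
```

The square root is monotonic, so 'sqeuclidean' ranks neighbours in the same order as 'euclidean'. It is therefore enough when only the ordering of distances matters.
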
Y = cdist(XA, XB, 'cosine')

Computes the cosine distance between vectors u and v,

$$1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}$$

where $\|*\|_2$ is the 2-norm of its argument *, and $u \cdot v$ is the dot product of $u$ and $v$.
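
The cosine formula can be reproduced with a matrix product and the outer product of the row norms. The sketch below assumes NumPy and SciPy and uses hand-chosen sample vectors. The first row of XA is orthogonal to XB[0] and opposite to XB[2], and the second row is parallel to XB[1].

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 0.0], [1.0, 1.0]])
XB = np.array([[0.0, 2.0], [2.0, 2.0], [-1.0, 0.0]])

Y = cdist(XA, XB, 'cosine')

# Manual formula: 1 - (u . v) / (||u||_2 ||v||_2) for every pair
dots = XA @ XB.T                                   # (m_A, m_B) dot products
norms = np.outer(np.linalg.norm(XA, axis=1), np.linalg.norm(XB, axis=1))
manual = 1.0 - dots / norms

print(np.allclose(Y, manual))   # True
# Orthogonal vectors give 1, parallel vectors give 0, opposite vectors give 2
```
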
Y = cdist(XA, XB, 'correlation')

Computes the correlation distance between vectors u and v. This is

$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|u - \bar{u}\|_2 \|v - \bar{v}\|_2}$$

where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$.
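
The correlation distance is the cosine distance after subtracting each vector's mean. It therefore also equals 1 minus the Pearson correlation coefficient. The sketch below checks both readings, assuming NumPy and SciPy. The sample rows are arbitrary but non-constant, because a constant row has zero norm after centring and the distance is then undefined.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]])
XB = np.array([[2.0, 4.0, 6.5], [0.0, 5.0, 1.0]])

Y = cdist(XA, XB, 'correlation')

# Reading 1: cosine distance of the mean-centred rows
XA_c = XA - XA.mean(axis=1, keepdims=True)
XB_c = XB - XB.mean(axis=1, keepdims=True)
print(np.allclose(Y, cdist(XA_c, XB_c, 'cosine')))   # True

# Reading 2: 1 minus the Pearson correlation coefficient of each pair
pearson = np.array([[np.corrcoef(u, v)[0, 1] for v in XB] for u in XA])
print(np.allclose(Y, 1.0 - pearson))                 # True
```
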
Y = cdist(XA, XB, 'hamming')

Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the matrices XA and XB can be of type boolean.
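
For boolean inputs, the Hamming distance is the mean of an element-wise inequality mask. Here is a minimal sketch with hand-made boolean rows, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[True, False, True, True]])
XB = np.array([[True, True, True, False],
               [True, False, True, True]])

Y = cdist(XA, XB, 'hamming')
# Y[0, 0] is 0.5 (2 of 4 elements differ); Y[0, 1] is 0.0 (identical rows)

# Manual: fraction of positions where the elements disagree
manual = (XA[:, None, :] != XB[None, :, :]).mean(axis=-1)
print(np.allclose(Y, manual))   # True
```
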
Y = cdist(XA, XB, 'jaccard')

Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.
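
The Jaccard definition above counts only positions where at least one vector is non-zero. The following sketch rebuilds that count with boolean masks, assuming NumPy and SciPy and using hand-chosen boolean rows. Both pairs have four such positions. The first pair disagrees on two of them and the second pair on all four.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1, 1, 0, 0, 1]], dtype=bool)
XB = np.array([[1, 0, 0, 1, 1],
               [0, 0, 1, 0, 0]], dtype=bool)

Y = cdist(XA, XB, 'jaccard')

# Positions where at least one of the two entries is non-zero
nonzero = XA[:, None, :] | XB[None, :, :]
# Among those positions, the ones where the entries disagree
disagree = (XA[:, None, :] != XB[None, :, :]) & nonzero
manual = disagree.sum(axis=-1) / nonzero.sum(axis=-1)

print(np.allclose(Y, manual))   # True
```
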
Y = cdist(XA, XB, 'chebyshev')

Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by

$$d(u,v) = \max_i |u_i - v_i|.$$
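
The Chebyshev distance is the largest absolute coordinate difference, which NumPy can compute with a max over the last axis. The sample points are arbitrary, and NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[0.0, 0.0, 0.0], [1.0, 5.0, 2.0]])
XB = np.array([[2.0, -3.0, 1.0]])

Y = cdist(XA, XB, 'chebyshev')
# XA[0]: max(2, 3, 1) = 3.0; XA[1]: max(1, 8, 1) = 8.0

# Manual: max_i |u_i - v_i| for every pair
manual = np.abs(XA[:, None, :] - XB[None, :, :]).max(axis=-1)
print(np.allclose(Y, manual))   # True
```
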

Y = cdist(XA, XB, 'canberra')

Computes the Canberra distance between the points. The Canberra distance between two points u and v is

$$d(u,v) = \sum_i \frac{|u_i - v_i|}{|u_i| + |v_i|}.$$
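
Here is a hand check of the Canberra formula, assuming NumPy and SciPy. The sample values are chosen so that no $|u_i| + |v_i|$ denominator is zero, which keeps the manual version free of 0/0 terms.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 2.0, 3.0]])
XB = np.array([[3.0, 2.0, 1.0],
               [-1.0, 4.0, 3.0]])

Y = cdist(XA, XB, 'canberra')

# Manual formula: sum_i |u_i - v_i| / (|u_i| + |v_i|)
num = np.abs(XA[:, None, :] - XB[None, :, :])
den = np.abs(XA)[:, None, :] + np.abs(XB)[None, :, :]
manual = (num / den).sum(axis=-1)

print(np.allclose(Y, manual))   # True
```

For the first pair the terms are 2/4, 0/4 and 2/4, which sum to 1.0.
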

Y = cdist(XA, XB, 'braycurtis')

Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is

$$d(u,v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}$$
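
The Bray-Curtis distance divides the summed absolute differences by the summed absolute sums. Here is a small check with hand-picked points, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[1.0, 2.0, 3.0]])
XB = np.array([[2.0, 2.0, 1.0],
               [0.0, 4.0, 3.0]])

Y = cdist(XA, XB, 'braycurtis')

# Manual formula: sum_i |u_i - v_i| / sum_i |u_i + v_i|
num = np.abs(XA[:, None, :] - XB[None, :, :]).sum(axis=-1)
den = np.abs(XA[:, None, :] + XB[None, :, :]).sum(axis=-1)
manual = num / den

print(np.allclose(Y, manual))   # True
```

For the first pair the numerator is 1 + 0 + 2 = 3 and the denominator is 3 + 4 + 4 = 11, which gives 3/11.
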

Y = cdist(XA, XB, 'mahalanobis', VI=None)

Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u-v)(1/V)(u-v)^T}$ where $(1/V)$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.
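
The sketch below checks that leaving VI as None matches the default documented for the VI parameter, which is the inverse covariance of the stacked inputs. It then evaluates the quadratic form for every pair with np.einsum. It assumes NumPy and a recent SciPy and uses random data. The combined number of points (10) must exceed the dimension (3), otherwise the sample covariance matrix is singular.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(42)
XA = rng.normal(size=(6, 3))   # m_A = 6 points in 3-D
XB = rng.normal(size=(4, 3))   # m_B = 4 points in 3-D

# Default VI: inverse covariance of the stacked inputs (variables in columns)
VI = np.linalg.inv(np.cov(np.vstack([XA, XB]).T)).T

Y_default = cdist(XA, XB, 'mahalanobis')
Y_explicit = cdist(XA, XB, 'mahalanobis', VI=VI)

# Manual formula: sqrt((u - v) VI (u - v)^T) for every pair
diff = XA[:, None, :] - XB[None, :, :]               # shape (m_A, m_B, n)
manual = np.sqrt(np.einsum('ijk,kl,ijl->ij', diff, VI, diff))

print(np.allclose(Y_default, Y_explicit), np.allclose(Y_default, manual))  # True True
```
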

Y = cdist(XA, XB, 'yule')

Computes the Yule distance between the boolean vectors. (see yule function documentation)

Y = cdist(XA, XB, 'matching')

Synonym for ‘hamming’.

Y = cdist(XA, XB, 'dice')

Computes the Dice distance between the boolean vectors. (see dice function documentation)

Y = cdist(XA, XB, 'kulsinski')

Computes the Kulsinski distance between the boolean vectors. (see kulsinski function documentation)

Y = cdist(XA, XB, 'rogerstanimoto')

Computes the Rogers-Tanimoto distance between the boolean vectors. (see rogerstanimoto function documentation)

Y = cdist(XA, XB, 'russellrao')

Computes the Russell-Rao distance between the boolean vectors. (see russellrao function documentation)

Y = cdist(XA, XB, 'sokalmichener')

Computes the Sokal-Michener distance between the boolean vectors. (see sokalmichener function documentation)

Y = cdist(XA, XB, 'sokalsneath')

Computes the Sokal-Sneath distance between the vectors. (see sokalsneath function documentation)

Y = cdist(XA, XB, 'wminkowski')

Computes the weighted Minkowski distance between the vectors. (see wminkowski function documentation)

Y = cdist(XA, XB, f)

Computes the distance between all pairs of vectors, one taken from XA and one from XB, using the user-supplied 2-arity function f. For example, the Euclidean distance between the vectors could be computed as follows:
dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))
Note that you should avoid passing a reference to one of the distance functions defined in this library. For example:
dm = cdist(XA, XB, sokalsneath)
would calculate the pairwise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called $m_A \cdot m_B$ times, which is inefficient. The optimized C version is much faster, and we call it using the following syntax:
dm = cdist(XA, XB, 'sokalsneath')
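
The sketch below shows why the string form is preferred. It counts how often a Python callable is invoked and compares its result with the built-in 'euclidean' metric. The function python_l2_distance and the counter calls are illustrative names, and NumPy and SciPy are assumed. With 30 rows in XA and 20 in XB, the callable runs once per pair, and each call carries Python overhead.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
XA = rng.random((30, 4))   # m_A = 30 points in 4-D
XB = rng.random((20, 4))   # m_B = 20 points in 4-D

calls = 0

def python_l2_distance(u, v):
    """Euclidean distance computed at Python level; counts its own invocations."""
    global calls
    calls += 1
    return np.sqrt(((u - v) ** 2).sum())

Y_callable = cdist(XA, XB, python_l2_distance)   # one Python call per pair
Y_builtin = cdist(XA, XB, 'euclidean')           # compiled implementation

print(calls)                                 # 600, i.e. 30 * 20
print(np.allclose(Y_callable, Y_builtin))    # True
```
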

Examples

Find the Euclidean distances between four 2-D coordinates:

>>> import numpy as np
>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])

Find the Manhattan distance from a 3-D point to the corners of the unit cube:

>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])