https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.spatial.distance.cdist.html
scipy.spatial.distance.cdist(XA, XB, metric='euclidean', p=None, V=None, VI=None, w=None)
Computes distance between each pair of the two collections of inputs.
See Notes for common calling conventions.
Parameters:
    XA : ndarray
    XB : ndarray
    metric : str or callable, optional
    p : double, optional
    w : ndarray, optional
    V : ndarray, optional
    VI : ndarray, optional
Returns:
    Y : ndarray
Raises:
    ValueError
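As a minimal sketch of the calling contract (the array contents and the names `Y` and `raised` are illustrative, not taken from this page): each row of XA and XB is treated as one observation, Y has one row per observation in XA and one column per observation in XB, and a ValueError is expected when the two inputs do not have the same number of columns.

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.zeros((3, 2))   # 3 observations in 2 dimensions
XB = np.ones((5, 2))    # 5 observations in 2 dimensions

# metric defaults to 'euclidean'; Y[i, j] is the distance from XA[i] to XB[j]
Y = cdist(XA, XB)
print(Y.shape)

# Inputs whose column counts differ are rejected.
try:
    cdist(XA, np.ones((5, 3)))
    raised = False
except ValueError:
    raised = True
print(raised)
```

Every entry of Y in this sketch is the distance between (0, 0) and (1, 1), so the shape is the only thing that varies with the inputs.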
Notes
The following are common calling conventions:
- Y = cdist(XA, XB, 'euclidean')
Computes the distance between $m$ points using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as $m$ $n$-dimensional row vectors in the matrix X.
- Y = cdist(XA, XB, 'minkowski', p)
Computes the distances using the Minkowski distance $\|u-v\|_p$ ($p$-norm) where $p \geq 1$.
- Y = cdist(XA, XB, 'cityblock')
Computes the city block or Manhattan distance between the points.
- Y = cdist(XA, XB, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
$$\sqrt{\sum_i (u_i - v_i)^2 / V[x_i]}.$$
V is the variance vector; V[i] is the variance computed over all the i’th components of the points. If not passed, it is automatically computed.
- Y = cdist(XA, XB, 'sqeuclidean')
Computes the squared Euclidean distance $\|u-v\|_2^2$ between the vectors.
- Y = cdist(XA, XB, 'cosine')
Computes the cosine distance between vectors u and v,
$$1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}$$
where $\|*\|_2$ is the 2-norm of its argument *, and $u \cdot v$ is the dot product of $u$ and $v$.
- Y = cdist(XA, XB, 'correlation')
Computes the correlation distance between vectors u and v. This is
$$1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|(u - \bar{u})\|_2 \|(v - \bar{v})\|_2}$$
where $\bar{v}$ is the mean of the elements of vector v, and $x \cdot y$ is the dot product of $x$ and $y$.
- Y = cdist(XA, XB, 'hamming')
Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the matrix X can be of type boolean.
- Y = cdist(XA, XB, 'jaccard')
Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is non-zero.
- Y = cdist(XA, XB, 'chebyshev')
Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by
$$d(u,v) = \max_i |u_i - v_i|.$$
- Y = cdist(XA, XB, 'canberra')
Computes the Canberra distance between the points. The Canberra distance between two points u and v is
$$d(u,v) = \sum_i \frac{|u_i - v_i|}{|u_i| + |v_i|}.$$
- Y = cdist(XA, XB, 'braycurtis')
Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is
$$d(u,v) = \frac{\sum_i |u_i - v_i|}{\sum_i |u_i + v_i|}$$
- Y = cdist(XA, XB, 'mahalanobis', VI=None)
Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is $\sqrt{(u-v)(1/V)(u-v)^T}$ where $(1/V)$ (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.
- Y = cdist(XA, XB, 'yule')
Computes the Yule distance between the boolean vectors. (see yule function documentation)
- Y = cdist(XA, XB, 'matching')
Synonym for ‘hamming’.
- Y = cdist(XA, XB, 'dice')
Computes the Dice distance between the boolean vectors. (see dice function documentation)
- Y = cdist(XA, XB, 'kulsinski')
Computes the Kulsinski distance between the boolean vectors. (see kulsinski function documentation)
- Y = cdist(XA, XB, 'rogerstanimoto')
Computes the Rogers-Tanimoto distance between the boolean vectors. (see rogerstanimoto function documentation)
- Y = cdist(XA, XB, 'russellrao')
Computes the Russell-Rao distance between the boolean vectors. (see russellrao function documentation)
- Y = cdist(XA, XB, 'sokalmichener')
Computes the Sokal-Michener distance between the boolean vectors. (see sokalmichener function documentation)
- Y = cdist(XA, XB, 'sokalsneath')
Computes the Sokal-Sneath distance between the vectors. (see sokalsneath function documentation)
- Y = cdist(XA, XB, 'wminkowski')
Computes the weighted Minkowski distance between the vectors. (see wminkowski function documentation)
- Y = cdist(XA, XB, f)
Computes the distance between all pairs of vectors in X using the user supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows:
dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))
Note that you should avoid passing a reference to one of the distance functions defined in this library. For example:
dm = cdist(XA, XB, sokalsneath)
would calculate the pair-wise distances between the vectors in X using the Python function sokalsneath. This would result in sokalsneath being called ${n \choose 2}$ times, which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax:
dm = cdist(XA, XB, 'sokalsneath')
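The formulas in the list above can be cross-checked against plain NumPy. The following is a minimal sketch rather than part of the library: the random inputs, the helper `cosine_dist`, and the `checks` dictionary are illustrative names, and it assumes a NumPy version that provides `np.random.default_rng`. V and VI are passed explicitly so that the comparison does not depend on how they are derived when omitted. The last lines compare the user-supplied callable form with the string form of the same metric.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data (not from the source): 4 and 5 points in 3 dimensions.
rng = np.random.default_rng(0)
XA = rng.random((4, 3))
XB = rng.random((5, 3))

# diff[i, j] = XA[i] - XB[j], shape (4, 5, 3), built by broadcasting.
A = XA[:, None, :]
B = XB[None, :, :]
diff = A - B


def cosine_dist(P, Q):
    """1 - (p . q) / (||p||_2 ||q||_2) for every pair of rows of P and Q."""
    norms = np.outer(np.linalg.norm(P, axis=1), np.linalg.norm(Q, axis=1))
    return 1.0 - (P @ Q.T) / norms


# Variance vector and inverse covariance over all points, supplied explicitly.
stacked = np.vstack([XA, XB])
V = stacked.var(axis=0, ddof=1)
VI = np.linalg.inv(np.cov(stacked, rowvar=False))

# Row-centred copies for the correlation distance.
CA = XA - XA.mean(axis=1, keepdims=True)
CB = XB - XB.mean(axis=1, keepdims=True)

# metric name -> (cdist result, hand-written formula)
checks = {
    "euclidean": (cdist(XA, XB, "euclidean"), np.sqrt((diff ** 2).sum(-1))),
    "sqeuclidean": (cdist(XA, XB, "sqeuclidean"), (diff ** 2).sum(-1)),
    "minkowski": (cdist(XA, XB, "minkowski", p=3),
                  (np.abs(diff) ** 3).sum(-1) ** (1.0 / 3.0)),
    "cityblock": (cdist(XA, XB, "cityblock"), np.abs(diff).sum(-1)),
    "chebyshev": (cdist(XA, XB, "chebyshev"), np.abs(diff).max(-1)),
    "canberra": (cdist(XA, XB, "canberra"),
                 (np.abs(diff) / (np.abs(A) + np.abs(B))).sum(-1)),
    "braycurtis": (cdist(XA, XB, "braycurtis"),
                   np.abs(diff).sum(-1) / np.abs(A + B).sum(-1)),
    "cosine": (cdist(XA, XB, "cosine"), cosine_dist(XA, XB)),
    "correlation": (cdist(XA, XB, "correlation"), cosine_dist(CA, CB)),
    "seuclidean": (cdist(XA, XB, "seuclidean", V=V),
                   np.sqrt((diff ** 2 / V).sum(-1))),
    "mahalanobis": (cdist(XA, XB, "mahalanobis", VI=VI),
                    np.sqrt(np.einsum("ijk,kl,ijl->ij", diff, VI, diff))),
}

for name, (got, expected) in checks.items():
    print(f"{name:12s} max abs difference: {np.abs(got - expected).max():.2e}")

# The callable form gives the same numbers as the string form, only slower.
by_callable = cdist(XA, XB, lambda u, v: np.sqrt(((u - v) ** 2).sum()))
print(np.allclose(by_callable, checks["euclidean"][0]))
```

The broadcasting approach materialises an array of shape (4, 5, 3), which is fine for a check like this but grows quickly with the number of points; cdist avoids that intermediate.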
Examples
Find the Euclidean distances between four 2-D coordinates:
>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])
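When the same collection is passed as both XA and XB, as in the example above, the result is a symmetric matrix with a zero diagonal. A short sketch of that property follows; the variable names are illustrative, and the comparison with `pdist` and `squareform` from the same module is an addition to this example rather than something it states.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

coords = np.array([(35.0456, -85.2672),
                   (35.1174, -89.9711),
                   (35.9728, -83.9422),
                   (36.1667, -86.7833)])

D = cdist(coords, coords, 'euclidean')

symmetric = np.allclose(D, D.T)              # d(u, v) == d(v, u)
zero_diagonal = np.allclose(np.diag(D), 0)   # d(u, u) == 0

# pdist returns only the upper triangle in condensed form;
# squareform expands it to the full square matrix.
same_as_pdist = np.allclose(D, squareform(pdist(coords, 'euclidean')))

print(symmetric, zero_diagonal, same_as_pdist)
```

For a single collection, the condensed output of `pdist` stores each pair once, which may be preferable when memory matters.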
Find the Manhattan distance from a 3-D point to the corners of the unit cube:
>>> import numpy as np
>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])
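The Manhattan distances above can be reproduced by hand as the sum of absolute coordinate differences. This is a small sketch for comparison; the name `manual` and the comprehension that builds the cube corners are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Corners of the unit cube, in the same order as in the example above.
a = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
b = np.array([[0.1, 0.2, 0.4]])

# b broadcasts against every row of a; keepdims keeps the (8, 1) shape
# that cdist returns for 8 points against 1 point.
manual = np.abs(a - b).sum(axis=1, keepdims=True)

print(np.allclose(manual, cdist(a, b, 'cityblock')))
```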