Calculating the similarity of two vectors is a fundamental technique in machine learning and many other fields.
Given two n-dimensional vectors x = (x1, x2, …, xn) and y = (y1, y2, …, yn):
1. Euclidean Distance. (Ed for short; the most frequently seen)
In fact, it is just the straight-line distance between the two sample points (vectors) in multi-dimensional space.
We have:
Ed1 = sqrt( (x1 − y1)² + (x2 − y2)² + … + (xn − yn)² )
The formula above tells us how similar two vectors are by the value of Ed1: the smaller it is, the more similar they are. When Ed1 equals 0, we deem the vectors identical. There is also an alternative form:
Ed2 = 1 / (1 + Ed1)
This seems more intuitive, since a bigger value of Ed2 now means greater similarity. Notice that Ed2 lies within the interval (0, 1].
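The two quantities above can be sketched in a few lines of Python. The function name `euclidean_similarity` is just an illustrative choice, not part of the original text:

```python
import math

def euclidean_similarity(x, y):
    """Return (Ed1, Ed2) for two equal-length numeric sequences."""
    # Ed1: straight-line distance between the two points
    ed1 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Ed2: map the distance into (0, 1]; identical vectors give exactly 1
    ed2 = 1.0 / (1.0 + ed1)
    return ed1, ed2
```

For example, `euclidean_similarity([0, 0], [3, 4])` gives Ed1 = 5.0 (the classic 3-4-5 triangle) and Ed2 = 1/6, while two identical vectors give Ed1 = 0 and Ed2 = 1.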
2. Pearson Correlation. (PC for short)
PC is slightly more sophisticated than Ed. A Pearson correlation coefficient (r) is computed to measure how well two sets of data fit on a straight line.
The formula for r is as follows, where x̄ and ȳ are the means of the two vectors:
r = Σ (xi − x̄)(yi − ȳ) / ( sqrt(Σ (xi − x̄)²) · sqrt(Σ (yi − ȳ)²) )
r lies in the interval [-1, 1]: the bigger |r| (the absolute value) is, the more strongly the two are related. A positive r means they are positively correlated, and r = 0 means they have no linear correlation. The Pearson correlation approach works even when the vector dimensions are not well normalized, because subtracting the means and dividing by the spreads makes r insensitive to shifting or scaling either vector.
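The formula above translates directly into Python. This is a minimal sketch; the name `pearson_r` is an assumption for illustration (in practice one might use `scipy.stats.pearsonr` or `numpy.corrcoef`):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n          # mean of x
    my = sum(y) / n          # mean of y
    # Numerator: sum of products of deviations from the means
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Denominator: product of the two root-sum-square deviations
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

As a sanity check, `pearson_r([1, 2, 3], [2, 4, 6])` is 1.0 (a perfect positive linear fit), `pearson_r([1, 2, 3], [6, 4, 2])` is -1.0, and rescaling one vector, as in `pearson_r([1, 2, 3], [10, 20, 30])`, still yields 1.0, which illustrates why normalization is not required.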