Calculating the similarity of two vectors is a fundamental technique in machine learning and many other fields.
Given two n-dimensional vectors x = (x1, x2, …, xn) and y = (y1, y2, …, yn):
1. Euclidean Distance. (Ed for short; the most frequently seen)
In fact, it is just the straight-line distance between the two sample points (vectors) in multi-dimensional space.
We have:
Ed1 = sqrt( (x1 − y1)² + (x2 − y2)² + … + (xn − yn)² )
The formula above tells us how similar two vectors are by the value of Ed1: the smaller it is, the more similar they are. When Ed1 equals 0, we deem the vectors identical. There is also an alternative form:
Ed2 = 1 / (1 + Ed1)
This seems more intuitive, since a bigger value of Ed2 now means greater similarity. Notice that Ed2 lies within the interval (0, 1].
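The two quantities above can be sketched in a few lines of Python. The function name `euclidean_similarity` is just an illustrative choice, not part of the original text:

```python
import math

def euclidean_similarity(x, y):
    """Return (Ed1, Ed2) for two equal-length numeric sequences."""
    # Ed1: straight-line distance between the two points
    ed1 = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Ed2: map the distance into (0, 1]; identical vectors give exactly 1
    ed2 = 1.0 / (1.0 + ed1)
    return ed1, ed2
```

For example, `euclidean_similarity([0, 0], [3, 4])` gives Ed1 = 5.0 (the classic 3-4-5 triangle) and Ed2 = 1/6, while two identical vectors give Ed1 = 0 and Ed2 = 1.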
2. Pearson Correlation. (PC for short)
PC is slightly more sophisticated than Ed. A Pearson correlation coefficient (r) is computed to measure how well two sets of data fit on a straight line.
The formula for r is as follows, where x̄ and ȳ are the means of the two vectors:
r = Σ (xi − x̄)(yi − ȳ) / ( sqrt(Σ (xi − x̄)²) · sqrt(Σ (yi − ȳ)²) )
r lies in the interval [-1, 1]: the bigger |r| (the absolute value) is, the more strongly the two are related. A positive r means they are positively correlated, and r = 0 means they have no linear correlation. The Pearson correlation approach works even when the vector dimensions are not well normalized, because subtracting the means and dividing by the spreads makes r insensitive to shifting or scaling either vector.
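The formula above translates directly into Python. This is a minimal sketch; the name `pearson_r` is an assumption for illustration (in practice one might use `scipy.stats.pearsonr` or `numpy.corrcoef`):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n          # mean of x
    my = sum(y) / n          # mean of y
    # Numerator: sum of products of deviations from the means
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Denominator: product of the two root-sum-square deviations
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

As a sanity check, `pearson_r([1, 2, 3], [2, 4, 6])` is 1.0 (a perfect positive linear fit), `pearson_r([1, 2, 3], [6, 4, 2])` is -1.0, and rescaling one vector, as in `pearson_r([1, 2, 3], [10, 20, 30])`, still yields 1.0, which illustrates why normalization is not required.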