# Cosine Similarity in Python: How Cosine Similarity Measures Similarity, the Math, and Usage in Python

## What is cosine similarity?

Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between the two vectors.

Cosine similarity is one of the most widely used and powerful similarity measures in data science. It is used in many applications, such as finding similar documents in NLP, information retrieval, finding sequences similar to a given DNA sequence in bioinformatics, detecting plagiarism, and many more.

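Cosine similarity is calculated as follows:

$$
\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$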

## Why does the cosine of the angle between A and B give us the similarity?

The cosine function equals 1 at theta = 0° and -1 at theta = 180°. The cosine is therefore highest for two vectors pointing in the same direction and lowest for two vectors pointing in exactly opposite directions. You can treat 1 - cosine as a distance.
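To make these values concrete, here is a minimal sketch using only Python's standard `math` module. The helper name `cosine_distance_from_angle` is made up for this illustration; it simply evaluates 1 - cos(theta) for an angle given in degrees.

```python
import math


def cosine_distance_from_angle(theta_degrees):
    """Return 1 - cos(theta) for an angle in degrees (hypothetical helper)."""
    return 1.0 - math.cos(math.radians(theta_degrees))


for theta in (0, 90, 180):
    similarity = math.cos(math.radians(theta))
    distance = cosine_distance_from_angle(theta)
    # round() hides tiny floating-point error: cos(90 degrees) comes out
    # as roughly 6e-17 instead of exactly 0
    print(theta, round(similarity, 6), round(distance, 6))
```

The distance runs from 0 (same direction) through 1 (perpendicular) to 2 (opposite directions).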

## How to calculate it in Python?

The numerator of the formula is the dot product of the two vectors, and the denominator is the product of their L2 norms. The dot product of two vectors is the sum of their element-wise products, and the L2 norm of a vector is the square root of the sum of the squares of its elements.
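Those two definitions can be written out directly in plain Python, without any libraries. This is only a sketch: the function names `dot`, `l2_norm`, and `cosine_similarity_plain` are chosen for this illustration, and the inputs are assumed to be equal-length, non-zero vectors. The example values are the 2-D vectors A = (7, 3) and B = (3, 7) used in this article.

```python
import math


def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a):
    """Square root of the sum of the squared elements."""
    return math.sqrt(sum(x * x for x in a))


def cosine_similarity_plain(a, b):
    """Dot product divided by the product of the L2 norms.

    Assumes neither vector is all zeros (the denominator would be 0).
    """
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


A = [7, 3]
B = [3, 7]
print(dot(A, B))                                # 7*3 + 3*7 = 42
print(round(cosine_similarity_plain(A, B), 4))  # 42 / 58, about 0.7241
```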

We can either use NumPy's built-in functions to calculate the dot product and the L2 norms and plug them into the formula, or directly use cosine_similarity from sklearn.metrics.pairwise. Consider two vectors A and B in 2-D; the following code calculates their cosine similarity:

```python
import numpy as np
import matplotlib.pyplot as plt

# consider two vectors A and B in 2-D
A = np.array([7, 3])
B = np.array([3, 7])

# draw both vectors as arrows from the origin and label their tips
ax = plt.axes()
ax.arrow(0.0, 0.0, A[0], A[1], head_width=0.4, head_length=0.5)
plt.annotate(f"A({A[0]},{A[1]})", xy=(A[0], A[1]), xytext=(A[0] + 0.5, A[1]))
ax.arrow(0.0, 0.0, B[0], B[1], head_width=0.4, head_length=0.5)
plt.annotate(f"B({B[0]},{B[1]})", xy=(B[0], B[1]), xytext=(B[0] + 0.5, B[1]))
plt.xlim(0, 10)
plt.ylim(0, 10)
plt.show()
plt.close()

# cosine similarity between A and B
cos_sim = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(f"Cosine Similarity between A and B:{cos_sim}")
print(f"Cosine Distance between A and B:{1 - cos_sim}")
```

```python
# using sklearn to calculate cosine similarity
# (the pairwise functions expect 2-D arrays of shape (n_samples, n_features),
# hence the reshape; the result is a 1x1 array)
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances

cos_sim = cosine_similarity(A.reshape(1, -1), B.reshape(1, -1))
print(f"Cosine Similarity between A and B:{cos_sim}")
print(f"Cosine Distance between A and B:{1 - cos_sim}")
```

```python
# using scipy, it calculates 1-cosine
# (distance.cosine expects 1-D vectors, so A and B are passed without reshaping)
from scipy.spatial import distance

distance.cosine(A, B)
```

## Proof of the formula

The cosine similarity formula can be proved using the law of cosines.

Consider two vectors A and B in two dimensions, such as,

Using the law of cosines,
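A sketch of this step, assuming the components are written as A = (a_1, a_2) and B = (b_1, b_2) (notation introduced here for illustration) and theta is the angle between the two vectors. The law of cosines is applied to the triangle whose sides are A, B, and A - B, and the squared length of A - B is then expanded component by component:

$$
\begin{aligned}
\lVert A - B \rVert^2 &= \lVert A \rVert^2 + \lVert B \rVert^2 - 2\,\lVert A \rVert\,\lVert B \rVert \cos\theta \\
\lVert A - B \rVert^2 &= (a_1 - b_1)^2 + (a_2 - b_2)^2 = \lVert A \rVert^2 + \lVert B \rVert^2 - 2\,(a_1 b_1 + a_2 b_2) \\
\Rightarrow\quad \cos\theta &= \frac{a_1 b_1 + a_2 b_2}{\lVert A \rVert\,\lVert B \rVert} = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}
\end{aligned}
$$

Equating the two expressions for the squared length of A - B cancels the squared norms and leaves the dot product over the product of the norms.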

You can prove the same for three dimensions, or for any number of dimensions in general, by following exactly the same steps as above.
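As a numeric sanity check (not a proof), the sketch below plugs the cosine given by the similarity formula into the law of cosines for vectors of several dimensions and confirms that both sides agree up to floating-point error. The function name `law_of_cosines_gap`, the random test vectors, and the 1e-9 tolerance are assumptions made for this illustration.

```python
import math
import random


def dot(a, b):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def law_of_cosines_gap(a, b):
    """Absolute difference between the two sides of
    ||a - b||^2 = ||a||^2 + ||b||^2 - 2 ||a|| ||b|| cos(theta),
    with cos(theta) taken from the cosine similarity formula.
    Assumes neither vector is all zeros.
    """
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    cos_theta = dot(a, b) / (norm_a * norm_b)
    diff = [x - y for x, y in zip(a, b)]
    left = dot(diff, diff)
    right = norm_a ** 2 + norm_b ** 2 - 2 * norm_a * norm_b * cos_theta
    return abs(left - right)


random.seed(0)  # fixed seed so repeated runs use the same vectors
for n in (2, 3, 10):
    a = [random.uniform(-1, 1) for _ in range(n)]
    b = [random.uniform(-1, 1) for _ in range(n)]
    print(n, law_of_cosines_gap(a, b) < 1e-9)
```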

## Summary

We saw how cosine similarity works, how to use it, and why it works. I hope this article helped you understand the concept behind this powerful metric.
