Python face comparison with cosine: using cosine similarity to compare a list with rows in pandas, then run...

Consider the following two DataFrames, df and input.

df

  Department Country    Age Grade  Score
0       Math   India  Young     A     97
1       Math   India  Young     B     86
2       Math   India  Young     D     68
3    Science   India  Young     A     92
4    Science   India  Young     B     81
5    Science   India  Young     C     76
6     Social   India  Young     B     88
7     Social   India  Young     D     62
8     Social   India  Young     C     72

input

  Country    Age Grade  Score
0   India  Young     B     84
1   India  Young     D     65
2   India  Young     A     98
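To follow along, the two frames can be built directly from the tables above. This is a minimal sketch; the variable name `input` follows the text, although it shadows Python's built-in `input` function.

```python
import pandas as pd

# Reference data: three rows per department
df = pd.DataFrame({
    "Department": ["Math"] * 3 + ["Science"] * 3 + ["Social"] * 3,
    "Country": ["India"] * 9,
    "Age": ["Young"] * 9,
    "Grade": ["A", "B", "D", "A", "B", "C", "B", "D", "C"],
    "Score": [97, 86, 68, 92, 81, 76, 88, 62, 72],
})

# Rows to classify; named `input` as in the text (this shadows the built-in)
input = pd.DataFrame({
    "Country": ["India"] * 3,
    "Age": ["Young"] * 3,
    "Grade": ["B", "D", "A"],
    "Score": [84, 65, 98],
})

print(df.shape, input.shape)  # (9, 5) (3, 4)
```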

One possible solution is as follows.

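Use the scikit-learn package to convert the categorical features to numeric values:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()  # refit on each column below

df['Country'] = le.fit_transform(df['Country'])
df['Age'] = le.fit_transform(df['Age'])
df['Grade'] = le.fit_transform(df['Grade'])
df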

Output:

  Department  Country  Age  Grade  Score
0       Math        0    0      0     97
1       Math        0    0      1     86
2       Math        0    0      3     68
3    Science        0    0      0     92
4    Science        0    0      1     81
5    Science        0    0      2     76
6     Social        0    0      1     88
7     Social        0    0      3     62
8     Social        0    0      2     72

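Encode the input frame in the same way:

input['Country'] = le.fit_transform(input['Country'])
input['Age'] = le.fit_transform(input['Age'])
input['Grade'] = le.fit_transform(input['Grade'])
input

Output:

   Country  Age  Grade  Score
0        0    0      1     84
1        0    0      2     65
2        0    0      0     98

Note that the encoder is refit on input, so its codes are assigned independently of df: grade D is encoded as 2 here but as 3 in df.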
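To give both frames the same codes, one option is to fit one encoder per column on df and reuse it on the rows to classify. A minimal sketch, assuming the raw (string-valued) frames; the names `ref`, `query`, and `encoders` are illustrative and not from the original, and only the categorical columns are shown.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative copies of the categorical columns from the two tables
ref = pd.DataFrame({
    "Country": ["India"] * 9,
    "Age": ["Young"] * 9,
    "Grade": ["A", "B", "D", "A", "B", "C", "B", "D", "C"],
})
query = pd.DataFrame({
    "Country": ["India"] * 3,
    "Age": ["Young"] * 3,
    "Grade": ["B", "D", "A"],
})

encoders = {}
for col in ["Country", "Age", "Grade"]:
    # Fit on the reference frame only, then reuse the same mapping for both frames
    encoders[col] = LabelEncoder().fit(ref[col])
    ref[col] = encoders[col].transform(ref[col])
    query[col] = encoders[col].transform(query[col])

print(query["Grade"].tolist())  # [1, 3, 0]: D now has the same code as in ref
```

`transform` raises a `ValueError` for a label that was not seen during `fit`, so this approach only works when every category in the rows to classify also appears in the reference frame. With consistent codes, the similarities computed later can differ from those obtained with the separately fitted encoders.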

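Define a cosine-similarity function:

import numpy as np

def cosine_similarity(a, b):
    # Dot product of the two vectors
    nom = np.sum(np.multiply(a, b))
    # Product of the two Euclidean norms
    denom = np.sqrt(np.sum(np.square(a))) * np.sqrt(np.sum(np.square(b)))
    sim = nom / denom
    return sim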
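Cosine similarity measures only the angle between two vectors, not their length, which matters here because the Score values are far larger than the encoded columns. A small check of this behaviour, as a sketch that writes the same formula with `np.dot` and `np.linalg.norm`; the function name `cos_sim` is illustrative.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Orthogonal vectors have similarity 0
print(cos_sim([1, 0], [0, 1]))  # 0.0

# Parallel vectors have similarity 1, whatever their lengths
print(np.isclose(cos_sim([1, 2], [2, 4]), 1.0))  # True

# Two rows that differ only in Score are parallel when all other entries are 0
print(np.isclose(cos_sim([0, 0, 0, 98], [0, 0, 0, 97]), 1.0))  # True
```

Because Country and Age are encoded as 0 in both frames, each row effectively reduces to the pair (Grade, Score), so every similarity in this example is very close to 1 and the ranking is decided by small differences in the ratio of Grade to Score.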

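Compare each input row with every row of df and, for each input row, pick the department whose best-matching row has the highest similarity:

from collections import OrderedDict

# Unique department names, in order of first appearance
dept = list(df['Department'].values)
dept = list(OrderedDict.fromkeys(dept).keys())

results = []
for i in range(len(input)):
    # Similarity between input row i and every row of df
    similarity = []
    for j in range(len(df)):
        a = input.iloc[i]
        b = df.iloc[j, 1:]  # skip the Department column
        c_sim = cosine_similarity(a, b)
        similarity.append(c_sim)
    # Best similarity within each department (assumes 3 consecutive rows per department)
    max_similarity = []
    for k in range(0, len(df), 3):
        max_3 = max(similarity[k:k+3])
        max_similarity.append(max_3)
    # Department with the highest best-match similarity
    max_idx = max_similarity.index(max(max_similarity))
    results.append(dept[max_idx])
results

Output: ['Math', 'Social', 'Math']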
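The loop above relies on each department occupying exactly three consecutive rows. A sketch of the same idea without that assumption, computing all similarities at once with NumPy and taking the per-department maximum through a boolean mask; the helper name `closest_department` and the frames `ref` and `query` are illustrative, and the demo values are the encoded tables shown earlier.

```python
import numpy as np
import pandas as pd

def closest_department(ref, query, label_col="Department"):
    """For each row of query, return the label of the group in ref with the most similar row."""
    features = [c for c in ref.columns if c != label_col]
    R = ref[features].to_numpy(dtype=float)
    Q = query[features].to_numpy(dtype=float)
    # Normalise rows to unit length so that a dot product equals the cosine similarity
    R = R / np.linalg.norm(R, axis=1, keepdims=True)
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    sims = Q @ R.T  # shape: (rows in query, rows in ref)

    labels = ref[label_col].tolist()
    depts = list(dict.fromkeys(labels))  # unique labels, in order of first appearance
    label_arr = np.array(labels)
    # Best similarity per department, one column per department
    best = np.column_stack([sims[:, label_arr == d].max(axis=1) for d in depts])
    # argmax returns the first maximum, matching list.index(max(...)) on ties
    return [depts[i] for i in best.argmax(axis=1)]

ref = pd.DataFrame({
    "Department": ["Math"] * 3 + ["Science"] * 3 + ["Social"] * 3,
    "Country": [0] * 9,
    "Age": [0] * 9,
    "Grade": [0, 1, 3, 0, 1, 2, 1, 3, 2],
    "Score": [97, 86, 68, 92, 81, 76, 88, 62, 72],
})
query = pd.DataFrame({
    "Country": [0] * 3,
    "Age": [0] * 3,
    "Grade": [1, 2, 0],
    "Score": [84, 65, 98],
})

print(closest_department(ref, query))  # ['Math', 'Social', 'Math']
```

A row consisting only of zeros has norm 0 and would cause a division by zero in the normalisation step, so such rows would need to be handled separately.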
