How to Compute the Tanimoto Coefficient for Multiple Vectors

qq^^614136809

Tags: django

Article link: https://blog.csdn.net/D0126_/article/details/138123717


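Goal: compute the Tanimoto coefficient between ingredient vectors stored in a database, where each ingredient's vector records which recipes contain it, in order to measure how similar two ingredients are.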
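For binary vectors like these, the Tanimoto coefficient is the number of shared recipes divided by the number of recipes that contain either ingredient: T(A, B) = n_ab / (n_a + n_b - n_ab). A minimal sketch, assuming each ingredient is represented as the set of recipe IDs it appears in (the function name and the sample data are hypothetical):

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two sets: |A & B| / (|A| + |B| - |A & B|)."""
    n_ab = len(a & b)
    denominator = len(a) + len(b) - n_ab
    # Two empty sets share nothing; this sketch defines their score as 0.0 (a convention).
    return n_ab / denominator if denominator else 0.0


# Hypothetical data: the recipe IDs in which each ingredient appears.
salt = {1, 2, 3, 4}
pepper = {2, 3, 5}
print(tanimoto(salt, pepper))  # 2 / (4 + 3 - 2) = 0.4
```

For 0/1 vectors this is the same quantity as the Jaccard index of the two sets.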

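Challenge: computing the Tanimoto coefficient with a separate SQL query for every pair of ingredients is slow, because the number of pairs, and therefore of queries, grows quadratically with the number of ingredients (n(n - 1)/2 pairs for n ingredients).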
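To make the cost concrete, here is a sketch of the per-pair SQL approach. It assumes the same recipe_ingredient(recipe_id, ingredient_id) table that the code example queries, recreated in an in-memory SQLite database with made-up rows; the helper name tanimoto_sql is hypothetical.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipe_ingredient (recipe_id INTEGER, ingredient_id INTEGER)")
# Hypothetical sample rows: (recipe_id, ingredient_id)
conn.executemany(
    "INSERT INTO recipe_ingredient VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3)],
)


def tanimoto_sql(conn, a, b):
    """Compute one pair's Tanimoto coefficient with SQL (one round trip per pair)."""
    n_a, n_b, n_ab = conn.execute(
        """
        SELECT
            (SELECT COUNT(*) FROM recipe_ingredient WHERE ingredient_id = ?),
            (SELECT COUNT(*) FROM recipe_ingredient WHERE ingredient_id = ?),
            (SELECT COUNT(*) FROM recipe_ingredient x
               JOIN recipe_ingredient y ON x.recipe_id = y.recipe_id
              WHERE x.ingredient_id = ? AND y.ingredient_id = ?)
        """,
        (a, b, a, b),
    ).fetchone()
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


ingredients = [row[0] for row in conn.execute(
    "SELECT DISTINCT ingredient_id FROM recipe_ingredient ORDER BY ingredient_id")]
# One query per pair: n * (n - 1) / 2 queries in total, i.e. quadratic growth.
scores = {(a, b): tanimoto_sql(conn, a, b) for a, b in combinations(ingredients, 2)}
print(len(scores), scores)
```

Each call scans the table again, so the total work scales with both the table size and the number of pairs.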

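Solution
- Load all ingredient and recipe data into memory with Python and compute the Tanimoto coefficients there.
- Precompute, in a single pass, how many recipes each ingredient appears in and how many recipes each pair of ingredients shares, so later lookups are fast.
- Store the results in dictionaries for fast lookup and access.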
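The three steps above can also be sketched more compactly with the standard library: one pass over the (recipe_id, ingredient_id) rows fills a Counter of per-ingredient counts and a Counter of pairwise co-occurrences, after which each similarity is a dictionary lookup. This is an alternative sketch, not the author's code; the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def precompute(rows):
    """Build per-ingredient counts and pairwise co-occurrence counts in one pass.

    rows: iterable of (recipe_id, ingredient_id) pairs, e.g. fetched from SQL.
    """
    recipes = defaultdict(set)
    for recipe_id, ingredient_id in rows:
        recipes[recipe_id].add(ingredient_id)  # a set ignores duplicate rows

    counts = Counter()
    pairs = Counter()
    for ingredients in recipes.values():
        counts.update(ingredients)
        # Sorting makes each pair key canonical: (smaller_id, larger_id).
        pairs.update(combinations(sorted(ingredients), 2))
    return counts, pairs


def tanimoto(counts, pairs, a, b):
    """Look up the precomputed counts; pairs that never co-occur score 0."""
    if a == b:
        return 1.0 if counts[a] else 0.0
    n_ab = pairs[(min(a, b), max(a, b))]
    union = counts[a] + counts[b] - n_ab
    return n_ab / union if union else 0.0


rows = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3)]
counts, pairs = precompute(rows)
print(tanimoto(counts, pairs, 1, 2))
```

A Counter returns 0 for missing keys, which covers ingredient pairs that never share a recipe without extra checks.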

Code example

import sqlite3

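class Filtering:
    """Ingredient similarity based on co-occurrence in recipes (Tanimoto coefficient)."""

    def __init__(self, db_path='database.db'):
        self._connection = sqlite3.connect(db_path)
        self._counts = None          # ingredient_id -> number of recipes containing it
        self._intersections = {}     # larger_id -> {smaller_id: number of shared recipes}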

    def inc_intersections(self, ingredients):
        """Increment the co-occurrence count of every ingredient pair in one recipe."""
        ingredients.sort()
        # After sorting, a = ingredients[i] is always >= b = ingredients[j], so each
        # pair is stored once, keyed by the larger ID first.
        for i in range(1, len(ingredients)):
            a = ingredients[i]
            row = self._intersections.setdefault(a, {})
            for j in range(i):
                b = ingredients[j]
                row[b] = row.get(b, 0) + 1


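    def precompute_tanimoto(self):
        """Load all (recipe, ingredient) rows once and count occurrences and co-occurrences."""
        counts = {}
        self._intersections = {}

        cursor = self._connection.cursor()
        cursor.execute('''select recipe_id, ingredient_id
            from recipe_ingredient
            order by recipe_id, ingredient_id''')
        rows = cursor.fetchall()

        # Rows are ordered by recipe, so each recipe's ingredients arrive as one run.
        last_recipe = None
        ingredients = []
        for recipe, ingredient in rows:
            if recipe != last_recipe:
                # A new recipe starts: flush the previous recipe's ingredient list.
                if last_recipe is not None:
                    self.inc_intersections(ingredients)
                last_recipe = recipe
                ingredients = [ingredient]
            else:
                ingredients.append(ingredient)

            # Number of recipes each ingredient appears in (n_a in the formula).
            counts[ingredient] = counts.get(ingredient, 0) + 1

        # Flush the last recipe (skipped when the table is empty).
        if last_recipe is not None:
            self.inc_intersections(ingredients)

        self._counts = counts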

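    def tanimoto(self, ingredient_a, ingredient_b):
        """Tanimoto coefficient n_ab / (n_a + n_b - n_ab) of two ingredients."""
        if self._counts is None:
            self.precompute_tanimoto()

        # Pairs are stored with the larger ID as the outer key.
        if ingredient_b > ingredient_a:
            ingredient_b, ingredient_a = ingredient_a, ingredient_b

        n_a, n_b = self._counts[ingredient_a], self._counts[ingredient_b]
        if ingredient_a == ingredient_b:
            return 1.0

        # Ingredients that never share a recipe have no entry: treat that as 0.
        n_ab = self._intersections.get(ingredient_a, {}).get(ingredient_b, 0)

        return n_ab / (n_a + n_b - n_ab)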

# Load the data and precompute the counts
filtering = Filtering()
filtering.precompute_tanimoto()

# Compute the Tanimoto coefficient of two ingredient IDs
ingredient_a = 1
ingredient_b = 2
tanimoto_coefficient = filtering.tanimoto(ingredient_a, ingredient_b)
print(tanimoto_coefficient)
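When the coefficients of all ingredient pairs are needed at once, the same counts can be obtained with matrix products in NumPy. This is a different technique from the dictionary-based code above, sketched under the assumption that the data fits in a dense 0/1 matrix with one row per recipe and one column per ingredient (the sample matrix is made up); for large, sparse data a scipy.sparse matrix would likely be more practical.

```python
import numpy as np

# Hypothetical incidence matrix: rows = recipes, columns = ingredients,
# M[r, i] = 1 if recipe r contains ingredient i.
M = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
], dtype=np.int64)

# inter[i, j] = number of recipes containing both ingredient i and ingredient j.
inter = M.T @ M
# The diagonal holds each ingredient's own recipe count (n_a).
counts = np.diag(inter)
# Union size for every pair, n_i + n_j - n_ij, broadcast to an n x n matrix.
union = counts[:, None] + counts[None, :] - inter

# Tanimoto matrix; pairs with an empty union get 0 instead of dividing by zero.
T = np.divide(inter, union, out=np.zeros(inter.shape, dtype=float), where=union > 0)
print(T.round(3))
```

The result is a symmetric matrix with ones on the diagonal, so T[i, j] gives the similarity of ingredients i and j directly.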