Daily Problem: New Word Mining

雨山小七

Last modified 2023-02-23 19:41:28

Tags: python

First published 2023-02-23 19:35:47

Article link: https://blog.csdn.net/xuedong_1989/article/details/129188233

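Problem description

Xiaohua is responsible for the company's knowledge graph product and now needs to enrich the knowledge graph through new word mining.
New word mining: given a text string content to be mined and a word string word, find all new words of word in content.

New word: a string formed by rearranging the characters of word. As the examples show, each match is a contiguous substring of content, and word itself also counts.
Help Xiaohua implement new word mining and return the number of new words found.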
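
The definition above amounts to an anagram check: two strings are permutations of each other exactly when every character occurs the same number of times in both. A minimal sketch, assuming a helper named `is_new_word` (a hypothetical name, not part of the original solution) built on the standard library's `collections.Counter`:

```python
from collections import Counter


def is_new_word(candidate: str, word: str) -> bool:
    """Return True if candidate is a permutation of word's characters."""
    # Counter builds a character -> count multiset; equal multisets mean
    # the two strings are rearrangements of each other.
    return Counter(candidate) == Counter(word)


print(is_new_word("ewq", "qwe"))  # True
print(is_new_word("qwe", "qwe"))  # True: word itself counts in the examples
print(is_new_word("wee", "qwe"))  # False: 'q' is missing, 'e' occurs twice
```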

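Input description

The first line is the text content to be mined, content;
The second line is the word, word.

Output description

The number of new words of word found in content.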

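Example 1:
Input:
qweebaewqd
qwe

Output:
2
Explanation:
The substring starting at index 0 is "qwe", which is a new word of word.
The substring starting at index 6 is "ewq", which is a new word of word.

Example 2:
Input:
abab
ab
Output:
3
Explanation:
The substring starting at index 0 is "ab", which is a new word of word.
The substring starting at index 1 is "ba", which is a new word of word.
The substring starting at index 2 is "ab", which is a new word of word.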
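
To check the explanations above by hand, it can help to list the matching start indices rather than only count them. The sketch below assumes a hypothetical helper `new_word_starts`; it is a brute-force reference for verifying the examples, not one of the solutions discussed later.

```python
from collections import Counter


def new_word_starts(content: str, word: str) -> list:
    """Return the start index of every window of content that permutes word."""
    n = len(word)
    target = Counter(word)
    # range() is empty when content is shorter than word, so no extra guard
    # is needed for that case.
    return [
        i
        for i in range(len(content) - n + 1)
        if Counter(content[i:i + n]) == target
    ]


print(new_word_starts("qweebaewqd", "qwe"))  # [0, 6]
print(new_word_starts("abab", "ab"))         # [0, 1, 2]
```

The length of each returned list is the answer the problem asks for (2 and 3 respectively).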

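Analysis:

Method 1

Take every substring of content whose length equals len(word), sort it, and compare it with the sorted word. With m = len(content) and n = len(word), this costs roughly O((m - n + 1) * n log n).
Code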

def get_result(content, word):
    m = len(content)
    n = len(word)
    if m < n:
        return 0
    # Sort word once; every permutation of word sorts to the same list.
    word_list = sorted(word)
    res = 0
    # Only windows starting at 0..m-n have the full length n.
    for l in range(m - n + 1):
        temp = sorted(content[l:l + n])
        if temp == word_list:
            res += 1
    return res

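Method 2

Skip the sorting: take each fixed-length substring, count how many times each letter occurs, and treat the substring as a new word if its counts match those of word.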

def get_result(content, word):
    m = len(content)
    n = len(word)
    if m < n:
        return 0
    ans = 0

    # Count how many times each character occurs in word.
    count = {}
    for c in word:
        count[c] = count.get(c, 0) + 1

    l = 0
    while l <= m - n:
        temp = content[l:l + n]
        # Start from a fresh copy of the counts for every window.
        count1 = dict(count)
        for c in temp:
            if c not in count1:
                # A character that is not in word: this window cannot match.
                break
            count1[c] -= 1
        # The window has exactly n characters, so it matches only when
        # every count has dropped to zero.
        if all(v == 0 for v in count1.values()):
            ans += 1
        l += 1
    return ans
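
Method 2 rebuilds the counts for every window. One possible refinement, which is not part of the original post, is a sliding window: keep a single counter for the current window and update it as the window moves one character to the right. The following is a sketch under that assumption; the function name `count_new_words_sliding` is hypothetical.

```python
from collections import Counter


def count_new_words_sliding(content: str, word: str) -> int:
    """Count the windows of content that are permutations of word."""
    m, n = len(content), len(word)
    if m < n:
        return 0
    target = Counter(word)
    window = Counter(content[:n])  # counts for the first window
    ans = int(window == target)
    for right in range(n, m):
        # Slide by one: add the entering character, drop the leaving one.
        window[content[right]] += 1
        left_char = content[right - n]
        window[left_char] -= 1
        if window[left_char] == 0:
            # Remove zero entries so the equality test does not depend on
            # how the Python version treats zero counts.
            del window[left_char]
        ans += int(window == target)
    return ans


print(count_new_words_sliding("qweebaewqd", "qwe"))  # 2
print(count_new_words_sliding("abab", "ab"))         # 3
```

Each step updates two counts and then compares the two counters, so the work per window depends on the number of distinct characters rather than on len(word).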