jieba分词textrank算法

2021SC@SDUSC

add_edgs如果文档中存在一个词指向另一个词的边,则为这条边增加一个weight。
rank:为ws和outsum各创建一个字典,遍历文档中的词图,然后ws排序得到单个词并按照词频排序,textrank方法从而可以得到词频

class UndirectWeightedGraph:
d = 0.85

def __init__(self):
    self.graph = defaultdict(list)

def addEdge(self, start, end, weight):
    # use a tuple (start, end, weight) instead of a Edge object
    self.graph[start].append((start, end, weight))
    self.graph[end].append((end, start, weight))

def rank(self):
    ws = defaultdict(float)
    outSum = defaultdict(float)

    wsdef = 1.0 / (len(self.graph) or 1.0)
    for n, out in self.graph.items():
        ws[n] = wsdef
        outSum[n] = sum((e[2] for e in out), 0.0)

    # this line for build stable iteration
    sorted_keys = sorted(self.graph.keys())
    for x in xrange(10):  # 10 iters
        for n in sorted_keys:
            s = 0
            for e in self.graph[n]:
                s += e[2] / outSum[e[1]] * ws[e[1]]
            ws[n] = (1 - self.d) + self.d * s

    (min_rank, max_rank) = (sys.float_info[0], sys.float_info[3])

    for w in itervalues(ws):
        if w < min_rank:
            min_rank = w
        if w > max_rank:
            max_rank = w

    for n, w in ws.items():
        # to unify the weights, don't *100.
        ws[n] = (w - min_rank / 10.0) / (max_rank - min_rank / 10.0)

    return ws
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值