RDD Usage and Examples (12): Implementing PageRank

Latest recommended article published on 2024-06-07 01:20:31

水母君98


Category: Big Data Basics. Tags: spark

Article link: https://blog.csdn.net/m0_37754282/article/details/109291861


import re
from operator import add

def computeContribs(urls, rank):
    # Calculates URL contributions to the rank of other URLs.
    num_urls = len(urls)
    for url in urls:
        yield (url, rank / num_urls)

def parseNeighbors(urls):
    # Parses a "URL neighbor-URL" line into a (URL, neighbor URL) pair.
    # split() with no argument handles runs of spaces or tabs between the two URLs.
    parts = urls.split()
    return parts[0], parts[1]

# Load the input file. Each line should have the format:
#     URL         neighbor URL
#     URL         neighbor URL
#     URL         neighbor URL
#     ...

# The data file can be downloaded at http://www.cse.ust.hk/msbd5003/data/*
# `sc` is assumed to be an existing SparkContext (for example, in the pyspark shell).
lines = sc.textFile("../data/pagerank_data.txt", 2)
# lines = sc.textFile("../data/dblp.in", 5)

numOfIterations = 10

# Parse each line into a (URL, neighbor URL) pair, then group the neighbors of each URL.
links = lines.map(parseNeighbors) \
             .groupByKey()

# Initialize the rank of every URL that links to other URL(s).
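
The listing above stops at the rank initialization. As a rough illustration of how the pieces are typically combined, here is a plain-Python sketch (no Spark required) of the rank iteration. The initial rank of 1.0, the damping factor of 0.85, and the names `pagerank`, `compute_contribs`, and `damping` are assumptions made for this sketch and are not taken from the original code. The comments note which RDD operations each step is meant to mirror.

```python
from collections import defaultdict


def compute_contribs(urls, rank):
    """Split a page's rank evenly across the pages it links to."""
    num_urls = len(urls)
    for url in urls:
        yield (url, rank / num_urls)


def pagerank(edges, num_iterations=10, damping=0.85):
    """Plain-Python sketch of the iteration; `damping` is an assumed parameter."""
    # Mirrors map(parseNeighbors).groupByKey(): source URL -> list of neighbors.
    links = defaultdict(list)
    for src, dst in edges:
        links[src].append(dst)

    # Assumed initialization: every URL with outgoing links starts at rank 1.0.
    ranks = {url: 1.0 for url in links}

    for _ in range(num_iterations):
        # Mirrors an inner join of links with ranks, a flatMap over
        # compute_contribs, and a reduceByKey(add) on the contributions.
        contribs = defaultdict(float)
        for url, neighbors in links.items():
            if url in ranks:  # inner join: skip URLs that currently have no rank
                for dest, share in compute_contribs(neighbors, ranks[url]):
                    contribs[dest] += share
        # Assumed update rule: rank = (1 - damping) + damping * received contributions.
        ranks = {url: (1 - damping) + damping * total
                 for url, total in contribs.items()}
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page cycle: a -> b -> c -> a.
    for url, rank in sorted(pagerank([("a", "b"), ("b", "c"), ("c", "a")]).items()):
        print(url, rank)
```

In this sketch a URL that never receives a contribution drops out of `ranks` after the first iteration, which is what an inner join would do. For the three-page cycle every page keeps a rank of about 1.0, since each page passes its whole rank to exactly one neighbor.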
