Common Graph Algorithm Implementations with Flink


zealscott


Category: DistributionSystem. Tags: spark, graph

Article link: https://blog.csdn.net/crazy_scott/article/details/85677499


Implementing PageRank, strongly connected components, single-source shortest paths, bipartite matching, … with Flink

PageRank

This implementation is mainly based on the example from the official Flink site.

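Algorithm

In each iteration, compute the current transition probability out of every page, use it to compute the probability of arriving at each page at the next step, and then add the random-jump term.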
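To make the update concrete, here is a minimal plain-Java sketch of a single iteration, without any Flink dependency. The class name `PageRankStepSketch`, the array-based graph representation, and the 3-page example graph are assumptions made for illustration, and every page is assumed to have at least one outgoing link.

```java
import java.util.Arrays;

public class PageRankStepSketch {

    /**
     * One PageRank update. outLinks[u] lists the pages that page u links to
     * (0-based indices). Assumes every page has at least one outgoing link.
     */
    static double[] step(int[][] outLinks, double[] rank, double damping) {
        int n = rank.length;
        double[] received = new double[n];
        for (int u = 0; u < n; u++) {
            // The transition probability from u to each neighbor is 1 / outDegree(u).
            double share = rank[u] / outLinks[u].length;
            for (int v : outLinks[u]) {
                received[v] += share;
            }
        }
        double[] next = new double[n];
        for (int v = 0; v < n; v++) {
            // Follow a link with probability `damping`, otherwise jump to a random page.
            next[v] = damping * received[v] + (1.0 - damping) / n;
        }
        return next;
    }

    public static void main(String[] args) {
        // Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        int[][] outLinks = { {1, 2}, {2}, {0} };
        double[] rank = new double[3];
        Arrays.fill(rank, 1.0 / 3);
        System.out.println(Arrays.toString(step(outLinks, rank, 0.85)));
    }
}
```

Because every page in the example passes on all of its rank, the ranks still sum to 1 after the step.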

Data preparation

pages.txt

Prepare some vertices, for example 1 to 15.

links.txt

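Prepare some edges (that is, the links between pages), one "source target" pair per line:

1 2
1 15
2 3
2 4
2 5
2 6
2 7
3 13
4 2
...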
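As a rough illustration of how such an edge file becomes the adjacency lists used later, the following plain-Java sketch groups edges by source vertex. It assumes each line holds a source id and a target id separated by whitespace; the class name `AdjacencySketch` and the method `buildAdjacency` are hypothetical and not part of the Flink example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AdjacencySketch {

    /** Groups "source target" lines by source vertex, keeping targets in input order. */
    static Map<Long, List<Long>> buildAdjacency(List<String> lines) {
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            long source = Long.parseLong(parts[0]);
            long target = Long.parseLong(parts[1]);
            adjacency.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 2", "1 15", "2 3", "2 4");
        System.out.println(buildAdjacency(lines)); // prints {1=[2, 15], 2=[3, 4]}
    }
}
```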

PageRank.java

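import static org.apache.flink.api.java.aggregation.Aggregations.SUM;

import java.util.ArrayList;

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
import org.apache.flink.api.java.operators.IterativeDataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.examples.java.graph.util.PageRankData;
import org.apache.flink.util.Collector;

@SuppressWarnings("serial")
public class PageRank {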

    private static final double DAMPENING_FACTOR = 0.85;
    private static final double EPSILON = 0.0001;

    // *************************************************************************
    //     PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {

        ParameterTool params = ParameterTool.fromArgs(args);

        final int numPages = params.getInt("numPages", PageRankData.getNumberOfPages());
        final int maxIterations = params.getInt("iterations", 10);

        // set up execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // make the parameters available to the web ui
        env.getConfig().setGlobalJobParameters(params);

        // get input data
        DataSet<Long> pagesInput = getPagesDataSet(env, params);
        DataSet<Tuple2<Long, Long>> linksInput = getLinksDataSet(env, params);

        // assign initial rank to pages pi = ([1,1/n] ,... [n,1/n])
        DataSet<Tuple2<Long, Double>> pagesWithRanks = pagesInput.
                map(new RankAssigner((1.0d / numPages)));

        // build adjacency list from link input (1,[2,3,5])...
        DataSet<Tuple2<Long, Long[]>> adjacencyListInput =
                linksInput.groupBy(0).reduceGroup(new BuildOutgoingEdgeList());

        // set iterative data set
        IterativeDataSet<Tuple2<Long, Double>> iteration = pagesWithRanks.iterate(maxIterations);

        DataSet<Tuple2<Long, Double>> newRanks = iteration
                // join pages with outgoing edges and distribute rank [1,1/n] join 1,[1,3,5] => [1,1/3n],[3,1/3n],[5,1/3n]
                .join(adjacencyListInput).where(0).equalTo(0).flatMap(new JoinVertexWithEdgesMatch())
                // collect and sum ranks
                .groupBy(0).aggregate(SUM, 1)
                // apply dampening factor choosing stay or leave
                .map(new Dampener(DAMPENING_FACTOR, numPages));

        DataSet<Tuple2<Long, Double>> finalPageRanks = iteration.closeWith(
                newRanks,
                newRanks.join(iteration).where(0).equalTo(0)
                        // termination condition
                        .filter(new EpsilonFilter()));

        // emit result
        if (params.has("output")) {
            finalPageRanks.writeAsCsv(params.get("output"), "\n", " ");
            // execute program
            env.execute("Basic Page Rank Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            finalPageRanks.print();
        }
    }

    // *************************************************************************
    //     USER FUNCTIONS
    // *************************************************************************

    /**
     * A map function that assigns an initial rank to all pages.
     */
    public static final class RankAssigner implements MapFunction<Long, Tuple2<Long, Double>> {
        // Reused output tuple: only the page id changes per record, the rank stays fixed.
        Tuple2<Long, Double> outPageWithRank;

        public RankAssigner(double rank) {
            this.outPageWithRank = new Tuple2<Long, Double>(-1L, rank);
        }

        @Override
        public Tuple2<Long, Double> map(Long page) {
            outPageWithRank.f0 = page;
            return outPageWithRank;
        }
    }

    /**
     * A reduce function that takes a sequence of edges and builds the adjacency list for the vertex where the edges
     * originate. Run as a pre-processing step.
     */
    @ForwardedFields("0")
    public static final class BuildOutgoingEdgeList implements GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long[]>> {

        private final ArrayList<Long> neighbors = new ArrayList<Long>();

        @Override
        public void reduce(Iterable<Tuple2<Long, Long>> values, Collector<Tuple2<Long, Long[]>> out) {
            neighbors.clear();
            Long id = 0L;

            // All edges in one group share the same source vertex.
            for (Tuple2<Long, Long> n : values) {
                id = n.f0;
                neighbors.add(n.f1);
            }
            out.collect(new Tuple2<Long, Long[]>(id, neighbors.toArray(new Long[neighbors.size()])));
        }
    }

    // ... the source listing ends here; JoinVertexWithEdgesMatch, Dampener, EpsilonFilter,
    // getPagesDataSet and getLinksDataSet, which main() refers to, are not shown.
}
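The listing refers to an `EpsilonFilter` as the termination condition, but its body is not shown. In Flink's bulk iterations, `closeWith(result, terminationCriterion)` stops the loop once the criterion data set is empty, so a filter that keeps only pages whose rank still moved by more than `EPSILON` ends the iteration at convergence. The plain-Java sketch below mimics that behavior under this assumption; `ConvergenceSketch`, `stillChanging`, and `iterate` are hypothetical names, and the toy update in `main` is not PageRank.

```java
import java.util.function.UnaryOperator;

public class ConvergenceSketch {

    /** True while at least one entry moved by more than epsilon between two iterations. */
    static boolean stillChanging(double[] oldRanks, double[] newRanks, double epsilon) {
        for (int i = 0; i < oldRanks.length; i++) {
            if (Math.abs(oldRanks[i] - newRanks[i]) > epsilon) {
                return true;
            }
        }
        return false;
    }

    /** Applies `update` until nothing changes by more than epsilon, or maxIterations is reached. */
    static int iterate(double[] start, UnaryOperator<double[]> update, int maxIterations, double epsilon) {
        double[] current = start;
        for (int i = 1; i <= maxIterations; i++) {
            double[] next = update.apply(current);
            boolean changed = stillChanging(current, next, epsilon);
            current = next;
            if (!changed) {
                return i; // converged after i iterations
            }
        }
        return maxIterations;
    }

    public static void main(String[] args) {
        // Toy update that halves the distance of each entry to 0.5 (not PageRank itself).
        UnaryOperator<double[]> update = a -> new double[] {0.5 * a[0] + 0.25, 0.5 * a[1] + 0.25};
        System.out.println(iterate(new double[] {0.0, 1.0}, update, 100, 0.0001));
        System.out.println(iterate(new double[] {0.0, 1.0}, update, 10, 0.0001));
    }
}
```

With a cap of 10 iterations, as in the default `maxIterations` above, the loop can stop before the epsilon condition is met, so both limits matter in practice.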
