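Reachability Analysis Based on Traffic Data

Business scenario: the trajectories of vehicles in a traffic flow can be used to describe the reachability between two points in a city's road network. This example takes the vehicle-passing records captured at the city's public-security checkpoints, cleans and processes them, and produces a reachability matrix between the checkpoints of the whole network. That matrix can then serve as the basis for further analysis of the city's traffic state.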

 

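Processing flow:

Hadoop: processes the raw vehicle-passing data into a per-vehicle stream ordered by time, for example [(kkid1,1)(kkid2)…]. During this step the data must be checked for validity, which mainly means discarding A-A pairs (the same checkpoint twice in a row) and A-B pairs whose time gap is very large. A complete set of checkpoint IDs is needed here; if none is available, a separate MR program can generate one.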
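The complete checkpoint ID set mentioned above is simply the distinct set of IDs that appear in the raw records. Below is a minimal plain-Java sketch of what such a separate job would compute. The record layout `plate,kkid,epochMillis` and the class name `CheckpointIds` are assumptions made for illustration; the post does not give the real record format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch only: the record layout "plate,kkid,epochMillis" is an assumption,
// not the format used by the original job.
class CheckpointIds {

    // Collects the distinct checkpoint IDs (second field) from raw records.
    static SortedSet<String> distinctIds(List<String> records) {
        SortedSet<String> ids = new TreeSet<String>();
        for (String record : records) {
            String[] fields = record.split(",");
            if (fields.length < 3 || fields[1].trim().isEmpty()) {
                continue; // skip malformed records
            }
            ids.add(fields[1].trim());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "ZJ-A12345,1004,1462060800000",
                "ZJ-A12345,1007,1462061400000",
                "ZJ-B67890,1004,1462061000000",
                "broken-line");
        System.out.println(distinctIds(records)); // prints [1004, 1007]
    }
}
```

In a MapReduce version the mapper would emit the ID as the key and the reducer would write each key once; the sorted set plays both roles here.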

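Spark: loads the data produced by Hadoop and, from the vertex set and the edge set, builds a graph model with Spark's graph framework (GraphX). It then generates the reachability matrix between the city's checkpoints.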

 

Hadoop stage:

CarGraphEdge.java: the MR driver. It uses Hadoop's Tool and ToolRunner utility classes to simplify running the job from the command line.

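import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CarGraphEdge extends Configured implements Tool {
    public static final String GRANULARITY = "GRANULARITY";

    public int run(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);
        // Use the configuration prepared by ToolRunner so generic options take effect
        Configuration conf = getConf();
        // Optional third argument: time granularity in minutes (index 2, not 3)
        if (args.length >= 3) {
            conf.set(CarGraphEdge.GRANULARITY, args[2]);
        }
        Job job = Job.getInstance(conf, "CAR_GRAPH_EDGE");
        job.setJarByClass(cn.com.zjf.MR_04.CarGraphEdge.class);
        job.setInputFormatClass(CombineTextInputFormat.class);
        // GraphMapper is not listed in this post
        job.setMapperClass(GraphMapper.class);
        job.setCombinerClass(GraphConbine.class);
        job.setReducerClass(GraphReduce.class);
        // Data partitioning
        job.setPartitionerClass(GraphPartion.class);
        // Data grouping is not needed here
        //job.setCombinerKeyGroupingComparatorClass(GraphComparator.class);
        job.setMapOutputKeyClass(GrapOrderMap.class);
        job.setMapOutputValueClass(GraphValue.class);
        job.setOutputKeyClass(NullWritable.class);
        // The reducer emits Text values
        job.setOutputValueClass(Text.class);
        FileSystem fs = FileSystem.get(conf);
        // Preprocess the input: only read files that are completely written
        // (marked by a ".writed" file) and whose size is greater than 0
        {
            FileStatus[] childs = fs.globStatus(input, new PathFilter() {
                public boolean accept(Path path) {
                    return path.toString().endsWith(".writed");
                }
            });
            for (FileStatus file : childs) {
                // Strip the marker suffix literally; replaceAll would treat "." as a regex wildcard
                String marker = file.getPath().toString();
                Path temp = new Path(marker.substring(0, marker.length() - ".writed".length()));
                FileStatus[] data = fs.listStatus(temp);
                if (data.length > 0 && data[0].getLen() > 0) {
                    CombineTextInputFormat.addInputPath(job, temp);
                }
            }
        }
        // Combine small files into splits of at most 64 MB
        CombineTextInputFormat.setMaxInputSplitSize(job, 67108864);

        if (fs.exists(output)) {
            fs.delete(output, true);
        }
        FileOutputFormat.setOutputPath(job, output);
        // Exit code 0 on success, non-zero on failure
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CarGraphEdge(), args));
    }
}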
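The input preprocessing in the driver (only read files that have a `.writed` marker and a non-zero size) can be tried out without a cluster. The following is a minimal plain-Java sketch in which a map of path to length stands in for an HDFS directory listing; the class name `CompletedInputs` and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a Map of path -> length stands in for an HDFS directory listing.
class CompletedInputs {

    static final String MARKER = ".writed";

    // Returns the data paths that have a "<path>.writed" marker and a non-zero length.
    static List<String> select(Map<String, Long> listing) {
        List<String> inputs = new ArrayList<String>();
        for (String path : listing.keySet()) {
            if (!path.endsWith(MARKER)) {
                continue; // not a marker file
            }
            // Strip the suffix literally; String.replaceAll would treat "." as a regex wildcard.
            String data = path.substring(0, path.length() - MARKER.length());
            Long length = listing.get(data);
            if (length != null && length > 0) {
                inputs.add(data);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        Map<String, Long> listing = new LinkedHashMap<String, Long>();
        listing.put("/in/a.txt", 128L);
        listing.put("/in/a.txt.writed", 0L);
        listing.put("/in/b.txt", 0L);          // finished but empty: skipped
        listing.put("/in/b.txt.writed", 0L);
        listing.put("/in/c.txt", 64L);         // no marker yet: still being written
        System.out.println(select(listing));   // prints [/in/a.txt]
    }
}
```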


GrapOrderMap.java: the composite key, used to sort the records inside each per-vehicle partition by time.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Composite key definition: (car plate, time)
class GrapOrderMap implements WritableComparable<GrapOrderMap> {
    private Text carPlate;
    private Long day;

    public GrapOrderMap() {
        carPlate = new Text();
        day = 0L;
    }

    public GrapOrderMap(Text carPlate, Long day) {
        this.carPlate = carPlate;
        this.day = day;
    }

    // Order by plate first; when the plates are equal, order by time
    public int compareTo(GrapOrderMap co) {
        int compareValue = this.carPlate.compareTo(co.carPlate);
        if (compareValue == 0) {
            compareValue = this.day.compareTo(co.day);
        }
        return compareValue;
    }

    // Fields are serialized in the same order in which readFields reads them
    public void write(DataOutput out) throws IOException {
        this.carPlate.write(out);
        out.writeLong(day);
    }

    public void readFields(DataInput in) throws IOException {
        this.carPlate.readFields(in);
        day = in.readLong();
    }

    public Text getCarPlate() {
        return carPlate;
    }

    public void setCarPlate(Text carPlate) {
        this.carPlate = carPlate;
    }

    public Long getDay() {
        return day;
    }

    public void setDay(Long day) {
        this.day = day;
    }

    @Override
    public String toString() {
        return "CarOrder [carPlate=" + carPlate + ", day=" + day + "]";
    }
}
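The ordering that the composite key imposes (plate first, then time) can be checked without Hadoop. The following is a minimal sketch with a plain-Java stand-in; the class `PassKey` and its fields are hypothetical names, not part of the original job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: a plain-Java stand-in for the composite key, without Hadoop types.
class PassKey implements Comparable<PassKey> {
    final String carPlate;
    final long time;

    PassKey(String carPlate, long time) {
        this.carPlate = carPlate;
        this.time = time;
    }

    // Same ordering as the composite key: plate first, then time.
    @Override
    public int compareTo(PassKey other) {
        int byPlate = carPlate.compareTo(other.carPlate);
        return byPlate != 0 ? byPlate : Long.compare(time, other.time);
    }

    @Override
    public String toString() {
        return carPlate + "@" + time;
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<PassKey> sorted(List<PassKey> keys) {
        List<PassKey> copy = new ArrayList<PassKey>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<PassKey> keys = Arrays.asList(
                new PassKey("B", 20), new PassKey("A", 30),
                new PassKey("B", 10), new PassKey("A", 5));
        System.out.println(sorted(keys)); // prints [A@5, A@30, B@10, B@20]
    }
}
```

All records of one plate end up adjacent and in time order, which is what lets a later step walk a vehicle's trajectory with a window of two records.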


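GraphValue.java: the composite value. Because the Reduce phase cleans the time-ordered records by time granularity, every record must carry its original passing time; the time in the composite key is used only for sorting.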

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Composite value: checkpoint ID plus the original passing time
class GraphValue implements Writable, Comparable<GraphValue> {

    private String kkid;
    private Long time;

    public GraphValue() {
        kkid = "";
        time = 0L;
    }

    public GraphValue(String kkid, Long time) {
        this.kkid = kkid;
        this.time = time;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(kkid);
        out.writeLong(time);
    }

    public void readFields(DataInput in) throws IOException {
        kkid = in.readUTF();
        time = in.readLong();
    }

    public String getKkid() {
        return kkid;
    }

    public void setKkid(String kkid) {
        this.kkid = kkid;
    }

    public Long getTime() {
        return time;
    }

    public void setTime(Long time) {
        this.time = time;
    }

    // Values compare by checkpoint ID only
    public int compareTo(GraphValue o) {
        return this.kkid.compareTo(o.getKkid());
    }

    @Override
    public String toString() {
        return "GraphValue [kkid=" + kkid + ", time=" + time + "]";
    }
}
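The write/readFields pair of the value class is an ordinary `java.io` data-stream round trip, so it can be exercised outside Hadoop. The following is a minimal sketch; `PassRecord` is a hypothetical stand-in that mirrors the two fields and is not the class used by the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only: mirrors the write/readFields pair with plain java.io streams.
class PassRecord {
    String kkid = "";
    long time = 0L;

    void write(DataOutputStream out) throws IOException {
        out.writeUTF(kkid);   // 2-byte length prefix followed by modified UTF-8 bytes
        out.writeLong(time);  // 8 bytes, big-endian
    }

    void readFields(DataInputStream in) throws IOException {
        kkid = in.readUTF();  // fields must be read in the order they were written
        time = in.readLong();
    }

    static byte[] toBytes(PassRecord r) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        r.write(out);
        out.flush();
        return buffer.toByteArray();
    }

    static PassRecord fromBytes(byte[] bytes) throws IOException {
        PassRecord r = new PassRecord();
        r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return r;
    }

    public static void main(String[] args) throws IOException {
        PassRecord original = new PassRecord();
        original.kkid = "1004";
        original.time = 1462060800000L;
        byte[] bytes = toBytes(original);
        PassRecord copy = fromBytes(bytes);
        // prints 14 bytes -> 1004, 1462060800000
        System.out.println(bytes.length + " bytes -> " + copy.kkid + ", " + copy.time);
    }
}
```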

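GraphPartion.java: the partition function, which partitions the data by car plate. One open question: a city can have close to a million vehicles. The number of partitions is fixed by the number of reduce tasks (numPartitions), not by the number of distinct plates, so many plates share one partition. The possible negative effects of very large or skewed partitions have not been considered yet.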

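import org.apache.hadoop.mapreduce.Partitioner;

class GraphPartion extends Partitioner<GrapOrderMap, GraphValue> {
    @Override
    public int getPartition(GrapOrderMap key, GraphValue value, int numPartitions) {
        // Mask the sign bit: hashCode() can be negative, and a negative partition number is invalid
        return (key.getCarPlate().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}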
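One detail of hash partitioning is worth checking in isolation: Java's `%` keeps the sign of the dividend, so a plain `hashCode() % numPartitions` can produce a negative partition number, which Hadoop rejects as an illegal partition. The following is a minimal sketch of the usual sign-bit mask; the class and method names are hypothetical.

```java
// Sketch only: shows why the sign bit is masked before taking the modulus.
class PlatePartition {

    // Naive version: the result is negative whenever the hash is negative.
    static int naive(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Masking with Integer.MAX_VALUE clears the sign bit, so the result is in [0, numPartitions).
    static int safe(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int hash = -7; // stands in for a plate whose hashCode() happens to be negative
        System.out.println(naive(hash, 4)); // prints -3
        System.out.println(safe(hash, 4));  // prints 1
    }
}
```

The mask is preferable to `Math.abs`, because `Math.abs(Integer.MIN_VALUE)` is still negative.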


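GraphConbine.java: the Combine function, which preprocesses data on the Mapper side. Its main job is to remove checkpoints that repeat consecutively in the time series (checkpoint deduplication). Hadoop may run a combiner zero, one, or several times, so the job has to stay correct without it.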

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.Reducer;

// Data deduplication
class GraphConbine extends Reducer<GrapOrderMap, GraphValue, GrapOrderMap, GraphValue> {

    @Override
    protected void reduce(GrapOrderMap key, Iterable<GraphValue> values, Context context)
            throws IOException, InterruptedException {
        // Copy the values first: Hadoop reuses the same value object across iterations
        List<GraphValue> graphValues = new ArrayList<GraphValue>();
        for (GraphValue value : values) {
            graphValues.add(new GraphValue(value.getKkid(), value.getTime()));
        }
        // Remove consecutive repeats of the same checkpoint
        GraphValue pre = null;
        for (GraphValue value : graphValues) {
            if (pre == null) {
                // Correct the key time to the time of the first record
                key.setDay(value.getTime());
                context.write(key, value);
            } else if (!pre.getKkid().equals(value.getKkid())) {
                // Emit only when the checkpoint differs from the previous one
                context.write(key, value);
            }
            // Must always advance, even when the record was skipped
            pre = value;
        }
    }
}
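The deduplication rule itself is small enough to test on its own. The following is a minimal sketch on a plain list of checkpoint IDs that is assumed to be already in time order for one vehicle; `ConsecutiveDedup` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: checkpoint IDs of one vehicle, already in time order.
class ConsecutiveDedup {

    // Keeps an ID only when it differs from the ID immediately before it.
    static List<String> dedup(List<String> kkids) {
        List<String> result = new ArrayList<String>();
        String previous = null;
        for (String kkid : kkids) {
            if (!kkid.equals(previous)) {
                result.add(kkid);
            }
            previous = kkid;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(Arrays.asList("1004", "1004", "1007", "1007", "1004")));
        // prints [1004, 1007, 1004]
    }
}
```

Only adjacent repeats are dropped. A later return to 1004 is kept, because 1007 to 1004 is a real movement between two checkpoints.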


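GraphReduce.java: the raw data set may span a long period. For example, the last checkpoint a vehicle passes on its way to work and the first checkpoint it passes after work in the afternoon should not count as reachable, so such pairs must be dropped. The Reduce phase therefore cleans the data by time granularity, and the granularity is passed in through the job configuration. The final output format is (startCheckpoint_endCheckpoint  1), where 1 marks one occurrence. The number of occurrences is used when the graph is built in Spark to represent the traffic volume between the two checkpoints.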

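import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Time-granularity cleaning
class GraphReduce extends Reducer<GrapOrderMap, GraphValue, NullWritable, Text> {
    private int granularity;
    // Window of the previous and current record; kept across reduce() calls
    // because one vehicle's records can be spread over several calls
    private Queue<GraphValue> queue = null;
    private String lastPlate = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        queue = new ArrayBlockingQueue<GraphValue>(2);
        // Granularity in minutes, 30 by default
        granularity = context.getConfiguration().getInt(CarGraphEdge.GRANULARITY, 30);
    }

    @Override
    protected void reduce(GrapOrderMap key, Iterable<GraphValue> values, Context context)
            throws IOException, InterruptedException {
        // Start a fresh window when the plate changes, so records of
        // different vehicles are never paired into an edge
        String plate = key.getCarPlate().toString();
        if (!plate.equals(lastPlate)) {
            queue.clear();
            lastPlate = plate;
        }
        GraphValue temp = null;
        for (GraphValue gh : values) {
            queue.add(new GraphValue(gh.getKkid(), gh.getTime()));
            if (queue.size() >= 2) {
                if (queue.peek().getKkid().equals(gh.getKkid())) {
                    // Repeated checkpoint: drop the older record
                    queue.poll();
                } else {
                    // Granularity cleaning: a gap of N minutes or more is invalid
                    temp = queue.poll();
                    // 60000L forces long arithmetic and avoids int overflow
                    if (queue.peek().getTime() - temp.getTime() < 60000L * granularity) {
                        context.write(NullWritable.get(),
                                new Text(temp.getKkid() + "_" + queue.peek().getKkid() + "\t1"));
                    }
                }
                while (queue.size() > 1) {
                    queue.poll();
                }
            }
        }
    }
}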
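The granularity rule can be illustrated on a single vehicle's trajectory without Hadoop. The following is a minimal sketch, assuming the passes are already sorted by time and given as two parallel arrays; `EdgeEmitter` and its parameters are hypothetical names, and the 30-minute window matches the default used by the reducer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: emits "from_to\t1" edges for one vehicle's time-ordered passes.
class EdgeEmitter {

    // kkids[i] was passed at timesMillis[i]; both arrays are in time order.
    static List<String> edges(String[] kkids, long[] timesMillis, int granularityMinutes) {
        long maxGapMillis = 60000L * granularityMinutes; // long arithmetic avoids int overflow
        List<String> out = new ArrayList<String>();
        for (int i = 1; i < kkids.length; i++) {
            boolean sameCheckpoint = kkids[i - 1].equals(kkids[i]); // A-A is not an edge
            boolean withinWindow = timesMillis[i] - timesMillis[i - 1] < maxGapMillis;
            if (!sameCheckpoint && withinWindow) {
                out.add(kkids[i - 1] + "_" + kkids[i] + "\t1");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long minute = 60000L;
        String[] kkids = {"1004", "1007", "1007", "1009", "1004"};
        long[] times = {0, 10 * minute, 12 * minute, 20 * minute, 9 * 60 * minute};
        // 1004->1007 (10 min) and 1007->1009 (8 min) are kept;
        // 1007->1007 is a repeat and 1009->1004 is more than 8 hours apart.
        for (String edge : edges(kkids, times, 30)) {
            System.out.println(edge);
        }
    }
}
```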


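Spark stage: load the vertex set, load the edge set, preprocess the edges, and build the graph object with Spark's graph-processing framework. This part mainly shows the reachability between checkpoints; other algorithms will appear in later chapters as the author's study progresses. See the code comments for details.

App.scala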

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object App extends App {

  val conf = new SparkConf
  conf.setAppName("TVC_GRAPH").setMaster("local[4]")
  val sc = new SparkContext(conf)

  // Vertex set: one numeric checkpoint ID per line
  val vertices = sc.textFile("hdfs://host218:8020/zjf/output6/part-r-00000", 3)
    .filter { item => !item.equals("") && item.matches("[0-9]+?") }
    .map { item => (item.toLong, item) }

  // Edge set: lines of the form "src_dst\t1"; identical edges are summed into one weighted edge
  val edges = sc.textFile("hdfs://host218:8020/zjf/output7/part-r-00000", 3)
    .filter { item => !item.equals("") && item.matches("[0-9]+?_[0-9]+?\t[0-9]+?") }
    .map { item =>
      val args1 = item.split("\t")
      val args2 = args1(0).split("_")
      ((args2(0).toLong, args2(1).toLong), 1)
    }
    .reduceByKey((it1, it2) => it1 + it2)
    .map(item => Edge(item._1._1, item._1._2, item._2))

  // Build the graph
  val graph: Graph[String, Int] = Graph(vertices, edges, "")

  // Edge with the largest traffic count
  val maxCount = graph.edges.reduce((item1, item2) => {
    if (item1.attr > item2.attr) { item1 } else { item2 }
  })

  // Vertex array
  val ss = vertices.map(item => item._1).collect()

  // Edge tuples, as a set for constant-time lookup
  val zz = edges.map { item => (item.srcId, item.dstId) }.collect().toSet

  // arrs(i)(j) is 1 when an edge from ss(i) to ss(j) exists
  val arrs = Array.ofDim[Long](ss.length, ss.length)
  for (i <- 0 until ss.length) {
    for (j <- 0 until ss.length) {
      if (zz.contains((ss(i), ss(j)))) {
        arrs(i)(j) = 1
      }
    }
  }

  /* Traffic-flow reachability matrix */
  ss.foreach { item => print("\t" + item) }
  println()
  for (i <- 0 until ss.length) {
    print(ss(i) + "\t")
    for (j <- 0 until ss.length) {
      print(arrs(i)(j) + "\t")
    }
    println()
  }
}
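For readers who want to follow the aggregation and matrix steps without a Spark installation, here is a minimal plain-Java sketch of the same idea. It is an illustration under assumptions, not the Spark program: the class `Reachability`, its method names, and the sample lines are hypothetical. As in the Scala code above, each matching line counts as one occurrence.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: plain-Java equivalent of the aggregation and matrix steps, without Spark.
class Reachability {

    // Counts identical "from_to\tN" lines, one per line, like the reduceByKey step.
    static Map<String, Integer> aggregate(List<String> lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            if (!line.matches("[0-9]+_[0-9]+\t[0-9]+")) {
                continue; // skip blank or malformed lines
            }
            String[] ends = line.split("\t")[0].split("_");
            // Normalize through long so the key matches the numeric vertex IDs
            String key = Long.parseLong(ends[0]) + "_" + Long.parseLong(ends[1]);
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // matrix[i][j] is 1 when an edge from vertices[i] to vertices[j] was observed.
    static int[][] matrix(long[] vertices, Map<String, Integer> counts) {
        int[][] m = new int[vertices.length][vertices.length];
        for (int i = 0; i < vertices.length; i++) {
            for (int j = 0; j < vertices.length; j++) {
                if (counts.containsKey(vertices[i] + "_" + vertices[j])) {
                    m[i][j] = 1;
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1004_1007\t1", "1004_1007\t1", "1007_1004\t1", "", "oops");
        Map<String, Integer> counts = aggregate(lines);
        System.out.println(counts.get("1004_1007")); // prints 2
        System.out.println(Arrays.deepToString(matrix(new long[] {1004, 1007, 1009}, counts)));
        // prints [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    }
}
```

The matrix records only direct movements between two checkpoints; the per-edge counts are kept separately as weights.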


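Reachability matrix result (partial). Rows are origin checkpoints and columns are destination checkpoints; 1 means at least one vehicle went directly from the row checkpoint to the column checkpoint.

Checkpoint ID  1001  1003  1004  1006  1007  1008  1009  1010
1001           0     0     0     0     0     0     0     0
1003           0     0     0     0     0     0     0     0
1004           0     0     0     0     1     0     1     0
1006           0     0     0     0     0     0     0     0
1007           0     0     1     0     0     0     1     0
1008           0     0     0     0     0     0     0     0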
