Getting Started with GeoSpark, Part 3: Spatial Queries


Series Contents

Getting Started with GeoSpark, Part 1: Environment Setup
Getting Started with GeoSpark, Part 2: Spatial RDDs
Getting Started with GeoSpark, Part 3: Spatial Queries

1. Spatial Range Query

A spatial range query, as the name suggests, takes a given range (the query window) and returns the geometries contained in it.
(figure: the query window)

1.1 Preparing the Data

Create checkin1.csv at data/checkin1.csv:
Note that bar's coordinates have been deliberately altered here.

-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-99.388954,32.357073,bar
-88.221102,32.35078,restaurant

1.2 Code Example

The considerBoundaryIntersection parameter controls whether geometries lying on the boundary of the query window are included in the result.

package com.suddev.bigdata.query

import com.vividsolutions.jts.geom.Envelope
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.{SparkConf, SparkContext}
import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator
import org.datasyslab.geospark.spatialOperator.RangeQuery
import org.datasyslab.geospark.spatialRDD.PointRDD

/**
 * Spatial Range Query
 * @author Rand
 * @date 2020/4/16 0016
 */
object SpatialRangeQueryApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setAppName("SpatialRangeQueryApp").setMaster("local[*]").
      set("spark.serializer",classOf[KryoSerializer].getName).
      set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
    implicit val sc = new SparkContext(conf)
    val objectRDD = createPointRDD
    objectRDD.rawSpatialRDD.rdd.collect().foreach(println)

    // Define the query window
    val rangeQueryWindow = new Envelope(-90.01, -80.01, 30.01, 40.01)
    // Whether to include geometries lying on the window boundary
    val considerBoundaryIntersection = false
    val usingIndex = false
    val queryResult = RangeQuery.SpatialRangeQuery(objectRDD, rangeQueryWindow, considerBoundaryIntersection, usingIndex)
    queryResult.rdd.collect().foreach(println)
  }

  def createPointRDD(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin1.csv"
    // Which columns hold the longitude and latitude; here they are columns 0 and 1, so the offset is 0
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    // Allows the RDD to carry extra attribute columns besides the coordinates
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }
}

🔥 Besides an Envelope, the rangeQueryWindow also supports Point, Polygon, and LineString geometries.

Point -> creating a Point query window:

val geometryFactory = new GeometryFactory()
val pointObject = geometryFactory.createPoint(new Coordinate(-84.01, 34.01))

Polygon -> creating a Polygon query window:

val geometryFactory = new GeometryFactory()
val coordinates = new Array[Coordinate](5)
coordinates(0) = new Coordinate(0,0)
coordinates(1) = new Coordinate(0,4)
coordinates(2) = new Coordinate(4,4)
coordinates(3) = new Coordinate(4,0)
coordinates(4) = coordinates(0) // The last coordinate is the same as the first coordinate in order to compose a closed ring
val polygonObject = geometryFactory.createPolygon(coordinates)

LineString -> creating a LineString query window:

val geometryFactory = new GeometryFactory()
val coordinates = new Array[Coordinate](4) // 4 points; a LineString need not be closed
coordinates(0) = new Coordinate(0,0)
coordinates(1) = new Coordinate(0,4)
coordinates(2) = new Coordinate(4,4)
coordinates(3) = new Coordinate(4,0)
val linestringObject = geometryFactory.createLineString(coordinates)

1.3 Output

As you can see, the query result contains hotel, gas, and restaurant, but not bar:

POINT (-88.331492 32.324142)	hotel
POINT (-88.175933 32.360763)	gas
POINT (-99.388954 32.357073)	bar
POINT (-88.221102 32.35078)	restaurant
-------------------------------
POINT (-88.331492 32.324142)	hotel
POINT (-88.175933 32.360763)	gas
POINT (-88.221102 32.35078)	restaurant
-------------------------------
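To make the range-query semantics concrete, the following is a minimal Spark-free sketch of what the query computes over checkin1.csv. Pt and the inlined bounds check are hypothetical stand-ins for GeoSpark's Point and Envelope types, not its API:

```scala
// Hypothetical, Spark-free model of the range query over checkin1.csv.
case class Pt(lon: Double, lat: Double, tag: String)

val points = Seq(
  Pt(-88.331492, 32.324142, "hotel"),
  Pt(-88.175933, 32.360763, "gas"),
  Pt(-99.388954, 32.357073, "bar"),
  Pt(-88.221102, 32.35078, "restaurant"))

// Same bounds as new Envelope(-90.01, -80.01, 30.01, 40.01)
val (minX, maxX, minY, maxY) = (-90.01, -80.01, 30.01, 40.01)

// With considerBoundaryIntersection = false, geometries on the boundary
// are excluded; strict inequalities model that here.
val inside = points.filter(p =>
  p.lon > minX && p.lon < maxX && p.lat > minY && p.lat < maxY)

inside.foreach(p => println(p.tag)) // hotel, gas, restaurant
```

Only bar's longitude (-99.388954) lies outside the window, which is why it is the one point missing from the result in 1.3.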

2. Spatial KNN Query

In a spatial KNN query, we supply the coordinates of a center point and find the K geometries nearest to it.

2.1 Preparing the Data

Create checkin2.csv at data/checkin2.csv:

-88.331492,32.324142,hotel
-88.175933,32.360763,gas1
-88.176033,32.360763,gas2
-88.175833,32.360763,gas3
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant

2.2 Code Example

The k parameter limits the query to k results.
🙃 One gripe: if only 5 results exist but k is set greater than 5, the query throws a NullPointerException instead of simply returning whatever was found.
🙃 Another gripe: this design can only query one point at a time. In production you typically need to KNN-match one batch of points against another batch, and this API does not support querying between two RDDs. If you are interested in KNN matching between two RDDs, leave a comment and I'll write a separate post.

package com.suddev.bigdata.query

import com.vividsolutions.jts.geom.{Coordinate, Envelope, GeometryFactory}
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.{SparkConf, SparkContext}
import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator
import org.datasyslab.geospark.spatialOperator.{KNNQuery, RangeQuery}
import org.datasyslab.geospark.spatialRDD.PointRDD
import scala.collection.JavaConversions._

/**
 * SpatialKNNQueryApp
 * @author Rand
 * @date 2020/4/16 0016
 */
object SpatialKNNQueryApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setAppName("SpatialKNNQueryApp").setMaster("local[*]").
      set("spark.serializer",classOf[KryoSerializer].getName).
      set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
    implicit val sc = new SparkContext(conf)
    val objectRDD = createPointRDD
    objectRDD.rawSpatialRDD.rdd.collect().foreach(println)
    val geometryFactory = new GeometryFactory()
    // Center point for the KNN query
    val pointObject = geometryFactory.createPoint(new Coordinate(-84.01, 34.01))
    val K = 2 // K Nearest Neighbors
    val usingIndex = false
    val result = KNNQuery.SpatialKnnQuery(objectRDD, pointObject, K, usingIndex)
    println("-----------------------------------")
    // Remember to import scala.collection.JavaConversions._, otherwise this line won't compile
    result.foreach(println)
  }

  def createPointRDD(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin2.csv"
    // Which columns hold the longitude and latitude; here they are columns 0 and 1, so the offset is 0
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    // Allows the RDD to carry extra attribute columns besides the coordinates
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }
}

2.3 Output

As you can see, the result contains the two points gas3 and gas1:

POINT (-88.331492 32.324142)	hotel
POINT (-88.175933 32.360763)	gas1
POINT (-88.176033 32.360763)	gas2
POINT (-88.175833 32.360763)	gas3
POINT (-88.388954 32.357073)	bar
POINT (-88.221102 32.35078)	restaurant
-----------------------------------
POINT (-88.175833 32.360763)	gas3
POINT (-88.175933 32.360763)	gas1
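The KNN result can be reproduced by hand: sort the sample points by straight-line distance (in degrees, the unit of the coordinates) from the center point and take the first K. A Spark-free sketch with hypothetical helpers Pt and dist2, not GeoSpark API:

```scala
// Hypothetical, Spark-free model of the KNN query over checkin2.csv.
case class Pt(lon: Double, lat: Double, tag: String)

val points = Seq(
  Pt(-88.331492, 32.324142, "hotel"),
  Pt(-88.175933, 32.360763, "gas1"),
  Pt(-88.176033, 32.360763, "gas2"),
  Pt(-88.175833, 32.360763, "gas3"),
  Pt(-88.388954, 32.357073, "bar"),
  Pt(-88.221102, 32.35078, "restaurant"))

// Center point and K from the example
val (cx, cy) = (-84.01, 34.01)
val k = 2

// Squared Euclidean distance in degrees; ordering is the same as for distance
def dist2(p: Pt): Double = {
  val (dx, dy) = (p.lon - cx, p.lat - cy)
  dx * dx + dy * dy
}

val nearest = points.sortBy(dist2).take(k)
nearest.foreach(p => println(p.tag)) // gas3, gas1
```

The three gas points differ only in longitude, and gas3 (-88.175833) is the closest to the center's longitude of -84.01, followed by gas1, matching the output above.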

3. Spatial Join Query

A spatial join query is similar to a database join: given Spatial RDDs A and B, it traverses the geometries in A and matches them against the geometries in B that cover or intersect them.

3.1 Preparing the Data

Create checkin3.csv at data/checkin3.csv:

-88.331492,32.324142,1.hotel
-88.175933,32.360763,1.gas
-88.388954,32.357073,1.bar
-88.588954,32.357073,1.spark

Create checkin4.csv at data/checkin4.csv:

-88.175933,32.360763,2.gas
-88.388954,32.357073,2.bar
-88.221102,32.35078,2.restaurant
-88.321102,32.35078,2.bus

3.2 Code Example

package com.suddev.bigdata.query

import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.{SparkConf, SparkContext}
import org.datasyslab.geospark.enums.{FileDataSplitter, GridType}
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator
import org.datasyslab.geospark.spatialOperator.JoinQuery
import org.datasyslab.geospark.spatialRDD.PointRDD
/**
 * SpatialJoinQueryApp
 *
 * @author Rand
 * @date 2020/4/16 0016
 */
object SpatialJoinQueryApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setAppName("SpatialJoinQueryApp").setMaster("local[*]").
      set("spark.serializer",classOf[KryoSerializer].getName).
      set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
    implicit val sc = new SparkContext(conf)
    // Prepare the data
    val objectRDD = createObjectRDD
    objectRDD.rawSpatialRDD.rdd.collect().foreach(println)
    val queryWindowRDD = createQueryWindowRDD
    println("---------------------------")
    queryWindowRDD.rawSpatialRDD.rdd.collect().foreach(println)
    println("---------------------------")
    objectRDD.analyze()
    // Both objectRDD and queryWindowRDD must be spatially partitioned, and two conditions must hold:
    // 1. objectRDD and queryWindowRDD must share the same (non-null) spatial partitioner
    // 2. objectRDD and queryWindowRDD must have the same number of partitions
    objectRDD.spatialPartitioning(GridType.KDBTREE)
    queryWindowRDD.spatialPartitioning(objectRDD.getPartitioner)
    val considerBoundaryIntersection = false
    val usingIndex = false
    val result = JoinQuery.SpatialJoinQuery(objectRDD, queryWindowRDD, usingIndex, considerBoundaryIntersection)
    result.rdd.foreach(println)
  }

  def createObjectRDD(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin3.csv"
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }

  def createQueryWindowRDD(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin4.csv"
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }
}

3.3 Output

As you can see, the gas and bar records from the two sides were joined:

POINT (-88.331492 32.324142)	1.hotel
POINT (-88.175933 32.360763)	1.gas
POINT (-88.388954 32.357073)	1.bar
POINT (-88.588954 32.357073)	1.spark
---------------------------
POINT (-88.175933 32.360763)	2.gas
POINT (-88.388954 32.357073)	2.bar
POINT (-88.221102 32.35078)	2.restaurant
POINT (-88.321102 32.35078)	2.bus
---------------------------
(POINT (-88.175933 32.360763)	2.gas,[POINT (-88.175933 32.360763)	1.gas])
(POINT (-88.388954 32.357073)	2.bar,[POINT (-88.388954 32.357073)	1.bar])
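For point-vs-point inputs like these, "covers or intersects" degenerates to coordinate equality, so the join pairs each query-window point with the object points at exactly the same location. A Spark-free sketch of that semantics (Pt and the equality match are hypothetical stand-ins, not GeoSpark API):

```scala
// Hypothetical, Spark-free model of the point-vs-point spatial join.
case class Pt(lon: Double, lat: Double, tag: String)

val objects = Seq(   // checkin3.csv
  Pt(-88.331492, 32.324142, "1.hotel"),
  Pt(-88.175933, 32.360763, "1.gas"),
  Pt(-88.388954, 32.357073, "1.bar"),
  Pt(-88.588954, 32.357073, "1.spark"))

val windows = Seq(   // checkin4.csv
  Pt(-88.175933, 32.360763, "2.gas"),
  Pt(-88.388954, 32.357073, "2.bar"),
  Pt(-88.221102, 32.35078, "2.restaurant"),
  Pt(-88.321102, 32.35078, "2.bus"))

// Each "window" point matches the object points with identical coordinates
val joined = for {
  w <- windows
  matches = objects.filter(o => o.lon == w.lon && o.lat == w.lat)
  if matches.nonEmpty
} yield (w.tag, matches.map(_.tag))

joined.foreach(println) // (2.gas,List(1.gas)) and (2.bar,List(1.bar))
```

Only the gas and bar coordinates appear in both files, which is why only those two pairs survive the join above.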

4. Distance Join Query

A distance join query takes two Spatial RDDs A and B and a distance as input. For each geometry in A, it finds all geometries in B that lie within the given distance.
⚠️ A note on the distance:
GeoSpark does not manage the coordinate units (degree-based or meter-based) of the geometries in a SpatialRDD. All distance-related parameters in GeoSpark are interpreted in the same unit as the geometries in the SpatialRDD.
Code to convert the coordinate reference system (CRS):

val sourceCrsCode = "epsg:4326" // WGS84, the most common degree-based CRS
val targetCrsCode = "epsg:3857" // The most common meter-based CRS
objectRDD.CRSTransform(sourceCrsCode, targetCrsCode)

Reference:
GIS basics - coordinate systems, projections, EPSG:4326, EPSG:3857
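As a back-of-the-envelope sanity check on units (my own approximation, not GeoSpark API): one degree of latitude is roughly 111.32 km, so the 0.1-degree distance used in the example below is on the order of 11 km. A degree of longitude shrinks by cos(latitude), about 0.85 at 32°N:

```scala
// Rough conversion; 111.32 km per degree holds for latitude only.
val metersPerDegreeLat = 111320.0
val distanceInDegrees = 0.1
val approxMeters = distanceInDegrees * metersPerDegreeLat
println(f"~$approxMeters%.0f m") // ~11132 m
```

If you need the distance in real meters, transform to a meter-based CRS first (e.g. the epsg:3857 conversion shown above) and pass a distance in meters.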

4.1 Preparing the Data

Create checkin5.csv at data/checkin5.csv:

-89.331492,32.324142,1.hotel
-88.1760,32.360763,1.gas
-88.3890,32.357073,1.bar
-89.588954,32.357073,1.spark

Create checkin6.csv at data/checkin6.csv:

-88.175933,32.360763,2.gas
-88.388954,32.357073,2.bar
-88.221102,32.35078,2.restaurant
-88.321102,32.35078,2.bus

4.2 Code Example

package com.suddev.bigdata.query

import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.{SparkConf, SparkContext}
import org.datasyslab.geospark.enums.{FileDataSplitter, GridType}
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator
import org.datasyslab.geospark.spatialOperator.JoinQuery
import org.datasyslab.geospark.spatialRDD.{CircleRDD, PointRDD}

/**
 * DistanceJoinQueryApp
 *
 * @author Rand
 * @date 2020/4/16 0016
 */
object DistanceJoinQueryApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setAppName("DistanceJoinQueryApp").setMaster("local[*]").
      set("spark.serializer",classOf[KryoSerializer].getName).
      set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
    implicit val sc = new SparkContext(conf)
    // Prepare the data
    val objectRddA = createObjectRDDA
    objectRddA.rawSpatialRDD.rdd.collect().foreach(println)
    val objectRddB = createObjectRDDB
    println("---------------------------")
    objectRddB.rawSpatialRDD.rdd.collect().foreach(println)
    println("---------------------------")
    // Set the join distance
    val circleRDD = new CircleRDD(objectRddA, 0.1) // Create a CircleRDD using the given distance
    circleRDD.analyze()
    circleRDD.spatialPartitioning(GridType.KDBTREE)
    objectRddB.spatialPartitioning(circleRDD.getPartitioner)

    val considerBoundaryIntersection = false // Only return geometries fully covered by each query window in queryWindowRDD
    val usingIndex = false

    val result = JoinQuery.DistanceJoinQueryFlat(objectRddB, circleRDD, usingIndex, considerBoundaryIntersection)
    result.rdd.foreach(println)
  }

  def createObjectRDDA(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin5.csv"
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }

  def createObjectRDDB(implicit sc:SparkContext): PointRDD ={
    val pointRDDInputLocation = "data/checkin6.csv"
    val pointRDDOffset = 0
    val pointRDDSplitter = FileDataSplitter.CSV
    val carryOtherAttributes = true
    val objectRDD = new PointRDD(sc, pointRDDInputLocation,pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
    objectRDD
  }
}

4.3 Output

As you can see, 1.gas matches the two points 2.gas and 2.restaurant,
and 1.bar matches the two points 2.bar and 2.bus:

POINT (-89.331492 32.324142)	1.hotel
POINT (-88.176 32.360763)	1.gas
POINT (-88.389 32.357073)	1.bar
POINT (-89.588954 32.357073)	1.spark
---------------------------
POINT (-88.175933 32.360763)	2.gas
POINT (-88.388954 32.357073)	2.bar
POINT (-88.221102 32.35078)	2.restaurant
POINT (-88.321102 32.35078)	2.bus
---------------------------
(POINT (-88.176 32.360763)	1.gas,POINT (-88.175933 32.360763)	2.gas)
(POINT (-88.176 32.360763)	1.gas,POINT (-88.221102 32.35078)	2.restaurant)
(POINT (-88.389 32.357073)	1.bar,POINT (-88.388954 32.357073)	2.bar)
(POINT (-88.389 32.357073)	1.bar,POINT (-88.321102 32.35078)	2.bus)
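The distance join can likewise be modeled without Spark: pair every point of A with every point of B whose planar distance in degrees is at most 0.1. A sketch with hypothetical helpers (Pt, dist), not GeoSpark API:

```scala
// Hypothetical, Spark-free model of the distance join over checkin5/6.csv.
case class Pt(lon: Double, lat: Double, tag: String)

val a = Seq(   // checkin5.csv
  Pt(-89.331492, 32.324142, "1.hotel"),
  Pt(-88.1760, 32.360763, "1.gas"),
  Pt(-88.3890, 32.357073, "1.bar"),
  Pt(-89.588954, 32.357073, "1.spark"))

val b = Seq(   // checkin6.csv
  Pt(-88.175933, 32.360763, "2.gas"),
  Pt(-88.388954, 32.357073, "2.bar"),
  Pt(-88.221102, 32.35078, "2.restaurant"),
  Pt(-88.321102, 32.35078, "2.bus"))

// Planar Euclidean distance in degrees, the unit of the coordinates
def dist(p: Pt, q: Pt): Double =
  math.hypot(p.lon - q.lon, p.lat - q.lat)

val maxDist = 0.1 // same role as the 0.1 passed to CircleRDD
val pairs = for {
  p <- a
  q <- b
  if dist(p, q) <= maxDist
} yield (p.tag, q.tag)

pairs.foreach(println)
```

This reproduces the four pairs shown above; 1.hotel and 1.spark match nothing because they are more than 0.1 degrees away from every point in B.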