Getting Started with GeoSpark, Part 2: Spatial RDDs

Series contents

Getting Started with GeoSpark, Part 1: Environment Setup
Getting Started with GeoSpark, Part 2: Spatial RDDs
Getting Started with GeoSpark, Part 3: Spatial Queries

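1. Basic Geographic Data Concepts

GeoSpark ultimately operates on geographic features, so it supports the common geometry types used in geoscience.
There are three basic kinds of geometry: points, lines, and polygons.
An x/y coordinate pair defines a point, several points form a line, a closed line (a ring) bounds a polygon, and a mix of points, lines, and polygons forms a geometry collection.

The corresponding classes are:
Coordinate: Coordinate
Point: Point, MultiPoint
Line: LineString, MultiLineString (multiple lines), LinearRing (closed ring)
Polygon: Polygon, MultiPolygon
Collection: GeometryCollection

These classes are the type parameter T of the RDD[T] instances created in the rest of this article.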

2. Creating Geometries with GeometryFactory

All geometry objects are created through the GeometryFactory class in the com.vividsolutions.jts.geom package.

package com.suddev.bigdata.core
import com.vividsolutions.jts.geom.{Coordinate, GeometryFactory}

object GeoDemoApp {
  def main(args: Array[String]): Unit = {
    // Create a coordinate
    val coord = new Coordinate(-84.01, 34.01)
    // Instantiate the geometry factory
    val factory = new GeometryFactory()
    // Create a Point
    val pointObject = factory.createPoint(coord)
    // Create a Polygon
    val coordinates = new Array[Coordinate](5)
    coordinates(0) = new Coordinate(0,0)
    coordinates(1) = new Coordinate(0,4)
    coordinates(2) = new Coordinate(4,4)
    coordinates(3) = new Coordinate(4,0)
    // A polygon ring is closed, so the last point is the same as the first
    coordinates(4) = coordinates(0)
    val polygonObject = factory.createPolygon(coordinates)
    // Create a LineString (an open line, so the first point is not repeated)
    val coordinates2 = new Array[Coordinate](4)
    coordinates2(0) = new Coordinate(0,0)
    coordinates2(1) = new Coordinate(0,4)
    coordinates2(2) = new Coordinate(4,4)
    coordinates2(3) = new Coordinate(4,0)
    val linestringObject = factory.createLineString(coordinates2)
  }
}
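The closing rule for polygon rings can be illustrated without any geometry library. The sketch below is plain Scala with hypothetical names (RingSketch, isClosed, close, area); it is not part of JTS or GeoSpark. It treats a ring as a sequence of (x, y) tuples, appends the first coordinate when the ring is still open, and computes the enclosed area with the shoelace formula.

```scala
object RingSketch {
  type XY = (Double, Double)

  // A ring counts as closed here when it has at least 4 entries
  // and its first and last coordinates coincide.
  def isClosed(ring: Seq[XY]): Boolean =
    ring.length >= 4 && ring.head == ring.last

  // Append the first coordinate when the ring is not already closed.
  def close(ring: Seq[XY]): Seq[XY] =
    if (ring.nonEmpty && ring.head != ring.last) ring :+ ring.head else ring

  // Shoelace formula: absolute area enclosed by a closed ring.
  def area(ring: Seq[XY]): Double = {
    require(isClosed(ring), "ring must be closed")
    val twiceSigned = ring.sliding(2).map {
      case Seq((x1, y1), (x2, y2)) => x1 * y2 - x2 * y1
    }.sum
    math.abs(twiceSigned) / 2.0
  }
}
```

Applied to the four corners used for the polygon above, (0,0), (0,4), (4,4), (4,0), close adds the fifth point and area returns the area of the 4 by 4 square.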

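3. Creating a SpatialRDD (SRDD)

GeoSpark-Core provides three special SpatialRDDs: PointRDD, PolygonRDD, and LineStringRDD.
⚠️ Note: a GeoSpark SpatialRDD is a wrapper around a Spark RDD, not an RDD implementation itself; the original RDD is stored inside the SpatialRDD.
They can be loaded from a Spark RDD, CSV, TSV, WKT, WKB, Shapefiles, GeoJSON, and NetCDF/HDF formats.
The examples below cover a few common scenarios.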

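3.1 Initializing the SparkContext

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator

val conf = new SparkConf()
  .setAppName("GeoSparkDemo2")
  .setMaster("local[*]")
  .set("spark.serializer", classOf[KryoSerializer].getName)
  .set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
val sc = new SparkContext(conf)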

3.2 Creating a Typed Spatial RDD

3.2.1 Creating a PointRDD from an Existing Spark RDD

import com.vividsolutions.jts.geom.{Coordinate, GeometryFactory}
import org.datasyslab.geospark.spatialRDD.PointRDD

// Sample data: (x, y, user data)
val data = Array(
  (-88.331492, 32.324142, "hotel"),
  (-88.175933, 32.360763, "gas"),
  (-88.388954, 32.357073, "bar"),
  (-88.221102, 32.35078, "restaurant")
)
val geometryFactory = new GeometryFactory()
// Create a Spark RDD[Point]
val pointsRowSpatialRDD = sc.parallelize(data)
  .map(x => {
    // Create the coordinate
    val coord = new Coordinate(x._1, x._2)
    // User-defined data
    val userData = x._3
    // Create the Point
    val point = geometryFactory.createPoint(coord)
    // A Point can carry user data
    point.setUserData(userData)
    point
  })
// Create the PointRDD
val pointRDD = new PointRDD(pointsRowSpatialRDD)

3.2.2 Creating a PointRDD from CSV/TSV

Create checkin.csv under the data directory (data/checkin.csv):

-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant

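checkin.csv has three columns, with column IDs 0, 1, and 2.
Columns 0 and 1 hold the coordinates.
Column 2 holds user-defined data.
pointRDDOffset sets the column at which the coordinates start, so here offset = 0.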
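To make the column layout concrete, here is a minimal plain-Scala sketch of how such a row splits into a coordinate pair and user data for a given offset. It only illustrates the layout described above and is not GeoSpark's parser; CsvPointSketch, ParsedPoint, and parse are hypothetical names.

```scala
object CsvPointSketch {
  // Hypothetical container for one parsed row.
  final case class ParsedPoint(x: Double, y: Double, userData: Seq[String])

  // offset is the index of the column holding x; y is assumed to follow it.
  def parse(line: String, offset: Int): ParsedPoint = {
    val cols = line.split(",").map(_.trim)
    val x = cols(offset).toDouble
    val y = cols(offset + 1).toDouble
    // Every column that is not one of the two coordinate columns is kept as user data.
    val rest = cols.indices
      .filterNot(i => i == offset || i == offset + 1)
      .map(i => cols(i))
    ParsedPoint(x, y, rest)
  }
}
```

With offset = 0, the first row of checkin.csv yields x = -88.331492, y = 32.324142, and "hotel" as user data. If the attribute came first in the file, the same sketch would be called with offset = 1.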

import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.spatialRDD.PointRDD

val pointRDDInputLocation = "data/checkin.csv"
val pointRDDOffset = 0  // The coordinates start from Column 0
val pointRDDSplitter = FileDataSplitter.CSV // or use FileDataSplitter.TSV
val carryOtherAttributes = true // Carry the user-defined data (hotel, gas, bar...)
var objectRDD = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes)

3.2.3 Creating a PolygonRDD/LineStringRDD from CSV/TSV

Create checkinshape.csv under the data directory (data/checkinshape.csv):

-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,hotel
-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,gas
-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,bar
-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,restaurant

checkinshape.csv has 11 columns, with column IDs 0 to 10.
Columns 0 to 9 hold five coordinates.
Column 10 holds user-defined data.
polygonRDDStartOffset sets the column at which the coordinates start, so StartOffset = 0.
polygonRDDEndOffset sets the column at which the coordinates end, so EndOffset = 9.

import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.spatialRDD.PolygonRDD

val polygonRDDInputLocation = "data/checkinshape.csv"
val polygonRDDStartOffset = 0 // The coordinates start from Column 0
val polygonRDDEndOffset = 9 // The coordinates end at Column 9
val polygonRDDSplitter = FileDataSplitter.CSV // or use FileDataSplitter.TSV
val carryOtherAttributes = true
var objectRDD = new PolygonRDD(sc, polygonRDDInputLocation, polygonRDDStartOffset, polygonRDDEndOffset, polygonRDDSplitter, carryOtherAttributes)
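The relationship between the two offsets and the number of coordinates can be sketched in plain Scala. This assumes that both offsets are inclusive column indexes, so a row whose coordinates occupy columns 0 to 9 yields five (x, y) pairs. It is an illustration only, not GeoSpark's own parsing code, and CsvRingSketch and coordinates are hypothetical names.

```scala
object CsvRingSketch {
  // Read (x, y) pairs from columns startOffset..endOffset, both inclusive (assumed semantics).
  def coordinates(line: String, startOffset: Int, endOffset: Int): Seq[(Double, Double)] = {
    require((endOffset - startOffset + 1) % 2 == 0, "offset range must cover whole x,y pairs")
    val cols = line.split(",").map(_.trim)
    (startOffset to endOffset by 2).map(i => (cols(i).toDouble, cols(i + 1).toDouble))
  }
}
```

For a hypothetical row such as 0,0,0,4,4,4,4,0,0,0,park, calling coordinates(row, 0, 9) returns five pairs whose first and last entries are equal, which is the closed ring a polygon needs. An odd-sized range such as 0 to 8 is rejected because it would split a pair.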

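3.3 Creating a Generic Spatial RDD

Unlike PointRDD, PolygonRDD, and LineStringRDD, a generic SpatialRDD allows the input file to contain mixed geometry types, so it fits more scenarios.
File formats such as WKT, WKB, GeoJSON, and Shapefile can store several kinds of geometry, for example LineString, Polygon, and MultiPolygon.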

3.3.1 Creating from WKT/WKB

checkin.tsv:

POINT(-88.331492 32.324142)	hotel
POINT(-88.175933 32.360763)	gas
POINT(-88.388954 32.357073)	bar
POINT(-88.221102 32.35078)	restaurant

Code:

import org.datasyslab.geospark.formatMapper.WktReader

val inputLocation = "data/checkin.tsv"
val wktColumn = 0 // The WKT string starts from Column 0
val allowTopologyInvalidGeometries = true
val skipSyntaxInvalidGeometries = false
val spatialRDD = WktReader.readToGeometryRDD(sc, inputLocation, wktColumn, allowTopologyInvalidGeometries, skipSyntaxInvalidGeometries)
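For readers new to WKT, the following plain-Scala sketch shows what one row of checkin.tsv contains: a WKT string in column 0 and attributes in the remaining tab-separated columns. It handles only simple POINT(x y) text with plain decimal numbers and is not a replacement for WktReader; WktPointSketch and parseRow are hypothetical names.

```scala
object WktPointSketch {
  // Matches "POINT(x y)" with plain decimal numbers; other WKT types are not handled.
  private val PointPattern =
    """(?i)\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*""".r

  // Returns (x, y, attributes) for a "POINT(x y)<TAB>attr..." row,
  // or None when column 0 is not a simple POINT.
  def parseRow(row: String): Option[(Double, Double, Seq[String])] = {
    val cols = row.split("\t")
    cols.head match {
      case PointPattern(x, y) => Some((x.toDouble, y.toDouble, cols.tail.toSeq))
      case _                  => None
    }
  }
}
```

A row holding a LINESTRING or POLYGON returns None here, which is exactly the mixed-geometry case the generic SpatialRDD readers are meant to cover.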

3.3.2 Creating from GeoJSON

polygon.json:

{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "077", "TRACTCE": "011501", "BLKGRPCE": "5", "AFFGEOID": "1500000US010770115015", "GEOID": "010770115015", "NAME": "5", "LSAD": "BG", "ALAND": 6844991, "AWATER": 32636 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -87.621765, 34.873444 ], [ -87.617535, 34.873369 ], [ -87.6123, 34.873337 ], [ -87.604049, 34.873303 ], [ -87.604033, 34.872316 ], [ -87.60415, 34.867502 ], [ -87.604218, 34.865687 ], [ -87.604409, 34.858537 ], [ -87.604018, 34.851336 ], [ -87.603716, 34.844829 ], [ -87.603696, 34.844307 ], [ -87.603673, 34.841884 ], [ -87.60372, 34.841003 ], [ -87.603879, 34.838423 ], [ -87.603888, 34.837682 ], [ -87.603889, 34.83763 ], [ -87.613127, 34.833938 ], [ -87.616451, 34.832699 ], [ -87.621041, 34.831431 ], [ -87.621056, 34.831526 ], [ -87.62112, 34.831925 ], [ -87.621603, 34.8352 ], [ -87.62158, 34.836087 ], [ -87.621383, 34.84329 ], [ -87.621359, 34.844438 ], [ -87.62129, 34.846387 ], [ -87.62119, 34.85053 ], [ -87.62144, 34.865379 ], [ -87.621765, 34.873444 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "045", "TRACTCE": "021102", "BLKGRPCE": "4", "AFFGEOID": "1500000US010450211024", "GEOID": "010450211024", "NAME": "4", "LSAD": "BG", "ALAND": 11360854, "AWATER": 0 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -85.719017, 31.297901 ], [ -85.715626, 31.305203 ], [ -85.714271, 31.307096 ], [ -85.69999, 31.307552 ], [ -85.697419, 31.307951 ], [ -85.675603, 31.31218 ], [ -85.672733, 31.312876 ], [ -85.672275, 31.311977 ], [ -85.67145, 31.310988 ], [ -85.670622, 31.309524 ], [ -85.670729, 31.307622 ], [ -85.669876, 31.30666 ], [ -85.669796, 31.306224 ], [ -85.670356, 31.306178 ], [ -85.671664, 31.305583 ], [ -85.67177, 31.305299 ], [ -85.671878, 31.302764 ], [ -85.671344, 31.302123 ], [ -85.668276, 31.302076 ], [ -85.66566, 31.30093 ], [ -85.665687, 31.30022 ], [ -85.669183, 31.297677 ], [ -85.668703, 31.295638 ], [ -85.671985, 31.29314 ], [ -85.677177, 31.288211 ], [ -85.678452, 31.286376 ], [ -85.679236, 31.28285 ], [ -85.679195, 31.281426 ], [ -85.676865, 31.281049 ], [ -85.674661, 31.28008 ], [ -85.674377, 31.27935 ], [ -85.675714, 31.276882 ], [ -85.677938, 31.275168 ], [ -85.680348, 31.276814 ], [ -85.684032, 31.278848 ], [ -85.684387, 31.279082 ], [ -85.692398, 31.283499 ], [ -85.705032, 31.289718 ], [ -85.706755, 31.290476 ], [ -85.718102, 31.295204 ], [ -85.719132, 31.29689 ], [ -85.719017, 31.297901 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "055", "TRACTCE": "001300", "BLKGRPCE": "3", "AFFGEOID": "1500000US010550013003", "GEOID": "010550013003", "NAME": "3", "LSAD": "BG", "ALAND": 1378742, "AWATER": 247387 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -86.000685, 34.00537 ], [ -85.998837, 34.009768 ], [ -85.998012, 34.010398 ], [ -85.987865, 34.005426 ], [ -85.986656, 34.004552 ], [ -85.985, 34.002659 ], [ -85.98851, 34.001502 ], [ -85.987567, 33.999488 ], [ -85.988666, 33.99913 ], [ -85.992568, 33.999131 ], [ -85.993144, 33.999714 ], [ -85.994876, 33.995153 ], [ -85.998823, 33.989548 ], [ -85.999925, 33.994237 ], [ -86.000616, 34.000028 ], [ -86.000685, 34.00537 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "089", "TRACTCE": "001700", "BLKGRPCE": "2", "AFFGEOID": "1500000US010890017002", "GEOID": "010890017002", "NAME": "2", "LSAD": "BG", "ALAND": 1040641, "AWATER": 0 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -86.574172, 34.727375 ], [ -86.562684, 34.727131 ], [ -86.562797, 34.723865 ], [ -86.562957, 34.723168 ], [ -86.562336, 34.719766 ], [ -86.557381, 34.719143 ], [ -86.557352, 34.718322 ], [ -86.559921, 34.717363 ], [ -86.564827, 34.718513 ], [ -86.567582, 34.718565 ], [ -86.570572, 34.718577 ], [ -86.573618, 34.719377 ], [ -86.574172, 34.727375 ] ] ] } },

Code:

import org.datasyslab.geospark.formatMapper.GeoJsonReader

val inputLocation = "data/polygon.json"
val allowTopologyInvalidGeometries = true
val skipSyntaxInvalidGeometries = false
val spatialRDD = GeoJsonReader.readToGeometryRDD(sc, inputLocation, allowTopologyInvalidGeometries, skipSyntaxInvalidGeometries)

3.3.3 Creating from a Shapefile

import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader

val shapefileInputLocation = "data/myshapefile"
// System.setProperty("geospark.global.charset", "utf8")
val spatialRDD = ShapefileReader.readToGeometryRDD(sc, shapefileInputLocation)

⚠️ Note:
The .shp, .shx, and .dbf file extensions must be lowercase. For a shapefile named myshapefile, the input location is the folder that contains its files, laid out as follows:

- shapefile1
- shapefile2
- myshapefile
    - myshapefile.shp
    - myshapefile.shx
    - myshapefile.dbf
    - myshapefile...
    - ...

If attribute text comes out garbled, set the charset property before calling ShapefileReader.readToGeometryRDD:

System.setProperty("geospark.global.charset", "utf8")
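A quick pre-flight check of the folder layout can save a failed job. The sketch below is plain Scala and does not touch the file system: it takes a folder name and the file names found in it, and reports which of the .shp, .shx, and .dbf parts are missing. It assumes, as in the layout above, that the files share the folder's base name and use lowercase extensions; ShapefileLayoutSketch and missingParts are hypothetical names, not GeoSpark API.

```scala
object ShapefileLayoutSketch {
  private val RequiredExtensions = Seq("shp", "shx", "dbf")

  // List the required parts that are missing from the folder.
  // Matching is case-sensitive, so "myshapefile.SHX" does not count as "myshapefile.shx".
  def missingParts(folderName: String, fileNames: Seq[String]): Seq[String] =
    RequiredExtensions
      .map(ext => s"$folderName.$ext")
      .filterNot(name => fileNames.contains(name))
}
```

An empty result means all three parts are present under the assumed naming; anything else names the files to add or rename before reading the folder.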

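4. Coordinate Reference System Transformation

GeoSpark identifies coordinate reference systems by their EPSG codes, which can be looked up at https://epsg.io/
To transform the data into another coordinate reference system, use the following method:

// Source CRS
val sourceCrsCode = "epsg:4326"
// Target CRS
val targetCrsCode = "epsg:3857"
objectRDD.CRSTransform(sourceCrsCode, targetCrsCode)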
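For intuition about what this particular transformation does to the numbers, EPSG:4326 stores longitude and latitude in degrees, while EPSG:3857 (Web Mercator) stores projected x and y in metres. The sketch below applies the standard spherical Mercator forward formula in plain Scala. It is independent of GeoSpark and makes no claim about how CRSTransform is implemented; WebMercatorSketch and toWebMercator are hypothetical names.

```scala
object WebMercatorSketch {
  // WGS84 semi-major axis in metres, used as the sphere radius by EPSG:3857.
  private val EarthRadius = 6378137.0

  // Spherical Mercator forward projection: degrees (lon, lat) -> metres (x, y).
  def toWebMercator(lonDeg: Double, latDeg: Double): (Double, Double) = {
    val x = EarthRadius * math.toRadians(lonDeg)
    val y = EarthRadius * math.log(math.tan(math.Pi / 4 + math.toRadians(latDeg) / 2))
    (x, y)
  }
}
```

The origin maps to approximately (0, 0), and longitude 180 maps to an x of roughly 20,037,508 metres, which is half the equatorial circumference of the reference sphere. The formula diverges as latitude approaches the poles, so it is only meaningful away from ±90 degrees.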

References

https://datasystemslab.github.io/GeoSpark/tutorial/
https://www.cnblogs.com/denny402/p/4967049.html
