Preface
- Note: if you haven't read my first article (the introduction), it's recommended that you read it first.
- Goal: use the pipeline module of GeoTrellis 3.5.2 to tile singleband/multiband raster data and store the tiles on local disk or in HDFS.
- Sneak preview:
Steps
1. Overview
- pipeline
A pipeline is a set of instructions described in JSON; together they form one tiling workflow. The instructions are executed in order, and the output of each instruction is the input of the next.
For example: read -> tile to layout -> reproject -> build pyramid -> write
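Schematically, such a chain is a JSON array whose elements are the individual instructions, executed top to bottom (the paths and layer name below are hypothetical placeholders; a complete, working example appears in section 4):

```json
[
  { "uri": "file:/path/to/source.tif",
    "type": "singleband.spatial.read.hadoop" },
  { "resample_method": "nearest-neighbor",
    "type": "singleband.spatial.transform.tile-to-layout" },
  { "name": "demoLayer",
    "uri": "file:/path/to/catalog",
    "key_index_method": { "type": "zorder" },
    "scheme": { "crs": "epsg:3857", "tileSize": 256, "resolutionThreshold": 0.1 },
    "type": "singleband.spatial.write" }
]
```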
2. Complete pom file
For now it contains only the basic libraries; more may be added later. Any additions will be mentioned in later articles and can simply be added on top of this.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.3.9.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.tiger</groupId>
<artifactId>geotrellis3</artifactId>
<version>0.0.1-SNAPSHOT</version>
<!-- Project name -->
<name>geotrellis3</name>
<description>geotrellis3 project for Spring Boot</description>
<!-- Centralized version management -->
<properties>
<java.version>1.8</java.version>
<geotrellis.version>3.5.2</geotrellis.version>
<scala.version>2.11</scala.version>
<spark.version>2.4.0</spark.version>
</properties>
<!-- Dependencies -->
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- This project uses Spring Boot's default logging stack (slf4j + logback), so dependencies that use log4j are bridged to slf4j here -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
</dependency>
<!-- Annotation processor for custom configuration-property hints; only useful while coding -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Swagger dependency, using the enhanced knife4j distribution -->
<dependency>
<groupId>com.github.xiaoymin</groupId>
<artifactId>knife4j-spring-boot-starter</artifactId>
<version>3.0.2</version>
</dependency>
<!-- Lombok, to reduce boilerplate -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<scope>provided</scope>
</dependency>
<!-- GeoTrellis modules; these are enough for now -->
<dependency>
<groupId>org.locationtech.geotrellis</groupId>
<artifactId>geotrellis-raster_${scala.version}</artifactId>
<version>${geotrellis.version}</version>
</dependency>
<dependency>
<groupId>org.locationtech.geotrellis</groupId>
<artifactId>geotrellis-spark_${scala.version}</artifactId>
<version>${geotrellis.version}</version>
</dependency>
<dependency>
<groupId>org.locationtech.geotrellis</groupId>
<artifactId>geotrellis-proj4_${scala.version}</artifactId>
<version>${geotrellis.version}</version>
</dependency>
<dependency>
<groupId>org.locationtech.geotrellis</groupId>
<artifactId>geotrellis-spark-pipeline_${scala.version}</artifactId>
<version>${geotrellis.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Scala/Java interop; required when using GeoTrellis 3.x, otherwise errors occur -->
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-java8-compat_${scala.version}</artifactId>
<version>1.0.0-RC1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<!-- Compile Scala sources -->
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<!-- Packaging -->
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
3. Data sources
Singleband source (PM2.5 data covering Fujian Province):
Multiband source (a true-color remote sensing image of an area within Fujian Province):
4. Coding
Overview
Four tiling methods will be provided:
- tile singleband raster data, JSON form
- tile multiband raster data, JSON form
- tile singleband raster data, coded form
- tile multiband raster data, coded form
The coded form merely wraps the pipeline description into objects; at execution time the object instances are still converted to JSON and executed. Only the way we write it differs.
Code file locations
Functional code
Singleton SparkContext
A JVM must not hold more than one SparkContext, so wherever a SparkContext is needed later, obtain it from this object; that guarantees a single SparkContext instance per JVM. For the details, look it up yourself; they are not covered here.
package com.tiger.geotrellis.common
import geotrellis.spark.util.SparkUtils
import org.apache.spark.{SparkConf, SparkContext}
/**
* Singleton SparkContext
*/
object SparkContextSingleton {
val conf: SparkConf =
new SparkConf()
.setMaster("local[*]")
private implicit val sc: SparkContext = SparkUtils.createSparkContext("GeotrellisJob", conf)
def getInstance: SparkContext = sc
}
切片代码
package com.tiger.geotrellis.common.tile
import com.tiger.geotrellis.common.SparkContextSingleton
import geotrellis.layer.{LayoutDefinition, LayoutScheme, SpatialKey, ZoomedLayoutScheme}
import geotrellis.proj4.WebMercator
import geotrellis.spark._
import geotrellis.spark.pipeline._
import geotrellis.spark.pipeline.ast._
import geotrellis.spark.pipeline.ast.untyped.ErasedNode
import geotrellis.spark.pipeline.json.read.JsonRead
import geotrellis.spark.pipeline.json.{PipelineExpr, PipelineKeyIndexMethod, ReadTypes, TransformTypes, WriteTypes}
import geotrellis.spark.pipeline.json.transform.{Pyramid, Reproject, TileToLayout}
import geotrellis.spark.pipeline.json.write.JsonWrite
import org.apache.spark.SparkContext
import scala.util.{Failure, Success, Try}
/**
* Methods for tiling raster data
*/
object TileTiff {
// Obtain the singleton SparkContext
implicit val sc: SparkContext = SparkContextSingleton.getInstance
/**
* Tile singleband raster data, JSON form. Steps: read -> tile to layout -> reproject -> build pyramid -> write
*
* @param input data source
* @param output storage location
* @param layerName layer name
*/
def createSinglebandTileJson(input: String, output: String, layerName: String): Unit = {
val maskJson: String =
"""
|[
| {
| "uri" : "{input}",
| "type" : "singleband.spatial.read.hadoop"
| },
| {
| "resample_method" : "nearest-neighbor",
| "type" : "singleband.spatial.transform.tile-to-layout"
| },
| {
| "crs" : "EPSG:3857",
| "scheme" : {
| "crs" : "epsg:3857",
| "tileSize" : 256,
| "resolutionThreshold" : 0.1
| },
| "resample_method" : "nearest-neighbor",
| "type" : "singleband.spatial.transform.buffered-reproject"
| },
| {
| "end_zoom" : 0,
| "resample_method" : "nearest-neighbor",
| "type" : "singleband.spatial.transform.pyramid"
| },
| {
| "name" : "{layerName}",
| "uri" : "{output}",
| "key_index_method" : {
| "type" : "zorder"
| },
| "scheme" : {
| "crs" : "epsg:3857",
| "tileSize" : 256,
| "resolutionThreshold" : 0.1
| },
| "type" : "singleband.spatial.write"
| }
|]
""".stripMargin
val maskJsonStr = maskJson.replace("{input}", input).replace("{output}", output).replace("{layerName}", layerName)
val list: Option[Node[Stream[(Int, TileLayerRDD[SpatialKey])]]] = maskJsonStr.node
list match {
case None => println("Couldn't parse the JSON")
case Some(node) => {
node.eval.foreach { case (zoom, rdd) =>
println(s"ZOOM: ${zoom}")
println(s"COUNT: ${rdd.count}")
}
}
}
}
/**
* Tile multiband raster data, JSON form. Steps: read -> tile to layout -> reproject -> build pyramid -> write
*
* @param input data source
* @param output storage location
* @param layerName layer name
*/
def createMultibandTileJson(input: String, output: String, layerName: String): Unit = {
val maskJson: String =
"""
|[
| {
| "uri" : "{input}",
| "type" : "multiband.spatial.read.hadoop"
| },
| {
| "resample_method" : "nearest-neighbor",
| "type" : "multiband.spatial.transform.tile-to-layout"
| },
| {
| "crs" : "EPSG:3857",
| "scheme" : {
| "crs" : "epsg:3857",
| "tileSize" : 256,
| "resolutionThreshold" : 0.1
| },
| "resample_method" : "nearest-neighbor",
| "type" : "multiband.spatial.transform.buffered-reproject"
| },
| {
| "end_zoom" : 0,
| "resample_method" : "nearest-neighbor",
| "type" : "multiband.spatial.transform.pyramid"
| },
| {
| "name" : "{layerName}",
| "uri" : "{output}",
| "key_index_method" : {
| "type" : "zorder"
| },
| "scheme" : {
| "crs" : "epsg:3857",
| "tileSize" : 256,
| "resolutionThreshold" : 0.1
| },
| "type" : "multiband.spatial.write"
| }
|]
""".stripMargin
val maskJsonStr = maskJson.replace("{input}", input).replace("{output}", output).replace("{layerName}", layerName)
val list: Option[Node[Stream[(Int, MultibandTileLayerRDD[SpatialKey])]]] = maskJsonStr.node
list match {
case None => println("Couldn't parse the JSON")
case Some(node) => {
node.eval.foreach { case (zoom, rdd) =>
println(s"ZOOM: ${zoom}")
println(s"COUNT: ${rdd.count}")
}
}
}
}
/**
* Tile singleband raster data, coded form. Steps: read -> tile to layout -> reproject -> build pyramid -> write
* Produces the same result as createSinglebandTileJson; only the style of writing differs.
*
* @param input data source
* @param output storage location
* @param layerName layer name
*/
def createSinglebandTileAst(input: String, output: String, layerName: String): Unit = {
// Tile layout scheme, used by the reproject and write steps
val scheme = Left[LayoutScheme, LayoutDefinition](ZoomedLayoutScheme(WebMercator))
// Read the source data to be tiled
val jsonRead = JsonRead(input, `type` = ReadTypes.SpatialHadoopType)
//Tile To Layout
val jsonTileToLayout = TileToLayout(`type` = TransformTypes.SpatialTileToLayoutType)
// Reproject (projection transformation)
val jsonReproject = Reproject("EPSG:3857", scheme, `type` = TransformTypes.SpatialBufferedReprojectType)
// Build the pyramid
val jsonPyramid = Pyramid(`type` = TransformTypes.SpatialPyramidType)
// Write
val jsonWrite = JsonWrite(layerName, output, PipelineKeyIndexMethod("zorder"), scheme, `type` = WriteTypes.SpatialType)
// Chain the expressions into a list
val list: List[PipelineExpr] = jsonRead ~ jsonTileToLayout ~ jsonReproject ~ jsonPyramid ~ jsonWrite
// typed way, as in the JSON example above
val typedAst: Node[Stream[(Int, TileLayerRDD[SpatialKey])]] =
list.node[Stream[(Int, TileLayerRDD[SpatialKey])]]
val result: Stream[(Int, TileLayerRDD[SpatialKey])] = typedAst.eval
// in some cases you may want just to evaluate the pipeline
// to add some flexibility we can do the parsing and evaluation steps manually
// the erasedNode function parses the pipeline into an ErasedNode type that can be evaluated
val untypedAst: ErasedNode = list.erasedNode
// it would be an untyped result, just some evaluation
// but you still have a chance to catch and handle some types of exceptions
val untypedResult: Any =
Try {
untypedAst.unsafeEval
} match {
case Success(_) =>
case Failure(e) =>
}
// typed result
val typedResult: Option[Stream[(Int, TileLayerRDD[SpatialKey])]] =
Try {
untypedAst.eval
} match {
case Success(stream) => Some(stream)
case Failure(e) => None
}
}
/**
* Tile multiband raster data, coded form. Steps: read -> tile to layout -> reproject -> build pyramid -> write
* Produces the same result as createMultibandTileJson; only the style of writing differs.
*
* @param input data source
* @param output storage location
* @param layerName layer name
*/
def createMultibandTileAst(input: String, output: String, layerName: String): Unit = {
// Tile layout scheme, used by the reproject and write steps
val scheme = Left[LayoutScheme, LayoutDefinition](ZoomedLayoutScheme(WebMercator))
// Read the source data to be tiled
val jsonRead = JsonRead(input, `type` = ReadTypes.MultibandSpatialHadoopType)
//Tile To Layout
val jsonTileToLayout = TileToLayout(`type` = TransformTypes.MultibandSpatialTileToLayoutType)
// Reproject (projection transformation)
val jsonReproject = Reproject("EPSG:3857", scheme, `type` = TransformTypes.MultibandSpatialBufferedReprojectType)
// Build the pyramid
val jsonPyramid = Pyramid(`type` = TransformTypes.MultibandSpatialPyramidType)
// Write
val jsonWrite = JsonWrite(layerName, output, PipelineKeyIndexMethod("zorder"), scheme, `type` = WriteTypes.MultibandSpatialType)
// Chain the expressions into a list
val list: List[PipelineExpr] = jsonRead ~ jsonTileToLayout ~ jsonReproject ~ jsonPyramid ~ jsonWrite
// typed way, as in the JSON example above
val typedAst: Node[Stream[(Int, MultibandTileLayerRDD[SpatialKey])]] =
list
.node[Stream[(Int, MultibandTileLayerRDD[SpatialKey])]]
val result: Stream[(Int, MultibandTileLayerRDD[SpatialKey])] = typedAst.eval
// in some cases you may want just to evaluate the pipeline
// to add some flexibility we can do the parsing and evaluation steps manually
// the erasedNode function parses the pipeline into an ErasedNode type that can be evaluated
val untypedAst: ErasedNode = list.erasedNode
// it would be an untyped result, just some evaluation
// but you still have a chance to catch and handle some types of exceptions
val untypedResult: Any =
Try {
untypedAst.unsafeEval
} match {
case Success(_) =>
case Failure(e) =>
}
// typed result
val typedResult: Option[Stream[(Int, MultibandTileLayerRDD[SpatialKey])]] =
Try {
untypedAst.eval
} match {
case Success(stream) => Some(stream)
case Failure(e) => None
}
}
}
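The `zorder` key index method requested in the write step above refers to a Z-order (Morton) space-filling curve: the bits of a tile's column and row are interleaved into a single index, so tiles that are close together spatially tend to be stored close together. The following is a minimal sketch of the idea, not GeoTrellis's actual implementation:

```java
public class ZOrderDemo {

    // Interleave the low `bits` bits of col and row into a Z-order (Morton)
    // index: even bit positions take the column bits, odd positions the row bits.
    static long zIndex(int col, int row, int bits) {
        long result = 0L;
        for (int i = 0; i < bits; i++) {
            result |= ((long) ((col >> i) & 1)) << (2 * i);
            result |= ((long) ((row >> i) & 1)) << (2 * i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // The four tiles of a 2x2 block get consecutive indices 0..3,
        // which is what keeps neighbouring tiles near each other in storage.
        System.out.println(zIndex(0, 0, 16)); // 0
        System.out.println(zIndex(1, 0, 16)); // 1
        System.out.println(zIndex(0, 1, 16)); // 2
        System.out.println(zIndex(1, 1, 16)); // 3
    }
}
```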
Test code
The test code is written in Java.
package com.tiger.geotrellis.tile;
import com.tiger.geotrellis.common.tile.TileTiff;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
/**
* Tests for the methods in the TileTiff class (tiling functionality)
*
*
* Data source: either a local file or a file in HDFS; sour may be a single tif file or a directory
* Storage target: either local or HDFS; target must be a directory
* Layer name: arbitrary; it identifies the layer and is needed later when reading the data back
*
*
* # source is a local file, target is HDFS
* var sour = "file:/E:/study/bigdata/geotrellis/testdata/img/singleband.tif";
* var target = "hdfs://node1:8020/data/test/layers/";
*
* # source is a file in HDFS, target is HDFS
* var sour = "hdfs://node1:8020/data/test/sour/singleband.tif";
* var target = "hdfs://node1:8020/data/test/layers/";
*
*
* @author tiger
*/
@SpringBootTest
class TileTiffTest4J {
/**
* Test: tile a local singleband tif and store the tiles locally
*/
@Test
void testCreateSinglebandTileJson() {
// Source and storage are both on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/singleband.tif";
String target = "file:/E:/study/bigdata/geotrellis/testdata/img/layers/";
// Layer name
String layername = "singlebandLayerJson";
TileTiff.createSinglebandTileJson(sour, target, layername);
}
/**
* Test: tile a local singleband tif and store the tiles in HDFS
*/
@Test
void testCreateSinglebandTileJsonToHdfs() {
// Source is on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/singleband.tif";
// Store the tiles in HDFS
String target = "hdfs://node1:8020/data/test/layers/";
// Layer name
String layername = "singlebandLayerJson";
TileTiff.createSinglebandTileJson(sour, target, layername);
}
@Test
void testCreateSinglebandTileAst() {
// Source and storage are both on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/singleband.tif";
String target = "file:/E:/study/bigdata/geotrellis/testdata/img/layers/";
// Layer name
String layername = "singlebandLayerAst";
TileTiff.createSinglebandTileAst(sour, target, layername);
}
/**
* Test: tile a local multiband tif and store the tiles locally
*/
@Test
void testCreateMultibandTileJson() {
// Source and storage are both on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/multiband.tif";
String target = "file:/E:/study/bigdata/geotrellis/testdata/img/layers/";
// Layer name
String layername = "multibandLayerJson";
TileTiff.createMultibandTileJson(sour, target, layername);
}
/**
* Test: tile a local multiband tif and store the tiles in HDFS
*/
@Test
void testCreateMultibandTileJsonToHdfs() {
// Source is on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/multiband.tif";
// Store the tiles in HDFS
String target = "hdfs://node1:8020/data/test/layers/";
// Layer name
String layername = "multibandLayerJson";
TileTiff.createMultibandTileJson(sour, target, layername);
}
@Test
void testCreateMultibandTileAst() {
// Source and storage are both on local disk
String sour = "file:/E:/study/bigdata/geotrellis/testdata/img/multiband.tif";
String target = "file:/E:/study/bigdata/geotrellis/testdata/img/layers/";
// Layer name
String layername = "multibandLayerAst";
TileTiff.createMultibandTileAst(sour, target, layername);
}
}
Run the tests:
Note: if the following error appears while running the tests:
click as shown in the figure and, in the dialog, select
then run the tests again and the error will be gone.
5. Results
Result in the local folder:
The actual tile data. Note that this differs from tiles in the traditional sense: what is stored here is binary data, not images.
Note: if you stored to HDFS, check the result in HDFS; the structure is much the same. As long as the storage path is an HDFS path, the results will be stored in HDFS.
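The one-subfolder-per-zoom layout follows from the `ZoomedLayoutScheme(WebMercator)` used above: with a tile size of 256, zoom level z divides the world into a 2^z x 2^z grid of tiles, and the pyramid step writes every level from the source's maximum zoom down to `end_zoom` 0, where a single tile covers the whole world. A quick sketch of the grid sizes:

```java
public class TileGridDemo {

    // For a ZoomedLayoutScheme with tileSize 256, zoom level z
    // covers the world with a 2^z x 2^z grid of tiles.
    static long tilesPerSide(int zoom) {
        return 1L << zoom;
    }

    static long totalTiles(int zoom) {
        long side = tilesPerSide(zoom);
        return side * side;
    }

    public static void main(String[] args) {
        for (int z = 0; z <= 4; z++) {
            System.out.println("zoom " + z + ": "
                + tilesPerSide(z) + " x " + tilesPerSide(z)
                + " = " + totalTiles(z) + " tiles");
        }
    }
}
```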
Summary
- We covered the two ways of tiling with the pipeline module (JSON and coded), applied separately to singleband and multiband data. I think this can satisfy the entry-level needs of some readers.
- A follow-up article will cover how to use the generated tiles.
- Discussion and mutual learning are welcome; my WeChat: huangchuanxiaa.
- Source code (Gitee):
SSH: git@gitee.com:tiger_hcxx/spatiotemporal-big-data.git
HTTPS: https://gitee.com/tiger_hcxx/spatiotemporal-big-data.git
Repository notes:
1. The repository is open source.
2. To clone via SSH, generate an SSH key pair locally and register the public key on Gitee under Settings -> SSH Keys; look up the details online if needed.
References
- Official documentation: https://geotrellis.readthedocs.io/en/v3.5.1/guide/pipeline.html