GeoTrellis Learning Notes (4): The ETL Tool

While reading the official documentation I noticed that GeoTrellis ships with what looks like a very powerful ETL tool, but after several readings I still couldn't work out what it actually does. Understanding ETL and getting it to run was a long, winding process, and this post records that journey~

First of all, what exactly is ETL?

Reference: https://docs.microsoft.com/en-us/azure/architecture/data-guide/relational-data/etl

Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store. The transformation work in ETL takes place in a specialized engine, and often involves using staging tables to temporarily hold data as it is being transformed and ultimately loaded to its destination.

The data transformation that takes place usually involves various operations, such as filtering, sorting, aggregating, joining data, cleaning data, deduplicating, and validating data.

In plain terms, an ETL tool extracts data from source systems, transforms it, and loads it into a destination data store.

Put that way it still feels abstract, without a concrete picture, so let's just get hands-on~

First, I copied all of the data from GeoTrellis's geotrellis-chatta-demo into my own project as test data, and wrote a small driver around Etl.ingest:

import geotrellis.raster.Tile
import geotrellis.spark.SpatialKey
import geotrellis.spark.etl.Etl
import geotrellis.spark.util.SparkUtils
import geotrellis.vector.ProjectedExtent
import org.apache.spark.SparkConf

object GeoTrellisETL {
  def main(args: Array[String]): Unit = {
    // Build the argument list the ETL expects, instead of shadowing main's
    // args parameter with a mutable array as my first attempt did.
    val etlArgs = Array(
      "--input", "D:\\IdeaProjects\\GeotrellisETL\\data\\input.json",
      "--output", "D:\\IdeaProjects\\GeotrellisETL\\data\\output.json",
      "--backend-profiles", "D:\\IdeaProjects\\GeotrellisETL\\data\\backend-profiles.json"
    )
    goETL(etlArgs)
  }

  def goETL(args: Array[String]): Unit = {
    implicit val sc = SparkUtils.createSparkContext("GeoTrellis ETL SinglebandIngest", new SparkConf(true).setMaster("local"))
    try {
      // Singleband, purely spatial ingest: (ProjectedExtent, SpatialKey, Tile).
      Etl.ingest[ProjectedExtent, SpatialKey, Tile](args)
      println("finished!")
    } finally {
      sc.stop()
    }
  }
}
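A note on the type parameters: Etl.ingest[ProjectedExtent, SpatialKey, Tile] is the singleband, purely spatial case. Going by the GeoTrellis 1.0 ETL docs, ingesting multiband GeoTiffs should only require swapping the value type for MultibandTile; a minimal sketch, which I haven't run myself:

import geotrellis.raster.MultibandTile
import geotrellis.spark.SpatialKey
import geotrellis.spark.etl.Etl
import geotrellis.spark.util.SparkUtils
import geotrellis.vector.ProjectedExtent
import org.apache.spark.SparkConf

object GeoTrellisMultibandETL {
  def main(args: Array[String]): Unit = {
    implicit val sc = SparkUtils.createSparkContext("GeoTrellis ETL MultibandIngest", new SparkConf(true).setMaster("local"))
    try {
      // Same --input/--output/--backend-profiles arguments as above;
      // only the value type changes from Tile to MultibandTile.
      Etl.ingest[ProjectedExtent, SpatialKey, MultibandTile](args)
    } finally {
      sc.stop()
    }
  }
}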

The three JSON files are:

input.json:

[
  {
    "name": "dl",
    "format": "geotiff",
    "backend": {
      "type": "hadoop",
      "path": "D:\\IdeaProjects\\GeotrellisETL\\data\\arg_wm\\hangzhou1984.tiff"
    },
    "cache": "NONE"
  }
]

output.json:

{
  "backend": {
    "type": "file",
    "path": "D:\\IdeaProjects\\GeotrellisETL\\data\\HZ4326_output4326"
  },
  "reprojectMethod": "buffered",
  "cellSize": {
    "width": 256.0,
    "height": 256.0
  },
  "encoding":"png",
  "tileSize": 256,
  "pyramid": true,
  "resampleMethod": "nearest-neighbor",
  "keyIndexMethod": {
    "type": "zorder"
  },
  "layoutScheme": "zoomed",
  "crs": "EPSG:4326"
}

backend-profiles.json (since output.json writes to the local file backend, the Accumulo and Cassandra profiles here are just placeholders for this example):

{
  "backend-profiles": [
    {
      "name": "accumulo-local",
      "type": "accumulo",
      "zookeepers": "localhost",
      "instance": "gis",
      "user": "root",
      "password": "secret"
    },
    {
      "name": "cassandra-local",
      "type": "cassandra",
      "allowRemoteDCsForLocalConsistencyLevel": false,
      "localDc": "datacenter1",
      "usedHostsPerRemoteDc": 0,
      "hosts": "localhost",
      "replicationStrategy": "SimpleStrategy",
      "replicationFactor": 1,
      "user": "",
      "password": ""
    }
  ]
}

Running it, I hit an error:

[screenshot: error message]

It turned out to be a jar version mismatch:

[screenshot: dependency version error]

The fix: pin all of the GeoTrellis packages in sbt to 1.0.0.
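For reference, the relevant part of build.sbt afterwards looked roughly like the sketch below. The artifact names and the Spark version are assumptions based on the GeoTrellis 1.0.0 release (the group id moved to org.locationtech.geotrellis at 1.0.0); adjust them to whatever your project actually uses:

// Sketch only: artifact names and versions assumed from the GeoTrellis 1.0.0 release.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.locationtech.geotrellis" %% "geotrellis-spark"     % "1.0.0",
  "org.locationtech.geotrellis" %% "geotrellis-spark-etl" % "1.0.0",
  "org.apache.spark"            %% "spark-core"           % "2.0.1" // version assumed
)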

Running the code then produced folders for the different zoom levels under the output path, each holding a number of files with no extension. At first I was sure I had done something wrong, because these files can't be viewed directly and can't be converted to jpg. Only after digging through various resources did I learn that they are layer files: to view them, you have to render the layer and save the result as jpg, png, or another image format.

So I rendered the layers:

import java.io.File

import geotrellis.raster.Tile
import geotrellis.raster.render.{ColorMap, ColorRamp, RGB}
import geotrellis.spark._
import geotrellis.spark.io._
import geotrellis.spark.io.file._
import geotrellis.spark.util.SparkUtils
import org.apache.spark.SparkConf

object ETLOutputRender {
  // Discrete two-value color map, kept as an alternative to the ramp below.
  val colorMap1 =
    ColorMap(
      Map(
        0 -> RGB(0, 0, 0),
        1 -> RGB(255, 255, 255)
      )
    )
  // Black-to-white gradient with 100 stops and a fading alpha channel.
  val colorRamp =
    ColorRamp(RGB(0, 0, 0), RGB(255, 255, 255))
      .stops(100)
      .setAlphaGradient(0xFF, 0xAA)
  val zoomID = 6

  def main(args: Array[String]): Unit = {
    println("hello geotrellis")
    implicit val sc = SparkUtils.createSparkContext("GeoTrellis ETL SinglebandIngest", new SparkConf(true).setMaster("local"))
    /* Some location on your computer: the catalog the ETL wrote its layers into */
    val catalogPath: String = "D:\\IdeaProjects\\GeotrellisETL\\data\\DL3857_output3857"
    val store: AttributeStore = FileAttributeStore(catalogPath)
    val reader = FileLayerReader(store, catalogPath)

    // To render every level, wrap the lines below in: for (zoom <- 0 to zoomID) { ... }
    val layerId = LayerId("dl", zoomID)
    val sameLayer: TileLayerRDD[SpatialKey] = reader.read(layerId)
    // TODO: change to mapPartitions
    sameLayer.map(myfuncPerElement).collect()
    sc.stop()
  }

  // Renders one tile to jpgs_benshen/{zoom}/{col}/{row}.jpg; returns 1 as a dummy result.
  def myfuncPerElement(e: (SpatialKey, Tile)): Int = {
    val zoomDir = new File("D:\\IdeaProjects\\GeotrellisETL\\data\\DL3857_output3857\\jpgs_benshen\\" + zoomID)
    if (!zoomDir.exists()) zoomDir.mkdirs() // mkdirs, so missing parent directories are created too
    val colDir = new File(zoomDir, e._1.col.toString)
    if (!colDir.exists()) colDir.mkdirs()
    // For a TMS-style flipped y axis, name the file (math.pow(2, zoomID) - e._1.row - 1).toInt instead.
    e._2.renderJpg(colorRamp).write(new File(colDir, e._1.row + ".jpg").toString)
    1
  }
}
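On the TODO above: a sketch of the mapPartitions version, meant to replace the map/collect line inside main (it reuses sameLayer, catalogPath, zoomID and colorRamp from the code above, and keeps the jpgs_benshen directory name from the original):

// Sketch of the mapPartitions refactor hinted at by the TODO: the per-zoom
// directory setup is hoisted out of the per-tile loop and done once per partition.
val rendered: Long =
  sameLayer.mapPartitions { iter =>
    val zoomDir = new File(catalogPath + "\\jpgs_benshen\\" + zoomID)
    if (!zoomDir.exists()) zoomDir.mkdirs()
    iter.map { case (key, tile) =>
      val colDir = new File(zoomDir, key.col.toString)
      if (!colDir.exists()) colDir.mkdirs()
      tile.renderJpg(colorRamp).write(new File(colDir, key.row + ".jpg").toString)
      1L
    }
  }.count()
println(s"rendered $rendered tiles at zoom $zoomID")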

And finally, the rendered images appear in the output path.
