解决spark-redshift只能写不能读的问题

spark-redshift 是由 databricks 公司开发的读写redshift 工具包
在AWS 中国区总出现问题,比如读redshift 报错如下

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 6C553788BDAF505B), S3 Extended Request ID: r98olrdUZ7E1LB8tQfW3iNwRnmYRuVQmSWVpKj9pETTduF88v7AdFns9B7vGFuNxq4V3ip6w720=

结合工作实践,就该spark-redshift 修改相关源码,解决了以上问题。
代码修改 基于spark-redshift .v3.0.0-preview1,
在此和同行分享

1、修改 Utils.scala文件将new AmazonS3URI(addEndpointToUrl(url))注释 改为 new AmazonS3URI(url)

  /**
   * Factory method to create new S3URI in order to handle various library incompatibilities with
   * older AWS Java Libraries
   */
  def createS3URI(url: String): AmazonS3URI = {
    try {
      // try to instantiate AmazonS3URI with url

//      new AmazonS3URI(addEndpointToUrl(url))
      new AmazonS3URI(url)
    } catch {
      case e: IllegalArgumentException if e.getMessage.
        startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") => {
        new AmazonS3URI(addEndpointToUrl(url))
      }
    }
  }

  /**
   * Since older AWS Java Libraries do not handle S3 urls that have just the bucket name

2、修改RedshiftRelation.scala
增加两行代码
val cn_region = Region.getRegion(Regions.CN_NORTH_1)
s3Client.setRegion(cn_region)

    val filesToRead: Seq[String] = {
        val cleanedTempDirUri =
          Utils.fixS3Url(Utils.removeCredentialsFromURI(URI.create(tempDir)).toString)
        val s3URI = Utils.createS3URI(cleanedTempDirUri)
        val s3Client = s3ClientFactory(creds)
        val cn_region = Region.getRegion(Regions.CN_NORTH_1)
        s3Client.setRegion(cn_region)
//        println("s3Client.setRegion(cn_region),s3Client.getRegion.toString:" + s3Client.getRegion.toString)


        val is = s3Client.getObject(s3URI.getBucket, s3URI.getKey + "manifest").getObjectContent
        val s3Files = try {
          val entries = Json.parse(new InputStreamReader(is)).asObject().get("entries").asArray()
          entries.iterator().asScala.map(_.asObject().get("url").asString()).toSeq
        } finally {
          is.close()
        }

重新用sbt 打包,问题解决

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值