Spark S3 access exception: 400 Bad Request

Recently, accessing S3 through s3a from Spark 2 failed with the following error:

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: FD92FDC175C64AA2, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: IuloUEASgqnY4lrSMpbyJpwgFfCFbttxuxmJ9hGHMUgZTbO/UR/YyDgjix+3rBe0Y4MQHPzNvhA=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:154)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1327)

I searched for a long time without finding the cause. The source code is as follows:

@Test
public void testS3Access(){
    SparkSession spark = SparkSession
            .builder()
            .appName("TestJob")
            .master("local[*]")
            .config("spark.sql.warehouse.dir", WARE_HOUSE)
            .config("hive.metastore.uris", META_STORE)
            // Only the credentials are set; fs.s3a.endpoint is left at its default.
            .config("spark.hadoop.fs.s3a.access.key", "your access key")
            .config("spark.hadoop.fs.s3a.secret.key", "your secret key")
            .enableHiveSupport()
            .getOrCreate();

    spark.sql("SELECT * FROM table_name limit 1").show();
    spark.close();
}

Dependencies:

compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.2.0'
compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.2.0'

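In the end, access only worked after the property "spark.hadoop.fs.s3a.endpoint" was set. The hadoop-aws documentation notes that some regions, Seoul (ap-northeast-2) among them, accept only V4-signed requests and reject others with 400 Bad Request, which is why the region-specific endpoint has to be given.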

.config("spark.hadoop.fs.s3a.access.key", "your access key")
.config("spark.hadoop.fs.s3a.secret.key", "your secret key")
.config("spark.hadoop.fs.s3a.endpoint", "s3-ap-northeast-2.amazonaws.com")

To find the endpoint for your own bucket, see the first reference below.
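As a rough illustration of how the endpoint relates to the region name, here is a minimal sketch. The class `S3Endpoints` and the method `endpointFor` are hypothetical names of mine, not part of Spark, Hadoop, or the AWS SDK. It assumes the `s3-<region>.amazonaws.com` pattern used in this post, so check the result against the AWS endpoint table rather than trusting it blindly.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not a Spark, Hadoop or AWS SDK API.
final class S3Endpoints {
    private S3Endpoints() {}

    // Builds "s3-<region>.amazonaws.com", the form used in this post.
    // Assumption: the region follows this naming pattern. The AWS endpoint
    // table is the authoritative source and should be checked.
    static String endpointFor(String region) {
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalArgumentException("region must not be empty");
        }
        return "s3-" + region.trim().toLowerCase(Locale.ROOT) + ".amazonaws.com";
    }

    public static void main(String[] args) {
        // Prints the endpoint string for the region used in this post.
        System.out.println(endpointFor("ap-northeast-2"));
    }
}
```

The returned string is what would be passed as the value of "spark.hadoop.fs.s3a.endpoint".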

So to access S3 correctly, all three properties need to be set: the access key, the secret key, and the endpoint.
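One way to avoid forgetting one of the three properties is to collect them in a single place and fail early if any is missing. The sketch below does this with plain Java collections. `S3aSettings` and `build` are hypothetical names introduced for illustration; only the three property keys come from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Spark or Hadoop API.
final class S3aSettings {
    static final String PREFIX = "spark.hadoop.fs.s3a.";

    private S3aSettings() {}

    // Returns the three properties from this post, keyed by their full names.
    static Map<String, String> build(String accessKey, String secretKey, String endpoint) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(PREFIX + "access.key", require(accessKey, "access key"));
        conf.put(PREFIX + "secret.key", require(secretKey, "secret key"));
        conf.put(PREFIX + "endpoint", require(endpoint, "endpoint"));
        return conf;
    }

    // Rejects null or blank values so a missing setting fails before Spark starts.
    private static String require(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing S3A " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Prints only the property names, never the credential values.
        build("your access key", "your secret key", "s3-ap-northeast-2.amazonaws.com")
                .keySet()
                .forEach(System.out::println);
    }
}
```

Each entry of the returned map can then be passed to the SparkSession builder with .config(key, value), in place of the three hand-written .config(...) lines.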

 

References:

Determining which region an S3 bucket belongs to

https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region 

hadoop-aws integration

https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html

S3A ON SPARK ON AWS EC2

http://deploymentzone.com/2015/12/20/s3a-on-spark-on-aws-ec2/

 
