java.net.URISyntaxException: Relative path in absolute URI


伙伴几时见


Article link: https://blog.csdn.net/huobanjishijian/article/details/52583544


I was able to do some digging around in the latest Spark documentation, and I noticed a new configuration setting that I hadn't seen before:

spark.sql.warehouse.dir

So I went ahead and added this setting when I set up my SparkSession:

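from pyspark.sql import SparkSession

# Point the SQL warehouse at an explicit, well-formed file:/// URI
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .config('spark.sql.warehouse.dir', 'file:///C:/path/to/my/') \
    .getOrCreate()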
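Writing the URI by hand is where the error creeps in: on Windows the value needs the file:/// form, with a slash before the drive letter. Below is a minimal sketch of a helper that builds that form from an absolute local path. to_file_uri is a hypothetical name, and the sketch assumes the path is already absolute and contains no characters that would need percent-encoding (spaces, #, %); UNC paths are deliberately rejected.

```python
import re

# A Windows drive prefix such as "C:/" at the start of a (slash-normalized) path.
_DRIVE = re.compile(r"^[A-Za-z]:/")


def to_file_uri(local_path: str) -> str:
    """Build a 'file:///...' URI from an absolute local path (sketch).

    Assumes the path is absolute and needs no percent-encoding.
    """
    p = local_path.replace("\\", "/")  # accept backslash-separated Windows paths
    if p.startswith("//"):
        raise ValueError("UNC paths are not handled in this sketch")
    if _DRIVE.match(p):
        # Windows: "C:/x" -> "file:///C:/x" (slash before the drive letter)
        return "file:///" + p
    if p.startswith("/"):
        # POSIX: "/x" -> "file:///x"
        return "file://" + p
    raise ValueError(f"expected an absolute path, got {local_path!r}")


# Hypothetical use in the builder shown above:
# .config('spark.sql.warehouse.dir', to_file_uri(r'C:\path\to\my'))
```

The helper only rearranges strings, so it behaves the same on any OS, which makes it easy to check the value before Spark ever sees it.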

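That seemed to fix the problem, and then I could just feed my filename directly into the CSV reader (strictly speaking, spark.sql.warehouse.dir is where Spark SQL stores managed tables, not the working directory; on a local setup a relative path like 'file.csv' is still resolved against the directory the application is run from):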

# mySchema is a StructType defined elsewhere in the app
df = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file.csv', schema=mySchema)
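If you would rather not depend on where the application is launched from, one option is to resolve the file against an explicit base directory before handing it to the reader. This is a minimal sketch using pathlib; input_uri and base_dir are hypothetical names. Note that as_uri() percent-encodes characters such as spaces, which Hadoop may not decode back, so the sketch assumes plain path names.

```python
from pathlib import Path


def input_uri(relative_path: str, base_dir: str) -> str:
    """Turn relative_path into an absolute file: URI anchored at base_dir
    instead of the process's current working directory (sketch)."""
    # resolve() makes the path absolute, collapses '..' segments and follows
    # symlinks; it does not require the file to exist yet.
    return (Path(base_dir) / relative_path).resolve().as_uri()


# Hypothetical usage with the reader shown above:
# df = spark.read.format('csv').option('header', 'true') \
#     .load(input_uri('file.csv', r'C:\path\to\my'), schema=mySchema)
```

Because '..' is collapsed before Spark sees the path, a relative reference such as ../data/file.csv ends up pointing where you expect, regardless of the launch directory.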

Once I set the Spark warehouse directory, Spark was able to locate all of my files, and my app now finishes successfully. The amazing thing is that it runs about 20 times faster than it did in Spark 1.6, so they really have done some impressive work optimizing the SQL engine. Spark it up!

If the answer above does not solve your problem, see the reposted article below.

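When running the example code, I ran into this error:

Relative path in absolute URI

That is, a relative path appeared inside an absolute URI (uniform resource identifier).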

Following this reference:

http://stackoverflow.com/questions/38669206/spark-2-0-relative-path-in-absolute-uri-spark-warehouse

pass one extra setting when building the SparkSession: a path for spark.sql.warehouse.dir.

The reason is this error:

pyspark.sql.utils.IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI: file:D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse'

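In other words, Spark builds the default spark.sql.warehouse.dir from the current directory (a spark-warehouse folder under it), and on Windows the result, file:D:/..., has no slash before the drive letter, so its path part counts as relative.

Setting the option explicitly presumably just turns this into a proper absolute path.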
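To see why the original value fails, here is a rough Python imitation of the check java.net.URI performs when a URI is assembled from separate components: if a scheme is present, the path must be empty or begin with '/'. It sketches that one rule only (check_uri_path is a hypothetical name), not Hadoop's full path handling.

```python
def check_uri_path(scheme, path):
    """Roughly mimic java.net.URI's component check (sketch).

    A URI with a scheme must have an empty path or one starting with '/';
    otherwise Java raises URISyntaxException "Relative path in absolute URI".
    Returns the "scheme:path" string (no authority) when the check passes.
    """
    if scheme and path and not path.startswith("/"):
        raise ValueError(f"Relative path in absolute URI: {scheme}:{path}")
    return f"{scheme}:{path}" if scheme else path


# The value from the error message: scheme "file", path "D:/..." (no leading '/')
try:
    check_uri_path(
        "file",
        "D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse",
    )
except ValueError as err:
    print(err)

# With a slash before the drive letter the check passes
print(check_uri_path("file", "/D:/spark-warehouse"))  # file:/D:/spark-warehouse
```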

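In addition, the whole data folder has to be copied into the current ml folder, so that the original relative paths in the example programs can stay unchanged; I found that using ../ did not get from the current execution path to the data folder.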
