I did some digging in the latest Spark documentation and noticed a configuration setting that I hadn't seen before:
spark.sql.warehouse.dir
So I went ahead and added this setting when I set up my SparkSession:
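from pyspark.sql import SparkSession

# Point the Spark SQL warehouse at a local directory (file URI)
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .config('spark.sql.warehouse.dir', 'file:///C:/path/to/my/') \
    .getOrCreate()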
That seems to act as the working directory in my setup (the docs describe it as the default location for managed databases and tables), and I can then pass my filename directly to the CSV reader:
df = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file.csv', schema=mySchema)
Once I set the warehouse directory, Spark was able to locate all of my files, and my app now finishes successfully. The amazing thing is that it runs about 20 times faster than it did on Spark 1.6, so they really have done some impressive work optimizing the SQL engine. Spark it up!
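For what it's worth, if you would rather not hand-write the file:/// URI passed to spark.sql.warehouse.dir, here is a minimal sketch that builds one from a Windows path using Python's standard pathlib. The helper name to_file_uri is my own and is not part of Spark, and I am assuming a plain absolute Windows path like the one in my config.

```python
from pathlib import PureWindowsPath


def to_file_uri(path: str) -> str:
    """Convert an absolute Windows path to a file:/// URI.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # PureWindowsPath parses Windows paths on any OS and accepts both
    # forward and backward slashes; as_uri() requires an absolute path
    # and raises ValueError for a relative one.
    return PureWindowsPath(path).as_uri()


warehouse_uri = to_file_uri('C:/path/to/my/')
print(warehouse_uri)
```

The resulting string could then be used as the value for .config('spark.sql.warehouse.dir', ...). Note that as_uri() drops the trailing slash and percent-encodes characters such as spaces.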