@Created: 2022.06.16
@Modified: 2022.06.16
1 Problem description
Spark 2.4.2 is installed on a local PC and runs in local mode, reading Hive data from a remote server.
The following error is reported:
pyspark.sql.utils.IllegalArgumentException: 'java.net.UnknownHostException: cdhmaster'
Traceback (most recent call last):
  File "D:/programs/Anaconda_program/tsp_spark/main.py", line 32, in <module>
    get_data3()
  File "D:\programs\Anaconda_program\tsp_spark\src\data_preprocess.py", line 85, in get_data3
    df.show()
  File "C:\ProgramData\Anaconda3\envs\spark242\lib\site-packages\pyspark\sql\dataframe.py", line 378, in show
    print(self._jdf.showString(n, 20, vertical))
  File "C:\ProgramData\Anaconda3\envs\spark242\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\ProgramData\Anaconda3\envs\spark242\lib\site-packages\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'java.net.UnknownHostException: cdhmaster'
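The error happens because the Hive metastore hands the local driver table locations as HDFS URIs that contain the cluster-internal hostname, which the local PC cannot resolve. A minimal sketch of where the hostname hides (the hdfs:// URI below is an illustrative example, not taken from a real cluster):

```python
from urllib.parse import urlparse

# Illustrative table location as a metastore might return it.
table_location = "hdfs://cdhmaster:8020/user/hive/warehouse/db.db/table"

parsed = urlparse(table_location)
print(parsed.scheme)    # hdfs
print(parsed.hostname)  # cdhmaster -- this name must resolve on the local PC
```

Since only the cluster's internal DNS knows `cdhmaster`, the local driver fails with UnknownHostException the moment it tries to read the table data.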
2 Code
import os
from pyspark.sql import SparkSession

# Set your own JDK path
java8_location = r'C:\Program Files\Java\jdk1.8.0_211'
os.environ['JAVA_HOME'] = java8_location

def get_data3():
    print('Run in get_data3()')
    # Create a SparkSession with Hive support
    spark_session = SparkSession.builder.master("local[*]").appName("test") \
        .config("hive.metastore.uris", "thrift://IP:port") \
        .config('spark.executor.memory', '4g') \
        .config('spark.driver.memory', '4g') \
        .enableHiveSupport().getOrCreate()
    print('spark session...')
    df = spark_session.sql("select * from db.table")
    df.show()
    data = [(10, 'Aa'), (11, 'Bb')]
    df_new = spark_session.createDataFrame(data, ['id', 'name'])
    df_new.show()
    # createOrReplaceTempView replaces the deprecated registerTempTable
    df_new.createOrReplaceTempView('db')
    df_new.write.format("hive").mode("append").saveAsTable('db.table')
    spark_session.stop()
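Before building the session, it can save debugging time to confirm that the metastore's Thrift port is even reachable from the local machine. A minimal sketch, where the host and port are placeholders to be replaced with your real metastore address (9083 is the Hive metastore's default Thrift port):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values; substitute the real metastore IP and port.
if not port_open("192.168.X.X", 9083):
    print("Cannot reach the Hive metastore; check host resolution and firewall")
```

If the port check fails for the metastore host itself, the problem is network or DNS, not Spark configuration.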
3 解决办法
修改Win10系统下的host文件。
C:\WINDOWS\system32\drivers\etc\host
添加下面的内容
192.168.X.X cdhmaster
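After saving the hosts entry, you can confirm from Python that the name now resolves. A minimal sketch; `localhost` is used only to show the call succeeding, and `cdhmaster` is the hostname from this article:

```python
import socket

# A name that always resolves, to show what success looks like.
print(socket.gethostbyname("localhost"))  # e.g. 127.0.0.1

try:
    # After the hosts fix, this should print the IP you mapped (192.168.X.X).
    print(socket.gethostbyname("cdhmaster"))
except socket.gaierror:
    print("cdhmaster still does not resolve -- re-check the hosts entry")
```

Once `socket.gethostbyname("cdhmaster")` returns the mapped IP, rerun the Spark job and the UnknownHostException should be gone.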
References
How to fix java.net.UnknownHostException
java.net.UnknownHostException: unknown host: master
What is the hosts file, and where is it located on Windows, macOS, and Linux?