When testing locally in IDEA, I hit a java.net.UnknownHostException. The HDFS HA nameservice name (my_cluster) is a logical name, not a real host, so without the HA client configuration the client tries to resolve it via DNS and fails.
Solution 1: add core-site.xml and hdfs-site.xml to the resources directory (failed)
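For reference, a minimal sketch of what the two files under src/main/resources would need to contain; the nameservice, NameNode IDs, and hosts (my_cluster, namenode71/namenode164, node1/node2) simply mirror the code in Solution 2 below:

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://my_cluster</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>my_cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.my_cluster</name>
    <value>namenode71,namenode164</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.my_cluster.namenode71</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.my_cluster.namenode164</name>
    <value>node2:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.my_cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>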
Solution 2: set the configuration directly in code (succeeded)
// Assumes a SparkSession named `session` has already been created
val context = session.sparkContext
// Point the client at the logical HA nameservice instead of a single NameNode host
context.hadoopConfiguration.set("fs.defaultFS", "hdfs://my_cluster")
context.hadoopConfiguration.set("dfs.nameservices", "my_cluster")
// The failover proxy provider is what resolves the logical name to the active NameNode
context.hadoopConfiguration.set("dfs.client.failover.proxy.provider.my_cluster", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
// The two NameNode IDs and their RPC addresses
context.hadoopConfiguration.set("dfs.ha.namenodes.my_cluster", "namenode71,namenode164")
context.hadoopConfiguration.set("dfs.namenode.rpc-address.my_cluster.namenode71", "node1:8020")
context.hadoopConfiguration.set("dfs.namenode.rpc-address.my_cluster.namenode164", "node2:8020")
Later I commented this block out and retried Solution 1, and this time it worked as well (baffling).
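When it is unclear which configuration is actually in effect, printing the resolved values is a quick sanity check. A minimal sketch using only the standard Hadoop property names from above:

// Print what the Hadoop client actually resolved, to see whether
// the XML files on the classpath (or the code above) took effect
println(context.hadoopConfiguration.get("fs.defaultFS"))
println(context.hadoopConfiguration.get("dfs.ha.namenodes.my_cluster"))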
Next error: a local file path was resolved as an HDFS path: hdfs://my_cluster/D:/Files. Since fs.defaultFS now points at hdfs://my_cluster, any path without an explicit scheme defaults to HDFS.
Solution 1: prefix the path with the file:/// scheme to force the local filesystem (succeeded)
// An explicit file:/// scheme overrides fs.defaultFS for this path
val clientLog = context.textFile("file:///D:/Files/IDEA-study/music-project-study/data/currentday_clientlog.tar.gz")