Spark bugs I've run into

These are all fairly basic issues, but I hope they help newcomers track down problems quickly.

1. java.net.ConnectException: Call From hadoop/xxx.xxx.xxx.xxx to hadoop:8020 failed on connection exception: java.net.ConnectException: Connection refused;

Cause: I had written an RDD like this:

val rdd = sc.textFile("hdfs://hadoop/spark/person.txt").map(_.split(","))

The URI does not specify the HDFS port (the file-system port set in my configuration is 9000), so the program falls back to the default port 8020, cannot reach the file system, and fails with the error above.

Fix:

val rdd = sc.textFile("hdfs://hadoop:9000/spark/person.txt").map(_.split(","))
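Alternatively, if the Hadoop configuration (a core-site.xml with fs.defaultFS set to hdfs://hadoop:9000) is on the application's classpath, the scheme and host can usually be left out entirely. A minimal sketch under that assumption:

    val rdd = sc.textFile("/spark/person.txt").map(_.split(","))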

2. error: not found: value sqlContext

In Spark 2.0+ the entry point is SparkSession rather than SQLContext:

  • use SparkSession (initialized in spark-shell as spark).
  • for legacy applications you can:

    val sqlContext = spark.sqlContext
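For a standalone application (outside spark-shell), a minimal sketch of creating the session yourself, assuming spark-sql is on the classpath; the app name and master below are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("example")      // placeholder app name
      .master("local[*]")      // for local testing only
      .getOrCreate()

    // legacy handle for code that still expects a SQLContext
    val sqlContext = spark.sqlContext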

3. Spark SQL and Hive SQL

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Fix: start the Hive metastore service before running Spark SQL against Hive:

./hive --service metastore
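Once the metastore is up, Spark also needs Hive support enabled to talk to it. A minimal sketch, assuming spark-hive is on the classpath and hive-site.xml points at the metastore started above:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-example")   // placeholder app name
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW TABLES").show()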

4. Ran a Spark program and got the following stack trace:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:330)
at cn.itcast.spark.WC$.main(WC.scala:16)
at cn.itcast.spark.WC.main(WC.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
The versions of the key dependencies in pom.xml were as follows:

<properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.10.6</scala.version>
    <spark.version>1.6.1</spark.version>
    <hadoop.version>2.5.2</hadoop.version>
</properties>

I ran a few tests with the other versions unchanged: hadoop.version 2.4.0, 2.4.1, 2.5.2, 2.6.1 and 2.6.4 all produced the error above.

Fix:
Changing hadoop.version to 2.2.0 makes the error go away, and 2.7.2 works as well, so only the versions in between are affected. The IllegalAccessError comes from FileInputFormat calling a Guava Stopwatch constructor that the newer Guava on the classpath no longer exposes; Hadoop 2.7.x stopped using that constructor, which is presumably why it is unaffected.


5. Hit a Hadoop issue (to be revisited)

The cause: the following three places have to agree:

  • the IP-to-hostname mapping configured on the local Windows machine under C:\Windows\System32\drivers\etc
  • the program's conf.set("fs.defaultFS", "hdfs://ip:9000");
  • the IP-to-hostname mapping in /etc/hosts on the virtual machine

Because the three did not match, the program kept failing on port 9000, which is the HDFS RPC port.
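As a sanity check that the mappings agree, a small sketch that connects using the hostname rather than a raw IP (the hostname hadoop and the file path are assumptions carried over from the earlier sections):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val conf = new Configuration()
    // use the same hostname that appears in both hosts files
    conf.set("fs.defaultFS", "hdfs://hadoop:9000")
    val fs = FileSystem.get(conf)
    println(fs.exists(new Path("/spark/person.txt")))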

6. Connecting Spark (Scala 2.11) to Kafka 0.10

The connection kept failing with:

18/03/07 20:33:03 WARN ConsumerFetcherManager$LeaderFinderThread: [g1_1U2V4GOF31R91CG-1520425979353-8e8a12f6-leader-finder-thread], Failed to find leader for Set([test,0])

kafka.common.KafkaException: fetching topic metadata for topics [Set(test)] from broker [ArrayBuffer(id:0,host:localhost,port:9092)] failed


kafka java.nio.channels.ClosedChannelException

My fix:

1. In server.properties:

listeners=PLAINTEXT://hadoop:9092

zookeeper.connect=hadoop:2181   (the default is localhost; my hostname is hadoop)

2. In producer.properties:

bootstrap.servers=hadoop:9092   (the default is localhost; my hostname is hadoop)

Of course, this assumes the hostname-to-IP mapping has been set up both in C:\Windows\System32\drivers\etc on Windows and in /etc/hosts on the virtual machine.

The console producer and consumer also need localhost replaced with the hostname hadoop.

Producer:

./kafka-console-producer.sh --broker-list hadoop:9092 --topic test

Consumer:

./kafka-console-consumer.sh --zookeeper hadoop:2181 --topic test --from-beginning

If you pass the program arguments from IDEA, either the hostname or the IP works, for example:

hadoop:2181 g1 test 2
val Array(zkQuorum, group, topics, numThreads) = args
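For context, a sketch of how these four arguments are typically wired into the old receiver-based Kafka API (KafkaUtils.createStream from the spark-streaming-kafka 0.8 integration), which is the consumer behind the LeaderFinderThread warning above; the app name and batch interval are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val Array(zkQuorum, group, topics, numThreads) = args   // e.g. hadoop:2181 g1 test 2
    val conf = new SparkConf().setAppName("KafkaWordCount")  // placeholder app name
    val ssc = new StreamingContext(conf, Seconds(5))         // 5s batches, adjust as needed

    // each (topic -> numThreads) entry controls consumer threads per topic
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    lines.print()

    ssc.start()
    ssc.awaitTermination()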




