Spark 2.0 fails to start in YARN mode

Background:

Upgrading Spark

Upgrade from Spark 1.5 to 2.0: place the pre-built Spark 2.0 package in the target directory, unpack it, and replace the existing 1.5 installation with it.

$ spark-sql --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
                        
Branch
Compiled by user jenkins on 2016-07-19T21:16:08Z
Revision
Url
Type --help for more information.

Notes on replacing the directory:

1. Configuration files

Configuration files to carry over (a copy sketch follows the list):

slaves -- lists all worker nodes

spark-defaults.conf -- Spark startup configuration

spark-env.sh -- Spark startup environment variables

log4j.properties -- Spark logging configuration

hive-site.xml -- Hive's configuration file; either copy it from the Hive conf directory or just create a symlink:

$ ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml
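
A minimal copy sketch (the old-installation path /opt/beh/core/spark-1.5 is illustrative; adjust to your layout):

# Carry the 1.5 config files into the new 2.0 conf directory
$ OLD_CONF=/opt/beh/core/spark-1.5/conf    # illustrative path to the old configs
$ cp $OLD_CONF/slaves $OLD_CONF/spark-defaults.conf $OLD_CONF/spark-env.sh $OLD_CONF/log4j.properties $SPARK_HOME/conf/
# hive-site.xml is symlinked from the Hive conf directory as shown above

Settings carried over from 1.5 (especially spark-env.sh and spark-defaults.conf) are worth a quick review, since not every 1.x option still applies in 2.0.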

2. Packages

Important: Hive uses MySQL as its metastore, and Spark shares that MySQL metastore with Hive, so the MySQL JDBC driver jar, mysql-connector-java-5.1.31-bin.jar, must be copied over as well.

Also note that Spark 1.5 keeps its jars under $SPARK_HOME/lib, whereas 2.0 keeps them under $SPARK_HOME/jars.
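
A one-line sketch of copying the driver, assuming the jar is already available under $HIVE_HOME/lib (that source path is an assumption):

# Spark 2.0 loads extra jars from $SPARK_HOME/jars, not $SPARK_HOME/lib
$ cp $HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar $SPARK_HOME/jars/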

Startup in the two modes

1. Local mode:

$ spark-sql --master local               

Starts normally!

2. YARN mode:

$ spark-sql --master yarn

 

1. Error message


16/08/25 17:29:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 17:29:28 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:57)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:288)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:137)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 25 more

Startup fails, complaining that a jersey class cannot be found.

Check which jersey jars are present:

$ find /opt/beh/core/ | grep jersey

YARN's lib directory ships the 1.9 jersey jars, while the Spark 2.0 distribution ships 2.22.2:

/opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar
/opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar
/opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar
/opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar
/opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar

/opt/beh/core/spark/jars/jersey-container-servlet-2.22.2.jar
/opt/beh/core/spark/jars/jersey-guava-2.22.2.jar
/opt/beh/core/spark/jars/jersey-container-servlet-core-2.22.2.jar
/opt/beh/core/spark/jars/jersey-common-2.22.2.jar
/opt/beh/core/spark/jars/jersey-server-2.22.2.jar
/opt/beh/core/spark/jars/jersey-media-jaxb-2.22.2.jar
/opt/beh/core/spark/jars/jersey-client-2.22.2.jar

 

After repeated tests, the missing classes turned out to live in jersey-core-1.9.jar and jersey-client-1.9.jar.
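
One way to confirm which jar provides a class is to list each jar's entries with the JDK's jar tool (a sketch using the paths found above):

# Each command should print the .class entry for the class named in the stack trace
$ jar tf /opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar | grep ClientConfig
$ jar tf /opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar | grep FeaturesAndProperties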

The error with jersey-core-1.9.jar missing looks like this:

$ spark-sql --master yarn                       
16/08/25 17:42:47 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 17:42:47 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:57)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:288)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:137)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.core.util.FeaturesAndProperties
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 37 more

2. Fix

Copy jersey-core-1.9.jar and jersey-client-1.9.jar into $SPARK_HOME/jars and rename the existing jersey-client-2.22.2.jar in that directory; after that, Spark starts normally in YARN mode.
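
A sketch of the fix, using the jar paths found above (the .bak suffix is just one way to take the 2.x client jar off the classpath):

$ cp /opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar $SPARK_HOME/jars/
$ cp /opt/beh/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar $SPARK_HOME/jars/
# Rename the 2.22.2 client jar so it no longer shadows the 1.9 classes
$ mv $SPARK_HOME/jars/jersey-client-2.22.2.jar $SPARK_HOME/jars/jersey-client-2.22.2.jar.bak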

$ spark-sql --master yarn --num-executors 36
16/08/25 16:40:15 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 16:40:15 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/08/25 16:40:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/25 16:40:38 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/25 16:40:38 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@493be56e:an attempt to override final parameter: hive.security.authorization.enabled;  Ignoring.
16/08/25 16:40:38 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@493be56e:an attempt to override final parameter: hive.security.authorization.task.factory;  Ignoring.
16/08/25 16:40:39 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@520f7f74:an attempt to override final parameter: hive.security.authorization.enabled;  Ignoring.
16/08/25 16:40:39 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@520f7f74:an attempt to override final parameter: hive.security.authorization.task.factory;  Ignoring.

spark-sql (default)> cache table dm_cljy_m_all_label_txt_ext;
16/08/25 16:40:45 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
[Stage 0:====>                                                 (13 + 135) / 148]16/08/25 16:42:17 WARN DFSClient: Slow ReadProcessor read fields took 91294ms (threshold=30000ms); ack: seqno: 74 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 19355905, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
[Stage 0:====================================================>  (141 + 7) / 148]16/08/25 16:43:14 WARN DFSClient: Slow ReadProcessor read fields took 48494ms (threshold=30000ms); ack: seqno: 132 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 821343, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
Time taken: 155.439 seconds                                                     
spark-sql (default)>    select count(*) from dm_cljy_m_all_label_txt_ext;
16/08/25 16:43:59 WARN DFSClient: Slow ReadProcessor read fields took 41094ms (threshold=30000ms); ack: seqno: 149 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 418964, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
54966176
Time taken: 1.104 seconds, Fetched 1 row(s)



Cache a table, then run a query against it: Spark really is fast.
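
As an aside, the "Neither spark.yarn.jars nor spark.yarn.archive is set" warning in the startup log above means every submit re-uploads everything under $SPARK_HOME/jars. A sketch of silencing it by staging the jars on HDFS once (the HDFS path is illustrative):

$ hdfs dfs -mkdir -p /spark/jars-2.0    # illustrative HDFS path
$ hdfs dfs -put $SPARK_HOME/jars/* /spark/jars-2.0/
# then add to spark-defaults.conf:
#   spark.yarn.jars hdfs:///spark/jars-2.0/*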

 

Reposted from: https://my.oschina.net/xiaozhublog/blog/737902
