Spark Cluster Installation

Download the installation files, then configure spark-env.sh and spark-defaults.conf.
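A minimal pair of config files for a standalone cluster might look like this (a sketch; the master host cnsz046690 comes from the start-slave command below, and the memory/core values are assumptions):

# conf/spark-env.sh
export JAVA_HOME=/usr/java/default
export SPARK_MASTER_HOST=cnsz046690
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=8g

# conf/spark-defaults.conf
spark.master          spark://cnsz046690:7077
spark.driver.memory   2g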

Start the master on the master node:

sbin/start-master.sh

Start a worker on each slave node:

./start-slave.sh spark://cnsz046690:7077
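Alternatively, standalone mode can start every worker in one step: list the slave hostnames in conf/slaves and run sbin/start-slaves.sh on the master (a sketch; it relies on passwordless SSH, and the hostnames are taken from the scp commands below):

# conf/slaves -- one worker host per line
cnsz046691
cnsz046745
cnsz046746

# on the master:
sbin/start-slaves.sh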


Copy the Spark distribution to the remaining nodes:

scp -r spark-2.1.1-bin-hadoop2.6 cnsz046691:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046745:~
scp -r spark-2.1.1-bin-hadoop2.6 cnsz046746:~
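The same distribution step as a loop (equivalent to the three commands above):

for host in cnsz046691 cnsz046745 cnsz046746; do
    scp -r spark-2.1.1-bin-hadoop2.6 "$host":~
done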


Spark 2.1.1 Parameter Changes

import scala.collection.JavaConversions._   // implicit Java <-> Scala collection conversions

Verify that Spark SQL can see the Hive metastore:

show partitions base.UDS_B_I_TRADE_FUND_MOVT;
select count(1) from base.UDS_B_I_TRADE_FUND_MOVT;

To point spark-sql at a custom log4j config, ship the file with --files, or set it as a driver JVM option; note that a bare -D flag is not a spark-sql option and has to go through --driver-java-options:

spark-sql --files /etc/spark/log4j.properties

spark-sql --driver-java-options "-Dlog4j.configuration=file:/etc/spark/log4j.properties"

Spark Environment Deployment and Dynamic Resource Allocation Configuration

  • Spark 2.2 and later require at least Java 8
  • spark-sql does not support cluster mode
# install under /usr/lib with a version-independent symlink
mv spark-2.2.0-bin-hadoop2.6 /usr/lib
ln -s spark-2.2.0-bin-hadoop2.6 spark

# migrate the old config (the "." destination is presumably /etc/spark/conf),
# then symlink conf and logs out of the install directory
mv /opt/app/spark/conf/* .
ln -s /etc/spark/conf conf
ln -s /var/log/spark logs

Start the history server

/usr/lib/spark/sbin/start-history-server.sh
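For the history server to list applications, event logging has to be enabled in spark-defaults.conf; a minimal sketch (the HDFS path is an assumption):

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/spark/applicationHistory
spark.history.fs.logDirectory    hdfs:///user/spark/applicationHistory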

In YARN mode this apparently does not need to be changed.
To quiet the console, edit log4j.properties and change the root log level from INFO to WARN:

log4j.rootCategory=WARN, console

Add Spark to the global PATH:
export PATH=$PATH:/usr/lib/spark/bin

Spark Log Configuration

log4j.rootCategory=INFO, RFA
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=/appcom/log/spark/spark-${user.name}.log

log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20

log4j.appender.RFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
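log4j's RollingFileAppender does not reliably create missing directories, so /appcom/log/spark must exist and be writable by every user that runs Spark; one way to set it up (the sticky-bit mode is a choice, not a requirement):

mkdir -p /appcom/log/spark
chmod 1777 /appcom/log/spark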

Troubleshooting

Spark 2.0 fails to start in YARN mode

An elegant fix

Jersey problem

If you try to run a spark-submit command on YARN you can expect the following error message:

Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
The jar jersey-bundle-*.jar is not present in $SPARK_HOME/jars; adding it fixes the problem:

sudo -u spark wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.1/jersey-bundle-1.19.1.jar -P $SPARK_HOME/jars
January 2017 update on this issue:
If the jar is added as above, Jersey 1 will be used when starting the Spark History Server, and applications will not be shown in it. The following error message appears in the Spark History Server output file:

WARN servlet.ServletHandler: /api/v1/applications
java.lang.NullPointerException
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)

This problem occurs only when running Spark on YARN, since YARN 2.7.3 uses Jersey 1 while Spark 2.0 uses Jersey 2.

One workaround is not to add the Jersey 1 jar described above, but instead to disable the YARN Timeline Service in spark-defaults.conf:

spark.hadoop.yarn.timeline-service.enabled false
[Fix #2](https://my.oschina.net/xiaozhublog/blog/737902): the conflict between the Jersey jars causes "Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig". Workaround:
cp /usr/lib/hadoop-yarn/lib/jersey-client-1.9.jar /usr/lib/spark/jars
cp /usr/lib/hadoop-yarn/lib/jersey-core-1.9.jar /usr/lib/spark/jars
mv /usr/lib/spark/jars/jersey-client-2.22.2.jar /usr/lib/spark/jars/jersey-client-2.22.2.jar.bak

Spark fails to read Hive metadata

Caused by: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6664)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

Fix:
Disable the hive.metastore.schema.verification setting; when enabled, it checks the schema version recorded in the metastore against the Hive version.
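In hive-site.xml that looks like:

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>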

Environment tests:

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    /usr/lib/spark/examples/jars/spark-examples_*.jar \
    10

spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue default \
    /usr/lib/spark/examples/jars/spark-examples_*.jar \
    10
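In cluster mode the Pi result is printed by the driver inside its YARN container rather than on the submitting console; it can be recovered from the aggregated logs (the application ID below is a placeholder; spark-submit prints the real one, and the grep string matches SparkPi's output):

yarn logs -applicationId application_1500000000000_0001 | grep "Pi is roughly"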

spark-sql --master yarn --deploy-mode client \
  --driver-memory 2g \
  --executor-memory 2g \
  --num-executors 8 

Spark Dynamic Resource Allocation

  • Edit yarn-site.xml
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <name>spark.shuffle.service.port</name>
    <value>7337</value>
  </property>
  • Deploy the Spark shuffle jar
chmod a+x /usr/lib/spark/lib/*.jar
cp /usr/lib/spark/lib/spark-1.6.3-yarn-shuffle.jar /usr/lib/hadoop-yarn/   # then sync it to all NodeManager nodes
  • Enable dynamic allocation in spark-defaults.conf
spark.shuffle.service.enabled true   
spark.shuffle.service.port 7337 
spark.dynamicAllocation.enabled true  
spark.dynamicAllocation.minExecutors 1  
spark.dynamicAllocation.maxExecutors 100  
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
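After syncing yarn-site.xml and the shuffle jar, restart every NodeManager so the spark_shuffle aux service is loaded, then check that executors actually scale (a sketch; the service name varies by distribution, and the query is just an example):

# on every NodeManager node
sudo service hadoop-yarn-nodemanager restart

# run a query without pinning --num-executors and watch the executor count
# grow and shrink in the YARN ResourceManager UI
spark-sql --master yarn -e "select count(1) from base.UDS_B_I_TRADE_FUND_MOVT"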