Compiling Spark so that CDH supports Spark SQL


My big data cluster runs CDH 5.11.1, which ships Spark 1.6.0. Starting with CDH 5.5, the bundled Spark distribution no longer includes the Thrift Server (the distributed SQL engine) or the spark-sql script. The Thrift Server is one of the important entry points to Spark's vision of unifying heterogeneous data, and spark-sql is a great tool for testing SQL, but Cloudera prefers to push its own Impala. Spark SQL is not Spark's headline feature, yet it is the gateway to Hive and relational databases: Spark's SQL parser supports syntax such as registering temporary tables, and such a table can be backed by any relational store (an RDB or Hive) for which a driver is available, with no programming required, which is quite powerful. If you must have Spark SQL on CDH, the only option is to build Spark yourself and replace the corresponding jar in CDH.

1. Building Spark from source

Download spark-1.6.0.tgz from https://archive.apache.org/dist/spark/spark-1.6.0/

Copy the archive to a machine that has JDK 1.8, Maven 3.3.9, and Scala 2.10.3 installed, then extract it in preparation for the build.
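For example (a minimal sketch; run it from wherever you copied the archive):

java -version      # should report 1.8.x
mvn -version       # should report 3.3.9
scala -version     # should report 2.10.3
tar -zxvf spark-1.6.0.tgz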

Go into the extracted Spark directory and run:

mvn  -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.11.1 -Phive -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -DskipTests -Dmaven.test.skip=true -e clean package

The build is slow; be patient.
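If Maven instead dies with an OutOfMemoryError, the Spark 1.x build documentation recommends giving it more memory before re-running the command above (these are the documented defaults, nothing specific to this cluster):

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"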

After quite a while, the build completes successfully.

Go to assembly/target/scala-2.10 under the extracted Spark directory:

The assembly jar in this directory is the jar CDH needs in order to support Spark SQL.
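You can confirm it is there; with the options above the name should look roughly like this (a sketch, the exact name follows the -Dhadoop.version you passed):

ls assembly/target/scala-2.10/
# spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar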

Here is a download link for the jar the author built, if you would rather not build it yourself: https://download.csdn.net/download/m0_37618809/10891047

2. Operations on the CDH cluster

2.1 Upload the freshly built spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar to /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/ on the Spark master node you want to modify.
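For example, from the build machine (a sketch; KSJ001 is the master node used later in this post, substitute your own host):

scp assembly/target/scala-2.10/spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar \
    root@KSJ001:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/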

There is no need to back up or delete the original spark-assembly jar here: the file names differ, so the new jar does not overwrite anything.

2.2 Replace the assembly jar used by CDH

This really just means re-pointing the symlinks under CDH's Spark lib directory at the newly uploaded jar.

Go to /opt/cloudera/parcels/CDH/lib/spark/lib:

spark-assembly.jar -> spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar

spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar -> ../../../jars/spark-examples-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar

 

Notice that spark-assembly.jar is a symlink to spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar, which in turn points to ../../../jars/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar (the examples jar is chained the same way).

 

Re-point the spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar symlink so that it targets the newly uploaded Spark assembly jar.

First back up the existing spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar link by renaming (or removing) it:

 mv spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar_1

At this point spark-assembly.jar dangles, because its target spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar can no longer be found.

Create a new spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar symlink that points at the newly uploaded spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar:

ln -s ../../../jars/spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar

Now spark-assembly.jar resolves again.
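A quick check (a sketch of the expected output, abbreviated) should show the chain now ending at the new jar:

ls -l /opt/cloudera/parcels/CDH/lib/spark/lib/ | grep assembly
# spark-assembly.jar -> spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar
# spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar -> ../../../jars/spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar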

2.3 Upload the spark-sql launcher script

Upload the spark-sql script to /opt/cloudera/parcels/CDH/lib/spark/bin/

and make it executable: chmod u+x spark-sql
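The script can be taken from the bin/ directory of the Spark source you just built; a sketch of the copy (again assuming KSJ001 is the target node):

scp bin/spark-sql root@KSJ001:/opt/cloudera/parcels/CDH/lib/spark/bin/
ssh root@KSJ001 chmod u+x /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql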

2.4 Update environment variables

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop

export HADOOP_CONF_DIR=/etc/hadoop/conf

export HADOOP_CMD=/opt/cloudera/parcels/CDH/bin/hadoop

export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

export SCALA_HOME=/opt/lisery/scala-2.10.3

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SCALA_HOME/bin
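One common place for these exports is /etc/profile (or a script under /etc/profile.d/) on the node; after appending them, reload the shell and check the result:

source /etc/profile
which spark-sql    # should resolve to $SPARK_HOME/bin/spark-sql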

2.5 Upload the assembly jar to HDFS

su hdfs

hdfs dfs -mkdir -p /user/spark/share/lib

Give the directory permissions that allow the relevant users to access it:

hdfs dfs -chmod 777 /user

hdfs dfs -put /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar  /user/spark/share/lib/
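Verify the upload (note that, because of the symlink change in 2.2, the file pushed under this name is in fact the newly built assembly):

hdfs dfs -ls /user/spark/share/lib/
# .../user/spark/share/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar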

2.6 Edit /etc/spark/conf/classpath.txt

Append the following line at the very end:

/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/spark-lineage_2.10-1.6.0-cdh5.11.1.jar
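The same append can be done non-interactively instead of with vi, for example:

echo '/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/spark-lineage_2.10-1.6.0-cdh5.11.1.jar' >> /etc/spark/conf/classpath.txt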

 

 

2.7 Changes in the Cloudera Manager web UI

In the Spark service configuration, make the following changes:

1. Set spark_jar_hdfs_path to /user/spark/share/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar

 

2. Add the following two lines to the Spark client configuration (in Cloudera Manager they typically go into the advanced configuration snippets for spark-conf/spark-defaults.conf and spark-env.sh, respectively):

spark.yarn.jar=hdfs://192.168.50.240:8020/user/spark/share/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar

export HIVE_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hive/conf

 

Save the changes.

 

Then click Deploy Client Configuration.

Done.

 

Run spark-sql from anywhere on the node:

[root@KSJ001 ~]# spark-sql 
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
WARNING: Running spark-class from user-defined location.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/spark-assembly-1.6.0-hadoop2.6.0-cdh5.11.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/12/29 02:02:58 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/29 02:02:58 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/29 02:02:58 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/29 02:02:58 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/29 02:03:00 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/29 02:03:01 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:01 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:02 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:02 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/29 02:03:02 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/29 02:03:02 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/12/29 02:03:02 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/29 02:03:03 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/29 02:03:03 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/29 02:03:03 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/29 02:03:03 INFO metastore.HiveMetaStore: 0: get_all_databases
18/12/29 02:03:03 INFO HiveMetaStore.audit: ugi=root	ip=unknown-ip-addr	cmd=get_all_databases	
18/12/29 02:03:03 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/12/29 02:03:03 INFO HiveMetaStore.audit: ugi=root	ip=unknown-ip-addr	cmd=get_functions: db=default pat=*	
18/12/29 02:03:03 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:04 INFO session.SessionState: Created local directory: /tmp/ce11f047-2b62-4a1c-bda9-06b2a8b51c8b_resources
18/12/29 02:03:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/ce11f047-2b62-4a1c-bda9-06b2a8b51c8b
18/12/29 02:03:04 INFO session.SessionState: Created local directory: /tmp/root/ce11f047-2b62-4a1c-bda9-06b2a8b51c8b
18/12/29 02:03:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/ce11f047-2b62-4a1c-bda9-06b2a8b51c8b/_tmp_space.db
18/12/29 02:03:04 INFO spark.SparkContext: Running Spark version 1.6.0
18/12/29 02:03:04 INFO spark.SecurityManager: Changing view acls to: root
18/12/29 02:03:04 INFO spark.SecurityManager: Changing modify acls to: root
18/12/29 02:03:04 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/12/29 02:03:04 INFO util.Utils: Successfully started service 'sparkDriver' on port 37925.
18/12/29 02:03:04 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/12/29 02:03:04 INFO Remoting: Starting remoting
18/12/29 02:03:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.50.240:34887]
18/12/29 02:03:04 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 34887.
18/12/29 02:03:04 INFO spark.SparkEnv: Registering MapOutputTracker
18/12/29 02:03:04 INFO spark.SparkEnv: Registering BlockManagerMaster
18/12/29 02:03:04 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-4a7f18b5-ce05-4e51-bad0-fb833e3950ac
18/12/29 02:03:04 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
18/12/29 02:03:05 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/12/29 02:03:05 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/12/29 02:03:05 INFO ui.SparkUI: Started SparkUI at http://192.168.50.240:4040
18/12/29 02:03:05 INFO client.RMProxy: Connecting to ResourceManager at KSJ001/192.168.50.240:8032
18/12/29 02:03:05 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
18/12/29 02:03:05 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2564 MB per container)
18/12/29 02:03:05 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/12/29 02:03:05 INFO yarn.Client: Setting up container launch context for our AM
18/12/29 02:03:05 INFO yarn.Client: Setting up the launch environment for our AM container
18/12/29 02:03:05 INFO yarn.Client: Preparing resources for our AM container
18/12/29 02:03:05 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://192.168.50.240:8020/user/spark/share/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar
18/12/29 02:03:05 INFO yarn.Client: Uploading resource file:/tmp/spark-a16cb29b-ffd1-475a-b1ad-199e356847f1/__spark_conf__6926342778585643878.zip -> hdfs://KSJ001:8020/user/root/.sparkStaging/application_1546050986659_0006/__spark_conf__6926342778585643878.zip
18/12/29 02:03:05 INFO spark.SecurityManager: Changing view acls to: root
18/12/29 02:03:05 INFO spark.SecurityManager: Changing modify acls to: root
18/12/29 02:03:05 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/12/29 02:03:05 INFO yarn.Client: Submitting application 6 to ResourceManager
18/12/29 02:03:05 INFO impl.YarnClientImpl: Submitted application application_1546050986659_0006
18/12/29 02:03:06 INFO yarn.Client: Application report for application_1546050986659_0006 (state: ACCEPTED)
18/12/29 02:03:06 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.root
	 start time: 1546066985819
	 final status: UNDEFINED
	 tracking URL: http://KSJ001:8088/proxy/application_1546050986659_0006/
	 user: root
18/12/29 02:03:07 INFO yarn.Client: Application report for application_1546050986659_0006 (state: ACCEPTED)
18/12/29 02:03:08 INFO yarn.Client: Application report for application_1546050986659_0006 (state: ACCEPTED)
18/12/29 02:03:09 INFO yarn.Client: Application report for application_1546050986659_0006 (state: ACCEPTED)
18/12/29 02:03:10 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/12/29 02:03:10 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> KSJ001, PROXY_URI_BASES -> http://KSJ001:8088/proxy/application_1546050986659_0006), /proxy/application_1546050986659_0006
18/12/29 02:03:10 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/12/29 02:03:10 INFO yarn.Client: Application report for application_1546050986659_0006 (state: RUNNING)
18/12/29 02:03:10 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 192.168.50.242
	 ApplicationMaster RPC port: 0
	 queue: root.users.root
	 start time: 1546066985819
	 final status: UNDEFINED
	 tracking URL: http://KSJ001:8088/proxy/application_1546050986659_0006/
	 user: root
18/12/29 02:03:10 INFO cluster.YarnClientSchedulerBackend: Application application_1546050986659_0006 has started running.
18/12/29 02:03:10 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40814.
18/12/29 02:03:10 INFO netty.NettyBlockTransferService: Server created on 40814
18/12/29 02:03:10 INFO storage.BlockManager: external shuffle service port = 7337
18/12/29 02:03:10 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/12/29 02:03:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.50.240:40814 with 511.1 MB RAM, BlockManagerId(driver, 192.168.50.240, 40814)
18/12/29 02:03:10 INFO storage.BlockManagerMaster: Registered BlockManager
18/12/29 02:03:11 INFO scheduler.EventLoggingListener: Logging events to hdfs://KSJ001:8020/user/spark/applicationHistory/application_1546050986659_0006
18/12/29 02:03:11 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
18/12/29 02:03:11 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/12/29 02:03:11 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
18/12/29 02:03:11 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.11.1
18/12/29 02:03:11 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.11.1
18/12/29 02:03:11 INFO hive.metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-2e667ca5-64ab-4890-b8e2-d9f00e9724ef/metastore to file:/tmp/spark-1a392438-a354-45c0-b109-e87bab82f899/metastore
18/12/29 02:03:11 INFO hive.metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-2e667ca5-64ab-4890-b8e2-d9f00e9724ef/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-1a392438-a354-45c0-b109-e87bab82f899/metastore;create=true
18/12/29 02:03:11 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
18/12/29 02:03:11 INFO HiveMetaStore.audit: ugi=root	ip=unknown-ip-addr	cmd=Shutting down the object store...	
18/12/29 02:03:11 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
18/12/29 02:03:11 INFO HiveMetaStore.audit: ugi=root	ip=unknown-ip-addr	cmd=Metastore shutdown complete.	
18/12/29 02:03:11 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/12/29 02:03:11 INFO metastore.ObjectStore: ObjectStore, initialize called
18/12/29 02:03:11 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/12/29 02:03:11 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/12/29 02:03:12 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/12/29 02:03:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/12/29 02:03:13 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/12/29 02:03:13 INFO metastore.ObjectStore: Initialized ObjectStore
18/12/29 02:03:13 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/12/29 02:03:13 INFO metastore.HiveMetaStore: Added admin role in metastore
18/12/29 02:03:13 INFO metastore.HiveMetaStore: Added public role in metastore
18/12/29 02:03:13 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/12/29 02:03:13 INFO session.SessionState: Created local directory: /tmp/5d3442f5-4816-4e3f-bd01-7b0a474932f9_resources
18/12/29 02:03:13 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/5d3442f5-4816-4e3f-bd01-7b0a474932f9
18/12/29 02:03:13 INFO session.SessionState: Created local directory: /tmp/root/5d3442f5-4816-4e3f-bd01-7b0a474932f9
18/12/29 02:03:13 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/5d3442f5-4816-4e3f-bd01-7b0a474932f9/_tmp_space.db
18/12/29 02:03:13 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
18/12/29 02:03:13 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/12/29 02:03:13 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.11.1
18/12/29 02:03:13 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.11.1
18/12/29 02:03:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://KSJ001:9083
18/12/29 02:03:13 INFO hive.metastore: Connected to metastore.
18/12/29 02:03:13 INFO session.SessionState: Created local directory: /tmp/bcd6f57d-2edd-4560-8ea0-e69f604c570c_resources
18/12/29 02:03:13 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/bcd6f57d-2edd-4560-8ea0-e69f604c570c
18/12/29 02:03:13 INFO session.SessionState: Created local directory: /tmp/root/bcd6f57d-2edd-4560-8ea0-e69f604c570c
18/12/29 02:03:13 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/bcd6f57d-2edd-4560-8ea0-e69f604c570c/_tmp_space.db
SET spark.sql.hive.version=1.2.1
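Once the shell comes up you can run ordinary Hive SQL, and, as mentioned at the beginning, register a temporary table over JDBC without writing any code. The following is only a sketch: the MySQL URL, table name, and credentials are placeholders, and the JDBC driver jar has to be made available to spark-sql (for example via --jars):

spark-sql> show databases;
spark-sql> CREATE TEMPORARY TABLE my_rdb_table USING org.apache.spark.sql.jdbc OPTIONS (url "jdbc:mysql://192.168.50.240:3306/testdb", dbtable "orders", user "test", password "test");
spark-sql> SELECT count(*) FROM my_rdb_table;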

 
