Lesson 57: Spark SQL Case Studies in Practice — Study Notes

Topics covered:
1. Basic Spark SQL case studies
2. A business-oriented Spark SQL case study


Open the sql-programming-guide on the official Spark site:
http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
There you can read:
The entry point into all functionality in Spark SQL is the SQLContext class, or one of its descendants. To create a basic SQLContext, all you need is a SparkContext.
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
SQLContext takes a SparkContext as its constructor argument: it has access to the SparkContext's capabilities and extends them with SQL functionality.
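With the implicits imported, an RDD of case-class objects can be converted to a DataFrame and queried with SQL. A minimal sketch against the Spark 1.x API (the Person case class and the sample rows are made up for illustration):

```
// Assumes an existing SparkContext `sc`, as in the guide above.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._   // enables rdd.toDF()

case class Person(name: String, age: Int)

// Build a small RDD and convert it implicitly to a DataFrame.
val people = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25))).toDF()

// Register it as a temporary table so it can be queried with SQL (Spark 1.x API).
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").show()
```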


Before running Spark SQL against Hive, one configuration step is required: create a new hive-site.xml file under ${SPARK_HOME}/conf containing the metastore URI setting.

Note: do not copy Hive's own hive-site.xml over; a freshly created hive-site.xml with this single property is enough. When Spark SQL operates on Hive, Hive serves purely as the data warehouse. A warehouse consists of metadata plus the data itself, and to reach the actual data you must first go through the metadata, so configuring hive.metastore.uris alone is sufficient.
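A minimal hive-site.xml of this kind looks like the following. Here thrift://slq1:9083 is the metastore address used later in this session (host slq1, default metastore port 9083); substitute your own metastore host:

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://slq1:9083</value>
  </property>
</configuration>
```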
Why configure hive.metastore.uris?
=> Because Hive acts as the warehouse's storage layer while Spark SQL is the compute engine; Spark SQL needs this setting to reach Hive's metadata.
In addition, since Hive keeps its metadata in MySQL, place mysql-connector-java-5.1.35-bin.jar into Spark's lib directory (it may also work without this in some setups).
Does this have to be configured on every machine?
=> No. Configuring it on the machine where Hive runs is enough.
When Spark starts it does not read hive-site.xml from Hive's own directory. Spark SQL depends only on Hive's metadata, not on Hive itself: Hive here is just a data warehouse, not a compute engine; the compute engine is Spark SQL.


1) Start HDFS: ${HADOOP_HOME}/sbin/start-dfs.sh
2) Start Spark: ${SPARK_HOME}/sbin/start-all.sh
3) Start the Hive metastore service:
hive --service metastore > metastore.log 2>&1 &
4) From the bin directory under the Spark home, launch spark-shell:
./spark-shell --master spark://slq1:7077
5) Once spark-shell is up, create a HiveContext:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
Output:
scala> val hiveContext= new org.apache.spark.sql.hive.HiveContext(sc)
16/03/27 01:45:59 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
16/03/27 01:45:59 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/03/27 01:45:59 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/03/27 01:46:01 INFO hive.metastore: Mestastore configuration hive.metastore.warehouse.dir changed from file:/tmp/spark-b5ed27dd-b732-41f6-bf34-82f3f8fdaa02/metastore to file:/tmp/spark-99274b47-7d87-4fbf-9815-8c913bec38a9/metastore
16/03/27 01:46:01 INFO hive.metastore: Mestastore configuration javax.jdo.option.ConnectionURL changed from jdbc:derby:;databaseName=/tmp/spark-b5ed27dd-b732-41f6-bf34-82f3f8fdaa02/metastore;create=true to jdbc:derby:;databaseName=/tmp/spark-99274b47-7d87-4fbf-9815-8c913bec38a9/metastore;create=true
16/03/27 01:46:01 INFO metastore.HiveMetaStore: 0: Shutting down the object store...
16/03/27 01:46:01 INFO HiveMetaStore.audit: ugi=richard	ip=unknown-ip-addr	cmd=Shutting down the object store...
16/03/27 01:46:01 INFO metastore.HiveMetaStore: 0: Metastore shutdown complete.
16/03/27 01:46:01 INFO HiveMetaStore.audit: ugi=richard	ip=unknown-ip-addr	cmd=Metastore shutdown complete.
16/03/27 01:46:01 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/03/27 01:46:01 INFO metastore.ObjectStore: ObjectStore, initialize called
16/03/27 01:46:01 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/03/27 01:46:01 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/03/27 01:46:02 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/27 01:46:02 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/03/27 01:46:09 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/03/27 01:46:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/27 01:46:12 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/27 01:46:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/03/27 01:46:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/03/27 01:46:30 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/03/27 01:46:30 INFO metastore.ObjectStore: Initialized ObjectStore
16/03/27 01:46:30 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
16/03/27 01:46:32 INFO metastore.HiveMetaStore: Added admin role in metastore
16/03/27 01:46:32 INFO metastore.HiveMetaStore: Added public role in metastore
16/03/27 01:46:33 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/03/27 01:46:33 INFO session.SessionState: Created local directory: /tmp/1711ab6e-62f0-496d-8f04-de91c2808678_resources
16/03/27 01:46:34 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/1711ab6e-62f0-496d-8f04-de91c2808678
16/03/27 01:46:34 INFO session.SessionState: Created local directory: /tmp/richard/1711ab6e-62f0-496d-8f04-de91c2808678
16/03/27 01:46:34 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/1711ab6e-62f0-496d-8f04-de91c2808678/_tmp_space.db
16/03/27 01:46:35 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
16/03/27 01:46:35 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/03/27 01:46:35 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/03/27 01:46:35 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/03/27 01:46:41 INFO hive.metastore: Trying to connect to metastore with URI thrift://slq1:9083
16/03/27 01:46:42 INFO hive.metastore: Connected to metastore.
16/03/27 01:46:43 INFO session.SessionState: Created local directory: /tmp/5f541be5-2b00-43a4-92f8-80ebd5e43d13_resources
16/03/27 01:46:43 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/5f541be5-2b00-43a4-92f8-80ebd5e43d13
16/03/27 01:46:43 INFO session.SessionState: Created local directory: /tmp/richard/5f541be5-2b00-43a4-92f8-80ebd5e43d13
16/03/27 01:46:43 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/5f541be5-2b00-43a4-92f8-80ebd5e43d13/_tmp_space.db
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3b14d63b
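With the HiveContext created, Hive tables can be queried directly through it. A sketch using the Spark 1.x HiveContext API; the table name `sogou_queries` and the column `query` are hypothetical placeholders, substitute any table that actually exists in your Hive warehouse:

```
// hiveContext was created above as: new org.apache.spark.sql.hive.HiveContext(sc)

// List the tables Hive's metastore knows about.
hiveContext.sql("SHOW TABLES").show()

// Run a query against a Hive table; the result comes back as a DataFrame.
val df = hiveContext.sql("SELECT * FROM sogou_queries LIMIT 10")
df.show()

// DataFrame operations can then be mixed freely with SQL.
df.select("query").show()
```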

 
