SparkSQL on Hive:
In IntelliJ IDEA on Windows
Reading Hive tables with SparkSQL:
Configure IDEA so that it can also run MapReduce jobs locally.
Hive must be started in metastore server mode:
1. On 192.168.58.201, edit hive-site.xml (note the hive.metastore.thrift.bind.host property):
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.58.203:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.58.201:2181,192.168.58.202:2181,192.168.58.203:2181</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>
<property>
<name>hive.metastore.thrift.bind.host</name>
<value>192.168.58.201</value>
</property>
</configuration>
2. On 192.168.58.201, start the Hive metastore service:
linux>bin/hive --service metastore &
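As a quick sanity check (assuming netstat is available on the host), confirm the metastore is listening on its default port 9083, the same port used in hive.metastore.uris below:
linux>netstat -nltp | grep 9083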
3. On 192.168.58.202, configure the Hive client (hive-site.xml) to point at that metastore:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.58.201:9083</value>
</property>
</configuration>
4. On Windows, add hosts entries for high availability (optional):
C:\Windows\System32\drivers\etc\hosts
192.168.58.200 mycluster
192.168.58.201 mycluster
5. IDEA pom.xml dependencies:
<!-- Keep the Scala binary version (_2.12) and the Spark version consistent across all Spark artifacts -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.0</version>
</dependency>
<!-- spark-hive is required for enableHiveSupport() -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>3.0.0</version>
</dependency>
6. IDEA driver class HiveDemo:
package SparkTest

import org.apache.spark.sql.SparkSession

object HiveDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // comment this line out when packaging the jar and submitting it to the cluster with spark-submit
      .appName("test")
      .enableHiveSupport()
      .config("mapreduce.app-submission.cross-platform", "true")
      .config("hive.metastore.uris", "thrift://192.168.58.201:9083")
      .getOrCreate()

    // spark.sql("show databases").show()
    val df = spark.sql("select * from testhivedb.rut_demo")
    df.printSchema()
    df.show()

    spark.stop()
  }
}
Result:
root
|-- sessionid: string (nullable = true)
|-- stepno: integer (nullable = true)
|-- url: string (nullable = true)
|-- referral: string (nullable = true)
+---------+------+---+--------+
|sessionid|stepno|url|referral|
+---------+------+---+--------+
| 1| 1| A| NULL|
| 1| 2| B| A|
| 1| 3| C| B|
| 1| 4| D| C|
| 2| 1| A| NULL|
| 2| 2| B| A|
| 2| 3| C| B|
| 2| 4| D| C|
| 3| 1| D| NULL|
| 3| 2| B| D|
| 3| 3| C| B|
| 3| 4| X| C|
| 3| 5| F| X|
+---------+------+---+--------+
Process finished with exit code 0
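To run the same job on the cluster instead of locally, comment out the .master("local[*]") line as noted in the code, package the project into a jar, and submit it with spark-submit. A minimal sketch, assuming the standalone master used in the spark-shell section below and a jar named HiveDemo.jar (a placeholder for your own build output):
linux>spark-submit --master spark://192.168.58.200:7077 --class SparkTest.HiveDemo HiveDemo.jar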
SparkSQL on Hive:
In the spark-shell environment
1. Start the Spark cluster:
linux>sbin/start-all.sh
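Optionally run jps (shipped with the JDK) to verify that the standalone Master process is up on 192.168.58.200 and a Worker process is up on each worker node:
linux>jps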
2. hive-site.xml on 192.168.58.200:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.58.203:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>192.168.58.201:2181,192.168.58.202:2181,192.168.58.203:2181</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>
<property>
<name>hive.in.test</name>
<value>true</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
</configuration>
3. Put a Hive client configuration file into the conf directory of the Spark installation on 192.168.58.202:
linux>cd /opt/install/spark/conf/
linux>vi hive-site.xml
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.58.200:9083</value>
</property>
</configuration>
4. On 192.168.58.200, start the Hive metastore service:
linux>bin/hive --service metastore &
5. On 192.168.58.202, start spark-shell against the cluster:
linux>spark-shell --master spark://192.168.58.200:7077
scala>spark.sql("select * from testhivedb.rut_demo").show()
# testhivedb is the database, rut_demo is the table
Result:
scala> spark.sql("select * from testhivedb.rut_demo").show()
+---------+------+---+--------+
|sessionid|stepno|url|referral|
+---------+------+---+--------+
| 1| 1| A| NULL|
| 1| 2| B| A|
| 1| 3| C| B|
| 1| 4| D| C|
| 2| 1| A| NULL|
| 2| 2| B| A|
| 2| 3| C| B|
| 2| 4| D| C|
| 3| 1| D| NULL|
| 3| 2| B| D|
| 3| 3| C| B|
| 3| 4| X| C|
| 3| 5| F| X|
+---------+------+---+--------+
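Once the Hive catalog is visible to Spark, the DataFrame API works on the same tables as well. A small illustration against the rut_demo table above (this aggregation is only an example, not part of the original walkthrough):
scala> spark.table("testhivedb.rut_demo").groupBy("sessionid").count().show()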