Switching Hive's execution engine to Spark for faster queries
While learning Hive, I noticed that starting Hive prints a warning along these lines:
Hive-on-MR is deprecated in Hive 2; consider using a different execution engine (e.g. Spark or Tez), or stay on a Hive 1.x release.
Configuration:
You don't need to install Hive to use Hive tables from Spark; only the following two configuration files are required.
Copy core-site.xml and hive-site.xml into Spark's conf directory.
core-site.xml is found in Hadoop's configuration directory, and hive-site.xml in Hive's configuration directory.
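As a sketch, the copy step might look like this, assuming HADOOP_HOME, HIVE_HOME, and SPARK_HOME point at your installations (the paths below are hypothetical; adjust them to your cluster):

```shell
# Hypothetical installation paths -- adjust for your environment.
export HADOOP_HOME=/export/servers/hadoop
export HIVE_HOME=/export/servers/hive
export SPARK_HOME=/export/servers/spark

# Copy the two configuration files into Spark's conf directory.
cp "$HADOOP_HOME/etc/hadoop/core-site.xml" "$SPARK_HOME/conf/"
cp "$HIVE_HOME/conf/hive-site.xml" "$SPARK_HOME/conf/"
```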
Note:
In "2. Example: operating Hive tables with Spark SQL", I had already copied these two files over,
but I modified hive-site.xml there in order to enable the Hive metastore service.
For this setup, delete or comment out the metastore-related configuration.
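The relevant property is hive.metastore.uris, which points clients at a standalone metastore service. A sketch of what to comment out in hive-site.xml (host and port here are hypothetical):

```xml
<!-- Comment out or delete this block so spark-sql does not
     require a separately started metastore service. -->
<!--
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop01:9083</value>
</property>
-->
```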
Usage:
Start Hive on Spark, i.e. swap Hive's execution engine for Spark.
Using Spark's YARN mode:
spark-sql \
--master yarn \
--driver-class-path /.../mysql-connector-java-5.1.47.jar
Note:
The JDBC driver jar can either be placed in Spark's jars directory, or specified at launch time with spark-submit, spark-shell, or spark-sql.
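The first option (placing the jar in Spark's jars directory) might look like this, using the driver path from the session below; SPARK_HOME is an assumption here:

```shell
# Copy the MySQL JDBC driver into Spark's jars directory so it is
# picked up automatically, with no --driver-class-path needed.
cp /export/servers/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar \
   "$SPARK_HOME/jars/"
```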
[root@hadoop01 ~]# spark-sql --master yarn --driver-class-path /export/servers/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar
spark-sql> select count(*) from sogou;
10000
Time taken: 50.853 seconds, Fetched 1 row(s)
spark-sql>
Other operations tried earlier:
Create the person table with a Hive statement:
spark-sql> create table person(id int, name string, age int, faceValue int) row format delimited fields terminated by ',' ;
19/04/27 14:21:44 WARN HiveMetaStore: Location: hdfs://master:9000/user/hive/warehouse/person specified for non-external table:person
Time taken: 0.869 seconds
spark-sql> show tables;
default employee false
default person false
Time taken: 0.077 seconds, Fetched 2 row(s)
Here, row format delimited fields terminated by ','
means the table's rows are stored as delimited text, with fields separated by commas.
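Given that delimiter, and judging from the query results further below, the input file /test/person.txt would contain one comma-separated record per line, along these lines:

```
1,tingting,23,80
2,ningning,25,90
3,ruhua,27,60
4,mimi,33,85
```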
Load data from the local filesystem:
spark-sql> load data local inpath "/test/person.txt" into table person;
Queries:
The whole table:
spark-sql> select * from person;
1 tingting 23 80
2 ningning 25 90
3 ruhua 27 60
4 mimi 33 85
Time taken: 2.467 seconds, Fetched 4 row(s)
People older than 25, sorted by faceValue in descending order:
spark-sql> select name, age,faceValue from person where age > 25 order by faceValue desc;
mimi 33 85
ruhua 27 60
Time taken: 1.071 seconds, Fetched 2 row(s)
spark-sql>
Errors:
1.ERROR KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
This was raised while loading data from the local filesystem: load data local inpath "/test/person.txt" into table person;
However, select * from person shows the data was imported anyway, so the message seems safe to ignore. (From what I can tell, this log line shows up when HDFS transparent encryption is not configured, and is harmless in that case.)
2.Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Analysis: this is just a JDBC warning and can be ignored; `com.mysql.jdbc.Driver' has been deprecated in favor of `com.mysql.cj.jdbc.Driver', which is registered automatically via SPI.
Fix: https://blog.csdn.net/weixin_42323802/article/details/82500458
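If you do upgrade to MySQL Connector/J 8.x, the driver class configured for the metastore in hive-site.xml would typically be updated to match; a sketch (the property name is standard Hive config, shown here with the new driver class):

```xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
```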