Configuring Spark SQL to Read Hive Data and Accessing Spark SQL Through the Thrift JDBC/ODBC Server

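Spark SQL can read data stored in Hive, and starting the Thrift JDBC/ODBC Server lets clients written in other languages use Spark SQL. The official documentation on Hive support in Spark SQL is confusing: it reads as if Hive is not compiled in and you have to build Spark yourself. The documentation says: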

Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, it is not included in the default Spark assembly. Hive support is enabled by adding the -Phive and -Phive-thriftserver flags to Spark’s build. This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive.

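In fact, the prebuilt Spark distributions already include Hive 1.2.1. It is just not compiled into the Spark assembly; the Hive support jars sit under Spark's lib directory. Further down, the official documentation says: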

Note that independent of the version of Hive that is being used to talk to the metastore, internally Spark SQL will compile against Hive 1.2.1 and use those classes for internal execution (serdes, UDFs, UDAFs, etc).

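Spark SQL can access other databases over JDBC, and it can also run a Spark SQL JDBC server so that other applications can query through Spark SQL. These are two different things, and it is worth keeping them apart.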

Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD. This is because the results are returned as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources. The JDBC data source is also easier to use from Java or Python as it does not require the user to provide a ClassTag. (Note that this is different than the Spark SQL JDBC server, which allows other applications to run queries using Spark SQL).

Installing Hive (metadata stored in MySQL)

Install MySQL and enable remote access

To install MySQL, enter the following in a terminal:

sudo apt-get install mysql-server mysql-client libmysqlclient-dev

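During installation a dialog asks for the root account password; enter the same password twice.
Check whether the installation succeeded: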

sudo netstat -tap | grep mysql

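Log in to MySQL as the root user. The statements below use older MySQL syntax: the password column of mysql.user was removed in MySQL 5.7, and GRANT ... IDENTIFIED BY was removed in MySQL 8.0.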

mysql -u root -p
mysql> use mysql;
mysql> select host,user,password from user;
mysql> insert into user(host,user,password) values('localhost','hive',PASSWORD('123456'));
mysql> grant all on *.* to 'hive'@'%' identified by '123456';
mysql> flush privileges;

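Enable remote access. By default MySQL listens on port 3306 only on 127.0.0.1 and refuses connections from other IP addresses (netstat shows this). To stop listening on loopback only, edit /etc/mysql/mysql.conf.d/mysqld.cnf: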

sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf

bind-address = 127.0.0.1    # find this line and comment it out
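
The manual edit can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped on Ubuntu). The function name comment_bind_address is made up for this illustration and is not part of MySQL.

```shell
# Comment out an active bind-address line in a MySQL config file.
# Hypothetical helper; assumes GNU sed for the -i (in-place) option.
comment_bind_address() {
    cfg="$1"
    # Keep a one-time backup next to the original before editing in place.
    [ -e "$cfg.bak" ] || cp "$cfg" "$cfg.bak"
    # Prefix the line with '# ' only when it is not already commented:
    # a line that starts with '#' does not match the pattern.
    sed -i 's/^\([[:space:]]*\)bind-address/\1# bind-address/' "$cfg"
}

# Example (run from a root shell):
#   comment_bind_address /etc/mysql/mysql.conf.d/mysqld.cnf
```

Running it a second time leaves the file unchanged, so it is safe to repeat.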
Restart the MySQL service:

sudo service mysql restart
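
After the restart, one way to confirm the change is to look at which local address port 3306 is bound to. This is a minimal sketch, assuming `netstat -tln` output in its usual Linux column layout (local address in the fourth column, state in the last). listen_addrs is a hypothetical helper name.

```shell
# Print the local addresses that are in LISTEN state on the given TCP port.
# Reads `netstat -tln` style output on stdin.
listen_addrs() {
    port="$1"
    awk -v p="$port" '$1 ~ /^tcp/ && $NF == "LISTEN" {
        # The port is whatever follows the last colon of the local address.
        n = split($4, parts, ":")
        if (parts[n] == p) print $4
    }'
}

# Example:
#   netstat -tln | listen_addrs 3306
```

A result of 127.0.0.1:3306 means MySQL is still bound to loopback only; 0.0.0.0:3306 or :::3306 means it is listening on all interfaces.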

Connect to MySQL remotely and check whether ...
