[Hands-on] Two ways to read Hive tables from PySpark

1. Windows cannot connect to Hive, but Linux can (Method 2 is recommended)
Method 1
(1) Copy /opt/soft/hive110/conf/hive-site.xml to /opt/soft/spark234/conf/hive-site.xml
Nothing in hive-site.xml needs to be changed.
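
After copying the file, it can help to confirm what Spark will actually read from it. The following is a minimal sketch, not part of the original setup: `read_hive_site` is a hypothetical helper, and the two property names printed at the end are only examples of settings a hive-site.xml commonly contains.

```python
import os
import xml.etree.ElementTree as ET


def read_hive_site(path):
    """Return the <property> entries of a hive-site.xml file as a dict."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    # Path taken from step (1); adjust it for your own install.
    conf = "/opt/soft/spark234/conf/hive-site.xml"
    if os.path.exists(conf):
        props = read_hive_site(conf)
        for key in ("hive.metastore.uris", "javax.jdo.option.ConnectionURL"):
            print(key, "=", props.get(key, "<not set>"))
    else:
        print("not found:", conf)
```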

(2) Copy the MySQL driver jar into /opt/soft/spark234/jars
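
Spark 2.x binary distributions keep their libraries in a `jars` directory under the install root. A quick way to confirm the driver landed there is sketched below. `find_driver_jars` is a hypothetical helper, and the `mysql-connector-java-*.jar` pattern is an assumption about the driver's file name, so change it if your jar is named differently.

```python
import glob
import os


def find_driver_jars(spark_home, pattern="mysql-connector-java-*.jar"):
    """List jars under <spark_home>/jars whose file name matches the pattern."""
    return sorted(glob.glob(os.path.join(spark_home, "jars", pattern)))


if __name__ == "__main__":
    # Install root taken from the steps above.
    jars = find_driver_jars("/opt/soft/spark234")
    print(jars if jars else "no MySQL driver jar found under jars/")
```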
(3) Start pyspark

[root@joy sbin]# ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /opt/soft/spark234/logs/spark-root-org.apache.spark.deploy.master.Master-1-joy.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/soft/spark234/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-joy.out
[root@joy sbin]# cd ../bin/
[root@joy bin]# ./pyspark
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
2020-12-24 11:20:23 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
SparkSession available as 'spark'.
>>> spark.sql("select * from dwd_events.dwd_events limit 3").show
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
2020-12-24 11:21:31 ERROR ObjectStore:6684 - Version information found in metastore differs 1.1.0-cdh5.14.2 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
2020-12-24 11:21:32 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
<bound method DataFrame.show of DataFrame[eventid: string, starttime: bigint, city: string, province: string, country: string, lat: string, lng: string, userid: string, features: string]>
>>> spark.sql("select * from dwd_events.dwd_events limit 3").show()
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
|   eventid| starttime|  city|province|  country|    lat|    lng|    userid|            features|
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
|1000000778|1349348400|  Iasi|        |  Romania| 47.162| 27.587| 781622845|0, 0, 0, 0, 0, 0,...|
|1000001188|1350738000|Kassel|        |  Germany| 51.315|   9.48|4191368038|0, 0, 0, 0, 0, 0,...|
|1000003504|1353632400|Sydney|     NSW|Australia|-33.883|151.217|1445909915|2, 0, 2, 1, 1, 0,...|
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
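
Note that in the session above, the first attempt (`.show` without parentheses) only echoed a bound method; the table appeared once `.show()` was actually called. The sketch below reproduces that difference with a stand-in class. `FakeFrame` is hypothetical and not part of PySpark, and unlike the real `DataFrame.show()` it returns a string instead of printing a table.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; not part of PySpark."""

    def show(self):
        return "rows printed"


df = FakeFrame()
method = df.show      # attribute access only: nothing is executed
result = df.show()    # the call actually runs the method

print(callable(method))
print(result)
```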

(4) Copy everything under /opt/soft/hive110/conf to D:\soft\spark-2.3.4-bin-hadoop2.6/conf on the local Windows machine
Change the paths in the copied files to local paths.

(5) Copy the MySQL driver jar into D:\soft\spark-2.3.4-bin-hadoop2.6/jars
(6) Close Python and configure the Spark environment variables
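
If you would rather not edit the system environment variables, roughly the same effect can be had from inside a script. This is only a sketch under assumptions: `configure_spark` is a hypothetical helper, it assumes `SPARK_HOME` is the variable that matters, and it assumes the install has the usual `python/lib/py4j-*-src.zip` layout.

```python
import glob
import os
import sys


def configure_spark(spark_home):
    """Point the current interpreter at a local Spark install.

    Sets SPARK_HOME and puts Spark's bundled Python sources on sys.path.
    Returns the list of paths that were newly added.
    """
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    # py4j ships as a zip under python/lib; the version in its name varies.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    added = []
    for path in [python_dir] + py4j_zips:
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

Call it as `configure_spark(r"D:\soft\spark-2.3.4-bin-hadoop2.6")` before the first `import pyspark`.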
Method 2

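from pyspark.sql import SparkSession

if __name__ == '__main__':
    # No metastore URI is set here, so Spark relies on a hive-site.xml in its conf directory.
    spark = (SparkSession.builder.appName("test")
             .master("local[*]")
             .enableHiveSupport().getOrCreate())
    df = spark.sql("select * from dws_events.dws_temp_train limit 3")
    df.show()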


from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = (SparkSession.builder.appName("test")
             .master("local[*]")
             # Add hive.metastore.uris so Spark reaches the Hive metastore over thrift://
             .config("hive.metastore.uris", "thrift://192.168.72.170:9083")
             .enableHiveSupport().getOrCreate())
    df = spark.sql("select * from dws_events.dws_temp_train limit 3")
    df.show()
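
Since the whole problem here is Windows failing to reach Hive, it can be worth checking that the metastore port is reachable before starting Spark at all. The sketch below is not from the original post: `metastore_reachable` is a hypothetical helper that only tests the TCP connection to the host and port in the URI; it does not speak the Thrift protocol.

```python
import socket
from urllib.parse import urlparse


def metastore_reachable(uri, timeout=3.0):
    """Return True if a TCP connection to the host:port in a thrift:// URI succeeds."""
    parsed = urlparse(uri)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("expected thrift://host:port, got %r" % uri)
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For the setup above the call would be `metastore_reachable("thrift://192.168.72.170:9083")`. If it returns False, the usual suspects are a firewall or the metastore service not running on that machine.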

