Storing data in Hive from PySpark
Code
import findspark
findspark.init()
from pyspark.sql import SparkSession
'''
Requires that bin/spark-sql.sh can run,
that the metastore URI is configured in hive-site.xml,
and that the metastore server is started on that metastore node:
hive --service metastore &
'''
_SPARK_HOST = "spark://192.168.21.67:7077"
_APP_NAME = "sparkhive"
spark = SparkSession \
.builder \
.appName(_APP_NAME).config('spark.sql.catalogImplementation', 'hive') \
.getOrCreate()
#spark = SparkSession.builder.master(_SPARK_HOST).appName(_APP_NAME).getOrCreate()
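An equivalent way to get a Hive-enabled session is the builder's enableHiveSupport() method, which sets spark.sql.catalogImplementation to hive for you; a minimal sketch, reusing the _SPARK_HOST and _APP_NAME constants defined above:

# Equivalent builder: enableHiveSupport() turns on the Hive catalog,
# so the explicit config() call above is not needed.
spark = SparkSession \
    .builder \
    .master(_SPARK_HOST) \
    .appName(_APP_NAME) \
    .enableHiveSupport() \
    .getOrCreate()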
data = [
(1, "3", "145"),
(1, "4", "146"),
(1, "5", "25"),
(1, "6", "26"),
(2, "32", "32"),
(2, "8", "134"),
(2, "8", "134"),
(2, "9", "137")
]
df = spark.createDataFrame(data, ['id', 'test_id', 'camera_id'])
# Method one: 'default' is the name of the default database; 'write_test'
# is the name of the table to create in it.
# registerTempTable was deprecated in Spark 2.0; createOrReplaceTempView is the current API.
df.createOrReplaceTempView('test_hive')
spark.sql("CREATE TABLE default.write_test AS SELECT * FROM test_hive")
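The comment above calls this "method one"; a second common route is the DataFrameWriter API, which skips the temp view and the SQL statement. A minimal sketch (the table name write_test2 is a hypothetical choice):

# Method two: write the DataFrame straight into a Hive-managed table.
# mode("overwrite") replaces the table if it already exists.
df.write.mode("overwrite").saveAsTable("default.write_test2")
# Read the table back to verify the write succeeded.
spark.sql("SELECT * FROM default.write_test2").show()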
Configuration
Spark itself must be working, and bin/spark-sql.sh must run successfully.
Configure the metastore URI (host:port) in hive-site.xml.
Copy hive-site.xml into Spark's conf directory.
Start the metastore server on the metastore node's host:
metastore-node-host:/media/haiqing/data2/apps/hive-1.1.0-cdh5.16.1$ bin/hive --service metastore &
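For reference, the metastore URI entry in hive-site.xml usually looks like the snippet below; the host name and the Thrift port are assumptions to replace with your cluster's values (9083 is the conventional default):

<property>
  <!-- Thrift URI of the remote Hive metastore; host and port are
       placeholders for your cluster (9083 is the usual default). -->
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-node-host:9083</value>
</property>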