# Imports for the PySpark -> Hive write example below.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Build the SparkSession. enableHiveSupport() is required so that
# spark.sql('use Test') / 'insert into ...' below talk to the Hive metastore
# (without it, inserts go to Spark's local catalog on most deployments).
spark = SparkSession.builder \
    .appName('write_data') \
    .enableHiveSupport() \
    .getOrCreate()
建库,建表
hive> create database if not exists Test;
hive> show databases;
hive> create table if not exists Test.wjh_test(
>phone string,
>day int);
hive> show tables;
少量写入数据
hive> use Test;
hive> insert into wjh_test values('13233344421', 20190808);
hive> insert into wjh_test values('13666655532', 20190909);
hive> select * from wjh_test;
大量写入数据(本地文件,非hdfs路径下)
# Bulk load: read a local CSV of (phone, day) rows, turn it into a
# DataFrame, and append it to the Hive table Test.wjh_test.
# Use a context manager so the file handle is closed (the original
# `f = open(...)` leaked it).
with open('/home/今晚打老虎/phone.csv') as f:
    lines = f.readlines()

# One record per line: strip the trailing newline, split on the comma.
rdd = spark.sparkContext.parallelize(lines).map(lambda x: x.strip('\n').split(','))

# NOTE(review): the Hive column `day` is declared int, but we load both
# fields as strings; Hive casts on insert. For strict typing, map with
# int(line[1]) and use IntegerType here instead.
schema = StructType([
    StructField('phone', StringType(), True),
    StructField('day', StringType(), True),
])
df = spark.createDataFrame(rdd, schema)

# registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView
# is the equivalent modern API.
df.createOrReplaceTempView('tempTable')

# Select the target database, then append the staged rows.
spark.sql('use Test')
spark.sql('insert into wjh_test select * from tempTable')
查询写入结果
# Verify the insert: show the first rows of the target table.
# (The original used curly "smart quotes" ‘…’, which is a Python syntax error.)
spark.sql('select * from wjh_test limit 10').show()