Reading and writing data
For details, see the official documentation: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader
Hive data
Reading:
from pyspark.sql import SparkSession

# enableHiveSupport() is required to read Hive tables; it lets you query Hive with HiveQL
spark = SparkSession.builder.enableHiveSupport().master("local[*]").appName("read_hive").getOrCreate()
df = spark.sql("select * from age")
df.show()
+--------------+------+
|       country|median|
+--------------+------+
|   New Zealand|  39.0|
+--------------+------+
only showing top 20 rows
Writing:
# Create the table
spark.sql('create table if not exists age2(name string, num int)')
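# Write the DataFrame into the table
df.write.insertInto("age2")  # appends by default and matches columns by position; pass overwrite=True to overwrite (in dynamic partition overwrite mode, only the partitions being written are replaced)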
df.write.mode("overwrite").saveAsTable("age2")