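In SQL you often need to turn rows into columns or columns into rows. For example, if a DataFrame holds the row ("A", [1, 2]), splitting the array into separate rows should give ("A", 1), ("A", 2). Without further ado, let's go straight to the code.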
import pyspark.sql.functions as F
from pyspark.sql import SparkSession
# Create a SparkSession through SparkSession.builder
# .appName("pyspark") gives the application a name; .getOrCreate() returns the existing SparkSession or creates a new one
spark = SparkSession.builder.appName("pyspark").getOrCreate()
df = spark.createDataFrame(data=[("A", [1, 2]), ("B", [3, 4])],
                           schema=["id", "index"])
# explode() emits one row per array element; the other columns are repeated on each row
df.withColumn("index_sub", F.explode(F.col("index"))).show(truncate=False)
+---+------+---------+
|id |index |index_sub|
+---+------+---------+
|A  |[1, 2]|1        |
|A  |[1, 2]|2        |
|B  |[3, 4]|3        |
|B  |[3, 4]|4        |
+---+------+---------+
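To make the row expansion concrete, here is a plain-Python sketch (no Spark needed) that models what `F.explode` does to rows shaped like the example above: each array element becomes its own output row, and the other columns are copied alongside it. The helper names `explode_rows` and `collect_rows` are hypothetical and only model the behaviour; they are not how Spark implements it. One detail the sketch reproduces is that Spark's `explode` drops rows whose array is null or empty (`F.explode_outer` keeps them with a null instead). `collect_rows` goes the other way, roughly what `groupBy("id").agg(F.collect_list("index_sub"))` does, except that Spark does not guarantee element order after a shuffle.

```python
from typing import Any, Dict, Iterable, List, Tuple


def explode_rows(rows: Iterable[Tuple[str, Any]]) -> List[Tuple[str, Any, Any]]:
    """Model df.withColumn("index_sub", F.explode(F.col("index"))).

    Each output row keeps id and the original array, plus one element.
    Rows whose array is None or empty produce no output, as with F.explode.
    """
    out = []
    for row_id, values in rows:
        for item in values or []:
            out.append((row_id, values, item))
    return out


def collect_rows(rows: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Model groupBy("id").agg(F.collect_list(...)): fold rows back into lists.

    Python dicts keep insertion order; Spark makes no such ordering promise.
    """
    grouped: Dict[str, List[Any]] = {}
    for row_id, item in rows:
        grouped.setdefault(row_id, []).append(item)
    return grouped


# "C" (empty array) and "D" (None) are hypothetical extra rows showing the drop rule
data = [("A", [1, 2]), ("B", [3, 4]), ("C", []), ("D", None)]
exploded = explode_rows(data)
for r in exploded:
    print(r)
# ('A', [1, 2], 1)
# ('A', [1, 2], 2)
# ('B', [3, 4], 3)
# ('B', [3, 4], 4)

print(collect_rows((r[0], r[2]) for r in exploded))
# {'A': [1, 2], 'B': [3, 4]}
```

Round-tripping through both helpers recovers the original arrays for "A" and "B", while "C" and "D" disappear, which mirrors why `explode_outer` exists when those rows must survive.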
2020-09-18, written at Jiulonghu, Jiangning District, Nanjing