Spark in Practice (2) DataFrame Basics: Creating a DataFrame


ZenGeek



Category: Spark  Tags: Spark

Article link: https://blog.csdn.net/zeng_xiangt/article/details/83588362


Previously, RDD syntax dominated, but it was relatively hard to learn and to use.
Now, with DataFrames, Spark is much easier to work with.

Contents

Creating a DataFrame
Creating a DataFrame (with an explicit schema)

Creating a DataFrame

from pyspark.sql import SparkSession
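# Create a new session (or reuse an existing one)
spark = SparkSession.builder.appName('Basics').getOrCreate()
# Load the data (by default, JSON Lines: one JSON object per line)
df = spark.read.json('people.json')

df.show() # display the rows (the first 20 by default)
df.printSchema() # print the schema of df
print(df.columns) # the column names, as a Python list
df.describe().show() # statistical summary: count, mean, stddev, min, max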

Creating a DataFrame (with an explicit schema)

#********************************************************************#
# Specify the DataFrame structure first, then read: more useful in practice!
from pyspark.sql.types import StructField, StringType, IntegerType, StructType

# create the data schema: StructField(name, type, nullable)
data_schema = [StructField('age', IntegerType(), True),
               StructField('name', StringType(), True)]
# pass the list of fields into StructType
final_struc = StructType(fields=data_schema)
# create the DataFrame with the specified schema
df = spark.read.json('people.json', schema=final_struc)
