Spark Example GOGOGO!
1. Create an RDD
from pyspark.sql import Row

some_rdd = sc.parallelize([Row(name="John", age=19), Row(name="Smith", age=23), Row(name="Sarah", age=18)])
NOTE: I use Hive backed by MySQL for the metastore here, so start those services first. `sc` and `sqlContext` are assumed to be the ones the PySpark shell provides.
2. Infer the schema
# Infer the schema of the DataFrame from the Row objects
some_df = sqlContext.createDataFrame(some_rdd)
some_df.printSchema()
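Conceptually, createDataFrame looks at sample records and maps each field's Python type to a Spark SQL type. A rough pure-Python sketch of that idea (only an illustration of the concept, not Spark's actual inference code):

```python
# Illustration only: map the Python type of each field in a sample record
# to a Spark SQL type name, mimicking what schema inference does.
def infer_schema(record):
    type_names = {bool: "boolean", int: "long", float: "double", str: "string"}
    return {field: type_names.get(type(value), "string")
            for field, value in record.items()}

sample = {"name": "John", "age": 19}
print(infer_schema(sample))  # {'name': 'string', 'age': 'long'}
```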
3. Specify the schema explicitly
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Another RDD is created from a list of tuples
another_rdd = sc.parallelize([("John", 19), ("Smith", 23), ("Sarah", 18)])
# Schema with two non-nullable fields: person_name and person_age
schema = StructType([StructField("person_name", StringType(), False),
                     StructField("person_age", IntegerType(), False)])
# Create a DataFrame by applying the schema to the RDD and print the schema
another_df = sqlContext.createDataFrame(another_rdd, schema)
another_df.printSchema()
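Each StructField carries a name, a type, and a nullable flag; with nullable=False, null values are rejected. A hypothetical pure-Python sketch of applying such a (name, type, nullable) schema to a tuple, just to illustrate what the flags mean:

```python
# Hypothetical sketch: each field is (name, expected_type, nullable),
# mirroring StructField(name, dataType, nullable).
schema = [("person_name", str, False), ("person_age", int, False)]

def apply_schema(row, schema):
    out = {}
    for value, (name, expected, nullable) in zip(row, schema):
        if value is None:
            if not nullable:
                raise ValueError("field %s is not nullable" % name)
        elif not isinstance(value, expected):
            raise TypeError("field %s expects %s" % (name, expected.__name__))
        out[name] = value
    return out

print(apply_schema(("John", 19), schema))  # {'person_name': 'John', 'person_age': 19}
```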
4. Create a DataFrame directly from JSON
import os
import sys

if len(sys.argv) < 2:
    path = "file://" + os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
else:
    path = sys.argv[1]
# Create a DataFrame from the file(s) pointed to by path.
# jsonFile is the old Spark 1.x call; from Spark 1.4 on, prefer sqlContext.read.json(path).
people = sqlContext.jsonFile(path)
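Note that jsonFile (and read.json) expects one JSON object per line (JSON Lines), not a pretty-printed JSON array. A small pure-Python sketch of parsing that format, using the records from the Spark example file people.json:

```python
import json

# One JSON object per line (JSON Lines), as in the Spark example people.json.
raw = '{"name":"Michael"}\n{"name":"Andy", "age":30}\n{"name":"Justin", "age":19}\n'

records = [json.loads(line) for line in raw.splitlines() if line.strip()]
print(records[1]["name"])  # Andy
```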
5. Register the DataFrame as a table
# registerAsTable is deprecated; registerTempTable is the Spark 1.x name.
# A temp table lives only in this SQLContext; it is not written to the Hive metastore.
people.registerTempTable("people")
6. Query with SQL
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
for each in teenagers.collect():
    print(each[0])
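What that SQL computes, sketched in plain Python over the same sample data from step 1:

```python
# Plain-Python equivalent of the step-6 SQL, over the step-1 sample data.
people = [{"name": "John", "age": 19},
          {"name": "Smith", "age": 23},
          {"name": "Sarah", "age": 18}]

teenagers = [p["name"] for p in people if 13 <= p["age"] <= 19]
print(teenagers)  # ['John', 'Sarah']
```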