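from pyspark.sql import SparkSession
from pyspark.sql import functions as F

if __name__ == '__main__':
    # Start a local Spark session that uses all available cores
    spark = SparkSession.builder.master('local[*]').appName('create_park').getOrCreate()
    # Read the GBK-encoded CSV; the first row supplies the column names
    df1 = spark.read.csv(path="file:tmp/pycharm_project_681/datas/1960-2019全球GDP数据.csv", encoding="gbk",
                         header=True)
    df1.show()
    df1.printSchema()
    # Select columns by name, by Column object, or by a list of names
    df1.select('year', 'gdp').show()
    df1.select(df1['year']).show()
    df1.select(['year', 'gdp']).show()
    # Count the non-null values in the year column
    df1.select(F.count('year').name("cnt")).show()
    # Largest number of rows belonging to any single country
    df1.groupBy("country").count().select(F.max("count").name("c")).show()
    # Max and min gdp per country (columns are strings unless a schema is given, so this compares text)
    df1.groupBy("country").agg(F.max("gdp"), F.min("gdp")).show()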
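To make the two groupBy aggregations above more concrete, here is a plain-Python sketch (no Spark involved) of what they compute. The sample rows and country names are made up for illustration and are not taken from the original CSV. It assumes every value arrives as a string, which is what spark.read.csv produces when no schema is supplied and inferSchema is left off.

```python
from collections import defaultdict

# Hypothetical sample rows (not from the original CSV); all values are strings,
# mirroring a CSV read with header=True and no schema inference.
rows = [
    {"year": "2018", "country": "A", "gdp": "9"},
    {"year": "2019", "country": "A", "gdp": "10"},
    {"year": "2019", "country": "B", "gdp": "25"},
]

# Rough equivalent of groupBy("country").count(): rows per country
counts = defaultdict(int)
for row in rows:
    counts[row["country"]] += 1

# Rough equivalent of select(F.max("count")): the largest per-country row count
max_count = max(counts.values())

# Collect gdp values per country, as groupBy("country") would before aggregating
gdp_by_country = defaultdict(list)
for row in rows:
    gdp_by_country[row["country"]].append(row["gdp"])

# (max, min) on the raw strings: compared as text, so "9" sorts above "10"
string_extremes = {c: (max(v), min(v)) for c, v in gdp_by_country.items()}

# (max, min) after converting to numbers: the result most readers expect
numeric_extremes = {
    c: (max(map(float, v)), min(map(float, v))) for c, v in gdp_by_country.items()
}

print(dict(counts), max_count)
print(string_extremes)
print(numeric_extremes)
```

If the gdp column in the real file holds plain numbers, casting it first, for example with F.col("gdp").cast("double"), should give the numeric behaviour in PySpark; whether that cast is appropriate depends on how the values are formatted in the CSV.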
Some simple DSL-style operations in PySpark