pyspark.sql
- pyspark.sql.SparkSession Main entry point for DataFrame and SQL functionality.
- pyspark.sql.DataFrame A distributed collection of data grouped into named columns. It feels somewhat like a pandas DataFrame.
- pyspark.sql.Column A column expression in a DataFrame.
- pyspark.sql.Row A row of data in a DataFrame.
- pyspark.sql.GroupedData Aggregation methods, returned by DataFrame.groupBy(). (Not yet clear to me what this is used for.)
- pyspark.sql.DataFrameNaFunctions Methods for handling missing data (null values).
- pyspark.sql.DataFrameStatFunctions Methods for statistics functionality.
- pyspark.sql.functions List of built-in functions available for DataFrame.
- pyspark.sql.types List of data types available.
- pyspark.sql.Window For working with window functions.
SparkSession
The entry point for programming Spark with the DataFrame and SQL API.
能