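DataFrame basics
- 1. Connect to a local Spark session
- 2. Create a DataFrame
- 3. Inspect column types
- 4. List column names
- 5. Count rows
- 6. Rename columns
- 7. Select, slice, and filter
- 8. Drop a column
- Add a column
- 9. Convert to JSON
- 10. Sort
- 11. Missing values
- 12. Convert between Spark DataFrames and Python variables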
1. Connect to a local Spark session
import pandas as pd
from pyspark.sql import SparkSession
# Build (or reuse) a SparkSession; the parentheses let the method chain span several lines
spark = (
    SparkSession
    .builder
    .appName('my_first_app_name')
    .getOrCreate()
)
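The builder above leaves the master URL to the environment. As a minimal sketch, assuming you want to pin the session to the local machine explicitly, the master can be set to `local[*]` (one worker thread per logical core) or `local[K]` (exactly K worker threads). The helper names `local_master_url` and `make_local_session` are hypothetical and exist only for this illustration.

```python
def local_master_url(n_threads=None):
    """Return a Spark master URL for local mode.

    'local[*]' uses as many worker threads as there are logical cores;
    'local[K]' uses exactly K worker threads.
    """
    if n_threads is None:
        return 'local[*]'
    if n_threads < 1:
        raise ValueError('n_threads must be at least 1')
    return f'local[{n_threads}]'


def make_local_session(app_name, n_threads=None):
    """Build (or reuse) a SparkSession that runs in local mode."""
    # Imported lazily so the URL helper works even without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(local_master_url(n_threads))
        .appName(app_name)
        .getOrCreate()
    )
```

With pyspark installed, `spark = make_local_session('my_first_app_name')` would give a session equivalent to the one above, but explicitly bound to local mode.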
2. Create a DataFrame
# Create a Spark DataFrame from a pandas DataFrame
colors = ['white','green','yellow','red','brown','pink']
color_df=pd.DataFrame(colors,columns=['color'])
color_df['length']=color_df['color'].apply(len)
color_df=spark.createDataFrame(color_df)
color_df.show()
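Going through pandas is not the only route. As a rough sketch, the same two-column data could also be built from a plain list of tuples and passed to `spark.createDataFrame` with a list of column names. The helper `rows_to_spark_df` is a hypothetical name, and it assumes `spark` is the session created in section 1.

```python
colors = ['white', 'green', 'yellow', 'red', 'brown', 'pink']

# One (color, length) tuple per row, computed in plain Python.
rows = [(color, len(color)) for color in colors]


def rows_to_spark_df(spark, rows):
    """Create a Spark DataFrame from a list of tuples.

    `spark` is assumed to be an existing SparkSession.
    """
    # Passing only column names lets Spark infer the column types from the data.
    return spark.createDataFrame(rows, schema=['color', 'length'])
```

Calling `rows_to_spark_df(spark, rows).show()` should display the same six color/length rows as the pandas-based version.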