scala
安心做个小废物
Consuming Kafka data with Spark: JSON that contains array types
JSON data format: { "header": { "traceId": "06ad872d5d5bfa0d", "appName": "zeus-merchant", "deviceType": null, "version": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko…
Original · 2021-06-11 23:07:06 · 680 views · 2 comments
Basic Spark DataFrame operations (Python and Scala versions)
1. Converting a DataFrame column type
python: df_green = df_green.withColumn("VendorID", df_green["VendorID"].cast(IntegerType()))
scala: val df_green_1 = df_green.withColumn("VendorID", col("VendorID").cast(IntegerType))
2. Dropping DataFrame columns
python: …
Original · 2020-11-23 16:46:28 · 1977 views · 0 comments
Reading CSV files into a DataFrame with Spark (Python and Scala versions)
Python flowchart:
Python code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
path = 's3://nyc-tlc/test/fhvhv_tripdata_*.csv'
df_fhvhv=spark.read.format('csv').option('sep',',').option('inferSchema',True).op...
Original · 2020-11-23 16:14:08 · 930 views · 0 comments