spark
- Spark consuming Kafka data whose JSON contains array-typed fields
  JSON data format: `{ "header": { "traceId": "06ad872d5d5bfa0d", "appName": "zeus-merchant", "deviceType": null, "version": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko`… (Original · 2021-06-11)
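The preview cuts off before any array-typed field appears, but the shape of the message is clear. A minimal sketch of pulling both scalar and array fields out of such JSON with Python's standard `json` module (the `items` array field is a hypothetical stand-in, not taken from the post):

```python
import json

# A Kafka message value in the style of the snippet above; the "items"
# array is a hypothetical stand-in for the array-typed field.
raw = '''{
  "header": {"traceId": "06ad872d5d5bfa0d", "appName": "zeus-merchant",
             "deviceType": null, "version": null},
  "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]
}'''

msg = json.loads(raw)
trace_id = msg["header"]["traceId"]            # nested scalar field
skus = [item["sku"] for item in msg["items"]]  # flatten the array field
print(trace_id, skus)
```

In Spark itself the equivalent step is declaring the array field as an `ArrayType` of a `StructType` in the schema and exploding it, but the JSON structure being handled is the same.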
- pyspark ValueError: Some of types cannot be determined after inferring
  Scenario: converting a pandas DataFrame to a Spark DataFrame raises `ValueError: Some of types cannot be determined after inferring`. The cause is a column whose type Spark cannot infer. Fix: cast those columns to string on the pandas side:
  `b['request_market'] = b['request_market'].astype(str)`
  `b['request_vin'] = b['request_vin'].astype(str)`
  `b['request_br`… (Original · 2021-01-27)
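A self-contained repro of the fix: columns holding only `None` (or mixed) values are what trips Spark's type inference, and casting them to `str` on the pandas side sidesteps it. The sample values here are made up; the column names follow the snippet:

```python
import pandas as pd

# An all-None column is exactly the case where Spark's createDataFrame
# cannot infer a type.
b = pd.DataFrame({
    "request_market": [None, None],
    "request_vin": ["LSV123", None],
})

# The fix from the post: force the columns to string before the
# pandas -> Spark conversion (spark.createDataFrame(b) would then work).
b["request_market"] = b["request_market"].astype(str)
b["request_vin"] = b["request_vin"].astype(str)

print(b.dtypes)
```

Note that `astype(str)` turns `None` into the literal string `"None"`; if that matters downstream, replace it back with null after the conversion.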
- Spark Streaming consuming Kafka data and writing to HDFS and directly to Hive tables (Scala version)
  The Kafka data I consume is JSON. Without further ado, straight to the code. pom.xml: `<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">`… (Original · 2021-01-01)
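The Scala job itself needs a running Spark/Kafka stack, but the on-disk layout such a job targets, JSON records appended under date partitions of a base path, can be sketched with the standard library. The `dt=` partition convention and the record fields below are assumptions for illustration, not taken from the post:

```python
import json
import os

def write_partitioned(records, base_dir):
    """Append JSON lines under base_dir/dt=<date>/part-0.json,
    mimicking the date-partitioned directory layout a streaming
    job writing to HDFS would produce."""
    for rec in records:
        part_dir = os.path.join(base_dir, f"dt={rec['dt']}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0.json"), "a", encoding="utf-8") as f:
            f.write(json.dumps(rec) + "\n")
```

A Hive external table partitioned by `dt` can then be pointed at `base_dir`, with new partitions registered via `MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`.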
- Basic Spark DataFrame operations (Python & Scala versions)
  1. DataFrame column type conversion. Python: `df_green = df_green.withColumn("VendorID", df_green["VendorID"].cast(IntegerType()))`; Scala: `val df_green_1 = df_green.withColumn("VendorID", col("VendorID").cast(IntegerType))`. 2. Dropping DataFrame columns. Python:… (Original · 2020-11-23)
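For comparison, the same two operations, casting a column's type and dropping a column, in plain pandas; the Spark calls in the snippet (`withColumn(... .cast(IntegerType()))`, `drop`) follow the same shape. The `store_and_fwd_flag` column and the sample values are made up for illustration:

```python
import pandas as pd

df_green = pd.DataFrame({"VendorID": ["1", "2"],
                         "store_and_fwd_flag": ["N", "Y"]})

# 1. Column type conversion: pandas astype, analogous to Spark's
#    df.withColumn("VendorID", col("VendorID").cast(IntegerType()))
df_green["VendorID"] = df_green["VendorID"].astype(int)

# 2. Dropping a column, analogous to Spark's df.drop("store_and_fwd_flag")
df_green = df_green.drop(columns=["store_and_fwd_flag"])

print(df_green.dtypes)
```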
- Reading CSV into a DataFrame with Spark (Python and Scala versions)
  Python flow chart (image). Python code: `from pyspark.sql import SparkSession`, `spark = SparkSession.builder.getOrCreate()`, `path = 's3://nyc-tlc/test/fhvhv_tripdata_*.csv'`, `df_fhvhv = spark.read.format('csv').option('sep', ',').option('inferSchema', True).op`… (Original · 2020-11-23)
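A testable stand-in for the snippet's `inferSchema` behaviour: pandas `read_csv` performs the same per-column type inference. The inline CSV below is made up for illustration; the post reads `fhvhv_tripdata_*.csv` from S3:

```python
import io
import pandas as pd

# Inline CSV standing in for s3://nyc-tlc/test/fhvhv_tripdata_*.csv;
# the columns here are hypothetical examples.
csv_text = "hvfhs_license_num,trip_miles\nHV0003,2.5\nHV0005,1.1\n"

# Header handling plus dtype inference, the pandas analogue of Spark's
# .option('header', True).option('inferSchema', True)
df_fhvhv = pd.read_csv(io.StringIO(csv_text), sep=",")
print(df_fhvhv.dtypes)
```

As with Spark's `inferSchema`, this costs an extra pass over the data; for large files, supplying an explicit schema (`dtype=` in pandas, a `StructType` in Spark) is faster and safer.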