Spark Business Development
枣树下的石磙
spark-3.1.2: compatibility with multiple Hive versions
How Spark 3.1.2 handles compatibility with multiple Hive versions. Original · 2022-05-31 23:25:02 · 1496 views · 0 comments
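The excerpt does not show the post's approach. Here is a hedged sketch. Spark SQL chooses which Hive metastore client to load through the settings `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`.

- With `"maven"`, Spark downloads the matching client jars at startup.
- Offline clusters usually set the jars option to `"path"` and list the jars in `spark.sql.hive.metastore.jars.path` (available from Spark 3.1).

The helper names, the example version `2.1.1` and the jar path below are hypothetical.

```python
def hive_metastore_conf(version, jars="maven", jars_path=None):
    """Build Spark settings for a Hive metastore of the given version.

    version   -- Hive metastore version to talk to (e.g. a hypothetical "2.1.1")
    jars      -- "builtin", "maven", "path", or a classpath string
    jars_path -- comma-separated jar locations, only used when jars == "path"
    """
    conf = {
        "spark.sql.hive.metastore.version": version,
        "spark.sql.hive.metastore.jars": jars,
    }
    if jars == "path":
        if not jars_path:
            raise ValueError("jars='path' requires jars_path")
        conf["spark.sql.hive.metastore.jars.path"] = jars_path
    return conf


def build_hive_session(conf, app_name="multi-hive-sketch"):
    """Create a Hive-enabled SparkSession with these settings (not called here)."""
    # Imported lazily so the config helper above works without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = hive_metastore_conf("2.1.1")
# Hypothetical local directory holding the Hive 2.1.1 client jars.
offline_conf = hive_metastore_conf("2.1.1", jars="path", jars_path="/opt/hive-2.1.1/lib/*.jar")
print(conf)
print(offline_conf)
```

The version settings must be in place before the session is created, because the metastore client is loaded once per session. That is why the sketch passes them to the builder rather than calling `spark.conf.set` afterwards.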
Running pyspark fails with py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled
Fix 1: set the PYTHONPATH environment variable:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.2-src.zip:$PYTHONPATH
Fix 2: use the findspark module:
import findspark
findspark.init()
Original · 2022-02-15 13:56:20 · 1448 views · 2 comments
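Fix 1 hard-codes `py4j-0.10.9.2-src.zip`, but the py4j version bundled under `$SPARK_HOME/python/lib` differs between Spark releases. A PYTHONPATH that pairs the wrong pyspark with the wrong py4j is a common cause of this error; that is an assumption, not something stated in the post. The sketch below builds the same two PYTHONPATH entries using only the standard library. It globs for whichever py4j zip is actually present. The function names are hypothetical.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the two entries Fix 1 adds: $SPARK_HOME/python and the py4j zip."""
    py_dir = os.path.join(spark_home, "python")
    # Match the py4j version that ships with this Spark install instead of
    # hard-coding one; a distribution normally contains exactly one such zip.
    zips = sorted(glob.glob(os.path.join(py_dir, "lib", "py4j-*-src.zip")))
    if not zips:
        raise FileNotFoundError(f"no py4j-*-src.zip under {os.path.join(py_dir, 'lib')}")
    return [py_dir, zips[-1]]


def add_spark_to_sys_path(spark_home=None):
    """Prepend the entries to sys.path in the same order as the export line."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    for entry in reversed(spark_python_paths(spark_home)):
        if entry not in sys.path:
            sys.path.insert(0, entry)
```

Call `add_spark_to_sys_path()` before `import pyspark`. This is essentially what `findspark.init()` in Fix 2 automates.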
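Spark business development: column value replacement
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data (remark: 机器记录 = machine record, 人工记录 = manual record, 人工补录 = manual supplementary record; 机q器w记e录r is 机器记录 with Latin letters mixed in):
order_number,order_date,purchaser,quantity,product_id,remark
10001,2016-01-16,1001,1,102,机q器w记e录r
10003,2016-01-17,1002,2,105,人工记录
10002,2016-01-19,1002,3,106,人工补录
10004,2016-02-21,…
Original · 2022-01-15 10:14:50 · 1555 views · 0 comments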
Spark business development: sorting
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data (subject: 数学 = math, 语文 = Chinese, 英语 = English):
subject,name,score
数学,张三,88
语文,张三,92
英语,张三,77
数学,王五,65
语文,王五,87
英语,王五,90
数学,李雷,67
语文,李雷,33
英语,李雷,24
数学,宫九,77
语文,宫九,87
英语,宫九,90
Output data:
+-------+----+-----+
|subject|name|score|
+------…
Original · 2022-01-15 10:10:11 · 617 views · 0 comments
Spark business development: merging with union (union)
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input dataset 1 (profession = major, enroll = admitting university):
id,name,profession,enroll,score
1,庄劲聪,经济学类,北京理工大学,551
2,吴雅思,经济学类,北京理工大学,529
3,周育传,经济学类,北京理工大学,682
4,丁俊伟,通信工程,北京电子科技学院,708
5,庄逸琳,通信工程,北京电子科技学院,708
6,吴志发,通信工程,北京电子科技学院,578…
Original · 2022-01-15 10:08:48 · 1453 views · 0 comments
Spark business development: splitting a column
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
id,data
1,"Ming,20,15552211521"
2,"hong,19,13287994007"
3,"zhi,21,15552211523"
Output data (列1/列2/列3 = column 1/2/3):
+---+----+---+-----------+
| id| 列1|列2| 列3|
+---+----+---+-----------+
| 1|Ming| 20|15552…
Original · 2022-01-15 10:07:26 · 1147 views · 0 comments
Spark business development: adding an index column
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data (profession = major, enroll = admitting university):
name,profession,enroll,score
曾凰妹,金融学,北京电子科技学院,637
谢德炜,金融学,北京电子科技学院,542
林逸翔,金融学,北京电子科技学院,543
王丽云,金融学,北京电子科技学院,626
吴鸿毅,金融学,北京电子科技学院,591
施珊珊,经济学类,北京理工大学,581
柯祥坤,经济学类,北京理工大学,650
庄劲聪,经济…
Original · 2022-01-14 16:20:31 · 506 views · 0 comments
Spark business development: rows to columns
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
subject,name,score
数学,张三,88
语文,张三,92
英语,张三,77
数学,王五,65
语文,王五,87
英语,王五,90
数学,李雷,67
语文,李雷,33
英语,李雷,24
数学,宫九,77
语文,宫九,87
英语,宫九,90
Output data:
+-------+----+----+----+----+
|subject|宫九|张三|李雷|…
Original · 2022-01-14 16:11:36 · 265 views · 0 comments
Spark business development: removing duplicate rows
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
order_number,order_date,purchaser,quantity,product_id,remark
10001,2016-01-16,1001,1,102,机q器w记e录r
10003,2016-01-17,1002,2,105,人工记录
10002,2016-01-19,1002,3,106,人工补录
10004,2016-02-21…
Original · 2022-01-14 15:55:27 · 255 views · 0 comments
Spark business development: selecting columns
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
"id","name","description","weight"
"102","car battery","12V car battery","8.1"
"103","12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3","0.8"
"104","hamme…
Original · 2022-01-14 15:50:12 · 157 views · 0 comments
Spark business development: columns to rows
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
subject,gongjiu,zhangsan,lilei,wangwu
英语,90,77,24,90
语文,87,92,33,87
数学,77,88,67,65
Output data:
+-------+----+-----+
|subject|name|score|
+-------+----+-----+
| 英语| 77| 77|
| 英语| 90…
Original · 2022-01-14 15:31:41 · 361 views · 0 comments
Spark business development: data cleaning
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
order_number,order_date,purchaser,quantity,product_id,remark
10001,2016-01-16,1001,1,102,机q器w记e录r
10003,2016-01-17,1002,2,105,人工记录
10002,2016-01-19,1002,3,106,人工补录
10004,2016-02-21,…
Original · 2022-01-14 15:16:57 · 372 views · 0 comments
Spark business development: aggregation (agg)
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input data:
name,profession,enroll,score
曾凰妹,金融学,北京电子科技学院,637
谢德炜,金融学,北京电子科技学院,542
林逸翔,金融学,北京电子科技学院,543
王丽云,金融学,北京电子科技学院,626
吴鸿毅,金融学,北京电子科技学院,591
施珊珊,经济学类,北京理工大学,581
柯祥坤,经济学类,北京理工大学,650
庄劲聪,…
Original · 2022-01-14 15:06:26 · 439 views · 0 comments
Spark business development: handling null values
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input dataset:
"id","name","description","weight"
"102","car battery","12V car battery","8.1"
"103","12-pack drill bits","12-pack of drill bits with sizes ranging from #40 to #3","0.8"
"104","ham…
Original · 2022-01-14 14:53:56 · 455 views · 0 comments
Spark business development: merging with join (join)
Project repository: https://gitee.com/cch-bigdata/spark-process.git
Input dataset 1:
order_number,order_date,purchaser,quantity,product_id,remark
10001,2016-01-16,1001,1,102,机q器w记e录r
10003,2016-01-17,1002,2,105,人工记录
10002,2016-01-19,1002,3,106,人工补录
10004,2…
Original · 2022-01-14 14:36:57 · 296 views · 0 comments
Spark business development: filtering on columns (filter)
Input data:
order_number,order_date,purchaser,quantity,product_id,remark
10001,2016-01-16,1001,1,102,机q器w记e录r
10003,2016-01-17,1002,2,105,人工记录
10002,2016-01-19,1002,3,106,人工补录
10004,2016-02-21,1003,4,107,自然交易
10001,2016-01-16,1001,1,102,机器记录
…
Original · 2022-01-14 14:21:22 · 843 views · 0 comments