1. Your first data program in PySpark
Data-driven applications, no matter how complex, all boil down to what we can
think of as three meta steps, which are easy to distinguish in a program:
1 We start by loading or reading the data we wish to work with.
2 We transform the data, either via a few simple instructions or a very complex
machine learning model.
3 We then export (or sink) the resulting data, either into a file or by summarizing our findings into a visualization.
NOTE REPL stands for read, evaluate, print, and loop. In the case of Python, it represents the interactive prompt in which we input commands and read results.
1. Configuring how chatty Spark is: the log level
2. The DataFrameReader object
3. Splitting our lines of text into lists of words
4. Two ways to rename a column
5. Exploding a column of arrays into rows of elements
6. Lowering the case of the words in the data frame
7. Using regexp_extract to keep what looks like a word
8. Filtering rows in your data frame using where or filter