Get acquainted: First steps in PySpark (learning through examples)

1. Your first data program in PySpark

Data-driven applications, no matter how complex, all boil down to what we can
think of as three meta steps, which are easy to distinguish in a program (a minimal PySpark sketch follows the list):
1 We start by loading or reading the data we wish to work with.
2 We transform the data, either via a few simple instructions or a very complex machine learning model.
3 We then export (or sink) the resulting data, either into a file or by summarizing our findings into a visualization.
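
A minimal sketch of these three steps in PySpark is shown below. The file name data.csv, the column name category, and the output path are placeholders for illustration, not part of the original example:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("three_meta_steps").getOrCreate()

    # 1. Read: load the data we want to work with (hypothetical file name).
    df = spark.read.csv("data.csv", header=True, inferSchema=True)

    # 2. Transform: a simple aggregation stands in for any transformation
    #    ("category" is a hypothetical column name).
    summary = df.groupBy("category").count()

    # 3. Export (sink): write the result back out as files.
    summary.write.csv("summary_output", header=True, mode="overwrite")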

NOTE REPL stands for read, evaluate, print, and loop. In the case of Python, it represents the interactive prompt in which we input commands and read results.

1. Configuring how chatty Spark is: the log level
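
Spark's default logging can be noisy in the REPL. One way to tone it down is to set the log level on the SparkContext; a minimal sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Valid levels include ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN.
    spark.sparkContext.setLogLevel("WARN")  # keep only warnings and errors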

2. The DataFrameReader object
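
The DataFrameReader is reached through spark.read. A sketch of reading a plain-text file into a single-column data frame (the file name book.txt is a placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # spark.read is a DataFrameReader; text() puts each line into a "value" column.
    book = spark.read.text("book.txt")

    book.printSchema()
    # root
    #  |-- value: string (nullable = true)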

3. Splitting our lines of text into lists of words
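
Assuming the book data frame from the previous step (one value column per line of text), split from pyspark.sql.functions turns each line into an array of words:

    from pyspark.sql.functions import col, split

    # Split every line on spaces, giving an array<string> column we name "line".
    lines = book.select(split(col("value"), " ").alias("line"))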

4. Renaming a column in two ways
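
Two common ways to name or rename that column are alias() at creation time and withColumnRenamed() afterwards; a sketch reusing the split expression from step 3:

    from pyspark.sql.functions import col, split

    # Way 1: alias() names the column as it is created.
    lines = book.select(split(col("value"), " ").alias("line"))

    # Way 2: create the column first, then rename whatever Spark called it.
    lines = book.select(split(col("value"), " "))
    lines = lines.withColumnRenamed(lines.columns[0], "line")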

5. Exploding a column of arrays into rows of elements
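
explode() emits one row per element of an array column, so the arrays of words become individual word rows; continuing from the lines data frame above:

    from pyspark.sql.functions import col, explode

    # One output row per word in each "line" array.
    words = lines.select(explode(col("line")).alias("word"))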

6. Lowering the case of the words in the data frame
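
Lowercasing the words with lower() ensures that, for example, "The" and "the" are counted as the same word; continuing from words:

    from pyspark.sql.functions import col, lower

    # Normalize case so counting is case-insensitive.
    words_lower = words.select(lower(col("word")).alias("word_lower"))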

7. Using regexp_extract to keep what looks like a word
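
regexp_extract keeps only the part of each value that matches a regular expression. The pattern below, which keeps the first run of letters, is one reasonable choice rather than the only one:

    from pyspark.sql.functions import col, regexp_extract

    # Keep the first run of letters in each token; punctuation and digits drop out,
    # leaving an empty string when nothing matches.
    words_clean = words_lower.select(
        regexp_extract(col("word_lower"), "[a-z]+", 0).alias("word")
    )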

8. Filtering rows in your data frame using where or filter
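
where and filter are synonyms on a data frame; either can drop the empty strings left behind by the extraction step:

    from pyspark.sql.functions import col

    # Remove rows whose "word" is the empty string.
    words_nonull = words_clean.where(col("word") != "")
    # words_nonull = words_clean.filter(col("word") != "")  # equivalent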

If you found this post useful, please consider following me; your encouragement keeps me motivated to write more. Thanks.
