Big Data: Spark Core (4) Using the LogQuery Example to Explain How Executors Run RDD Operators
1. How does it actually run?
Plenty of blog posts explain at length what RDD, Dependency, Shuffle, and so on are, but how do the Executors actually run the code you submit?
Below is a log-analysis example taken from Spark's examples:
def main(args: Array[String]) {
  val sparkConf = new SparkConf().setAppName("Log Query")
  val sc = new SparkContext(sparkConf)

  val dataSet =
    if (args.length == 1) sc.textFile(args(0)) else sc.parallelize(exampleApacheLogs)
  // scalastyle:off
  val apacheLogRegex =
    """^([\d.]+) (\S+) (\S+) \[([\w\d:/]+\s[+\-]\d{4})\] "(.+?)" (\d{3}) ([\d\-]+) "([^"]+)" "([^"]+)".*""".r
  // scalastyle:on
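  // For reference, a log line in the Apache "combined" format that this regex
  // is written to match looks like the following (hypothetical sample values):
  //   10.10.10.10 - "FRED" [18/Jan/2013:17:56:07 +1100] "GET /index.html HTTP/1.1" 200 315 "http://example.com/" "Mozilla/4.0"
  // The capture groups are, in order: client IP, ident, user, timestamp,
  // request line, HTTP status, response bytes, referer, and user agent.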
  /** Tracks the total query count and number of aggregate bytes for a particular group. */
  class Stats(val count: Int, val numBytes: Int) extends Serializable {
    def merge(other: Stats): Stats = {
      new Stats(count + other.count, numBytes + other.numBytes)
    }
    override def toString: String = "bytes=%s\tn=%s".format(numBytes, count)
  }
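  // Note that Stats extends Serializable: its instances are created and merged
  // by tasks running on remote Executors and travel across the network
  // (through the shuffle, and back to the driver when results are collected),
  // so they must be serializable.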
  def extractKey(line: String): (String, String, String) = {
    apacheLogRegex.findFirstIn(line) match {
      case Some(apacheLogRegex(ip, _, user, dateTime, query, status, bytes, referer, ua)) =>
        if (user != "\"-\"") (ip, user, query)
        else (null, null, null)
      case _ => (null, null, null)
    }
  }
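  // extractKey runs on the Executors, once per line of the partition a task
  // is processing; the (ip, user, query) triple it returns becomes the key
  // that the reduceByKey later in this example groups on.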
  def extractStats(line: String): Stats = {
    apacheLogRegex.findFirstIn(line) match {
      case Some(apacheLogRegex(ip, _, user, dateTime, query, status, bytes, referer, ua)) =>
        new Stats(1, bytes.toInt)
      case _ => new Stats(1, 0)