- Overview
- Use cases
- Integration with the Spark ecosystem
- History
- Starting from a word-count example
- How it works
Overview
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
scalable – scales out across many nodes
high-throughput – processes large volumes of data per unit of time
fault-tolerant – recovers efficiently from failures
Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.
In other words, Spark Streaming takes data from a variety of sources and, after processing, pushes the results out to files and other external systems.
Key characteristics: low latency; efficient recovery from failures; scales to hundreds or even thousands of nodes; and lets batch processing, machine learning, graph computation, and Spark Streaming be combined in a single application.
It works as follows: Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
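The micro-batching idea can be illustrated in plain Scala (a toy simulation with hypothetical names, not the Spark API): events carry timestamps, and the receiver groups them into fixed batch intervals before each batch is processed as a unit.

```scala
// Toy illustration of micro-batching (not the Spark API):
// timestamped events are grouped into fixed intervals, and each
// interval's batch is then handed to the engine as one unit.
object MicroBatchSketch {
  case class Event(timeMs: Long, payload: String)

  // Assign each event to the batch interval its timestamp falls into.
  def toBatches(events: Seq[Event], batchIntervalMs: Long): Map[Long, Seq[Event]] =
    events.groupBy(e => e.timeMs / batchIntervalMs)

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(100, "a"), Event(900, "b"), Event(1200, "c"))
    // With a 1-second interval, the first two events land in the same batch.
    val batches = toBatches(events, 1000L)
    println(batches(0L).map(_.payload).mkString(","))  // a,b
    println(batches(1L).map(_.payload).mkString(","))  // c
  }
}
```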
One stack to rule them all – a single, unified stack.
Use cases
Banking, telecom, electronics, manufacturing, and e-commerce; real-time monitoring; and surfacing error messages while a web system is running.
Integration with the Spark ecosystem
Combine batch with streaming processing
Join data stream with static data sets
// Create a data set from a Hadoop file
val dataset = sparkContext.hadoopFile("file")
This reads a data set out of a file on the filesystem.
// Join each batch in the stream with the data set
kafkaStream.transform { batchRDD => batchRDD.join(dataset).filter(...) }
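The per-batch join pattern above can be sketched without Spark (a plain-Scala toy with hypothetical names, not the RDD API): each micro-batch of keyed events is inner-joined against a static lookup map that was loaded once.

```scala
// Toy sketch of joining each micro-batch with a static data set
// (plain Scala, not the Spark RDD API).
object StreamJoinSketch {
  // Static data set, e.g. loaded once from a file at startup.
  val dataset: Map[String, String] = Map("user1" -> "gold", "user2" -> "silver")

  // Join one batch of (key, event) pairs with the static map,
  // keeping only keys present in the data set (inner join).
  def joinBatch(batch: Seq[(String, String)]): Seq[(String, (String, String))] =
    batch.flatMap { case (k, v) => dataset.get(k).map(d => (k, (v, d))) }

  def main(args: Array[String]): Unit = {
    val batch = Seq(("user1", "click"), ("user3", "view"))
    println(joinBatch(batch))  // List((user1,(click,gold)))
  }
}
```

In real Spark Streaming the static side would be an RDD broadcast or joined inside `transform`, as in the fragment above; the toy only shows the per-batch data flow.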
Learn models offline, apply them online
// Learn the model offline
val model = KMeans.train(dataset, ...)
// Apply the model online on the stream
kafkaStream.map { event => model.predict(event.feature) }
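The offline/online split can be sketched without MLlib (a toy model with hypothetical names): a threshold is "trained" once on historical data, then applied to each event as it arrives in the stream.

```scala
// Toy version of "learn offline, apply online" (no MLlib):
// the "model" is just a mean threshold fitted on historical data.
object OfflineOnlineSketch {
  final case class ThresholdModel(threshold: Double) {
    def predict(feature: Double): Boolean = feature > threshold
  }

  // Offline step: fit the model once on a historical data set.
  def train(dataset: Seq[Double]): ThresholdModel =
    ThresholdModel(dataset.sum / dataset.size)

  def main(args: Array[String]): Unit = {
    val model  = train(Seq(1.0, 2.0, 3.0))  // threshold = 2.0
    val stream = Seq(1.5, 2.5, 3.5)
    // Online step: apply the fitted model to each event in the stream.
    println(stream.map(model.predict))      // List(false, true, true)
  }
}
```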
Interactively query streaming data with SQL
// Register each batch in the stream as a table
kafkaStream.foreachRDD { batchRDD => batchRDD.registerTempTable("lastEvents") }
// Interactively query the table
sqlContext.sql("select * from lastEvents")
History
Starting from a word-count example
- Run with spark-submit
- Run with spark-shell
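Spark ships a streaming NetworkWordCount example that suits this walkthrough. A sketch of launching it, assuming a Spark distribution at `$SPARK_HOME` and netcat feeding text on port 9999:

```shell
# In one terminal, start a netcat server the example will read from.
nc -lk 9999

# In another terminal, run the bundled streaming word-count example
# (run-example is a thin wrapper around spark-submit).
$SPARK_HOME/bin/run-example streaming.NetworkWordCount localhost 9999
```

Words typed into the netcat terminal are then counted batch by batch; the same code can also be pasted into spark-shell to experiment interactively.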
Starting from the Spark source code
GitHub
https://github.com/apache/spark
Run with spark-submit
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
*