Getting Started with Spark Streaming

  • Overview
  • Use Cases
  • Integration with the Spark Ecosystem
  • History
  • Starting with Word Count
  • How It Works

Overview

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

scalable: capacity grows by adding nodes
high-throughput: handles a large volume of data per unit of time
fault-tolerant: recovers from failures

Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.

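In short, data from different sources passes through Spark Streaming, and the results are written out to external systems such as file systems.

Features: low latency, efficient recovery from failures, the ability to run on hundreds or thousands of nodes, and the ability to combine batch processing, machine learning, graph computation, and Spark Streaming in one application.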


It works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
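To make the batching model concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is a toy model, not how Spark Streaming is implemented: `Event`, `toBatches`, and `wordCount` are hypothetical names, and the 1000 ms interval is an arbitrary assumption. It groups timestamped lines into fixed-interval batches and then processes each batch independently.

```scala
// Plain Scala, no Spark dependency: a toy model of micro-batching.
object MicroBatchSketch {
  // A timestamped line of text (hypothetical type, for illustration only).
  final case class Event(timestampMs: Long, line: String)

  // Group events into batches of `intervalMs` milliseconds, keyed by the
  // batch start time and ordered by it. Intervals without events produce
  // no batch in this sketch.
  def toBatches(events: Seq[Event], intervalMs: Long): Seq[(Long, Seq[Event])] =
    events
      .groupBy(e => (e.timestampMs / intervalMs) * intervalMs)
      .toSeq
      .sortBy(_._1)

  // Process one batch: count how often each word occurs in it.
  def wordCount(batch: Seq[Event]): Map[String, Int] =
    batch
      .flatMap(_.line.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100L, "a b a"),
      Event(900L, "b"),
      Event(1200L, "c c")
    )
    // With a 1000 ms interval, the first two events fall into the batch
    // starting at 0 ms and the third into the batch starting at 1000 ms.
    toBatches(events, 1000L).foreach { case (start, batch) =>
      println(s"batch starting at $start ms: ${wordCount(batch)}")
    }
  }
}
```

Unlike this sketch, Spark Streaming produces a batch for every interval (possibly an empty one) and distributes each batch across the cluster as an RDD.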

One stack to rule them all: a single, unified stack.

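Use Cases

Banking, telecommunications, electronics, industry, e-commerce, real-time monitoring, and troubleshooting error messages while web systems are running.

Integration with the Spark Ecosystem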

Combine batch with streaming processing
Join data stream with static data sets

// Create data set from Hadoop file
val dataset = sparkContext.hadoopFile("file")
This reads a file from a file system.

// Join each batch in stream with dataset
kafkaStream.transform { batchRDD => batchRDD.join(dataset).filter(…) }
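
The per-batch join can be tried without a cluster. The following is a minimal plain-Scala sketch, not Spark code: each batch is modeled as a Seq of key-value pairs and the static data set as a Map, and `joinBatch`, the user IDs, and the country codes are hypothetical values chosen for illustration. It mimics the inner-join behavior of joining by key, where records without a match are dropped, under the simplifying assumption that each key appears at most once in the static data.

```scala
// Plain Scala, no Spark dependency: joining each batch with static data.
object BatchJoinSketch {
  // Inner join of one batch of (key, value) records with a static lookup table.
  // Records whose key is missing from the table are dropped.
  def joinBatch[K, V, W](batch: Seq[(K, V)], static: Map[K, W]): Seq[(K, (V, W))] =
    batch.flatMap { case (key, value) =>
      static.get(key).map(other => (key, (value, other)))
    }

  def main(args: Array[String]): Unit = {
    // Static data set, loaded once (hypothetical user -> country mapping).
    val userCountry = Map("u1" -> "CN", "u2" -> "US")
    // The stream, modeled as a sequence of batches.
    val stream = Seq(
      Seq("u1" -> "click", "u3" -> "view"),
      Seq("u2" -> "click")
    )
    // The same static data is joined against every batch.
    stream.foreach(batch => println(joinBatch(batch, userCountry)))
  }
}
```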

Learn models offline, apply them online

// Learn model offline
val model = KMeans.train(dataset, …)

// Apply model online on stream
kafkaStream.map { event => model.predict(event.feature) }
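
The same pattern, a model built ahead of time and applied to each incoming record, can be sketched in plain Scala. This is an illustration under stated assumptions, not MLlib: `CentroidModel` is a hypothetical stand-in for a clustering model trained offline, the cluster centers are made-up values, and `predict` simply returns the index of the nearest center.

```scala
// Plain Scala, no Spark dependency: applying a pre-trained model to batches.
object OnlinePredictSketch {
  type Point = Vector[Double]

  // Stand-in for a model trained offline: a fixed, non-empty list of cluster centers.
  final case class CentroidModel(centers: Vector[Point]) {
    private def squaredDistance(a: Point, b: Point): Double =
      a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

    // Index of the center closest to `p`.
    def predict(p: Point): Int =
      centers.indices.minBy(i => squaredDistance(centers(i), p))
  }

  def main(args: Array[String]): Unit = {
    // "Offline" step: the centers are assumed to come from earlier training.
    val model = CentroidModel(Vector(Vector(0.0, 0.0), Vector(10.0, 10.0)))
    // "Online" step: score every record of every incoming batch.
    val batches = Seq(
      Seq(Vector(1.0, 0.5), Vector(9.0, 11.0)),
      Seq(Vector(-1.0, 2.0))
    )
    batches.foreach(batch => println(batch.map(model.predict)))
  }
}
```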

Interactively query streaming data with SQL

// Register each batch in stream as table
kafkaStream.foreachRDD { batchRDD => batchRDD.registerTempTable("latestEvents") }

// Interactively query table
sqlContext.sql("select * from latestEvents")

History
Starting with Word Count

  • Running with spark-submit
  • Running with spark-shell

Start from the Spark source code
GitHub
https://github.com/apache/spark

Running with spark-submit

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 *