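Repost Spark Streaming Use Cases-Ways of Reading Kafka Data
Reposted from: Spark Streaming Use Cases-Ways of Reading Kafka Data. Overview: Spark Streaming supports reading from many real-time input sources, including Kafka, Flume, and socket streams. Real-time sources other than Kafka are not discussed here, because our business scenarios do not involve them. This article focuses on our current business scenarios and covers only how Spark Streaming reads data from Kafka. Spark St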
2017-05-31 19:05:26 3722
Translation storm-[8]-Reliable Spout-Reliable versus Unreliable Messages
You’ll create a spout that sends 100 random transaction IDs, and a bolt that fails for 80% of tuples received. Imagine you are processing bank transactions, and you have the following requirements:
2017-05-29 15:29:45 265
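Original storm-Custom Grouping
Custom Grouping: implementing the backtype.storm.grouping.CustomStreamGrouping interface is all it takes to define a user-defined grouping. For example, in a word count, tuples are assigned by the first letter of the first word modulo the number of tasks: package CostumerGroup;import backtype.storm.grouping.CustomStreamGrou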
2017-05-26 21:47:02 743
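The routing rule described in the Custom Grouping excerpt (first letter of the first word, modulo the number of tasks) can be sketched outside Storm as plain arithmetic. This is only an illustration of the rule, not the post's Java code and not the Storm API: `choose_task`, `target_tasks`, and the sample task IDs are hypothetical names, and using the character's code point as "the first letter" is an assumption.

```python
from typing import Sequence


def choose_task(first_word: str, target_tasks: Sequence[int]) -> int:
    """Pick a target task from the first character of the word.

    Assumption: the "first letter" is taken as its Unicode code point,
    and the index is that value modulo the number of target tasks.
    """
    if not target_tasks:
        raise ValueError("target_tasks must not be empty")
    if not first_word:
        # Hypothetical fallback: send empty words to the first task.
        return target_tasks[0]
    index = ord(first_word[0]) % len(target_tasks)
    return target_tasks[index]


# Hypothetical task IDs of the receiving bolt.
tasks = [3, 4, 5, 6]
for word in ["storm", "spark", "apple"]:
    print(word, "->", choose_task(word, tasks))
```

Words that start with the same letter always land on the same task, which is what lets each task keep a consistent count for its share of the words.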
Translation storm-All grouping use case 1-sending signal commands
All Grouping: All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. For example, if you need to refresh a
2017-05-26 21:03:47 460
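Translation spark-Tuning spark
Source: Tuning Spark. Data Serialization: serialization plays an important role in the performance of distributed applications. Two serialization libraries are provided: Java serialization: By default, Spark serializes objects using Java’s ObjectOutputStream framework, and can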
2017-05-25 10:38:16 404
Translation spark-Spark Configuration
Source: Spark Configuration. Spark provides three locations to configure the system: Spark properties control most application parameters and can be set by using a SparkConf object, or through Jav
2017-05-24 21:27:48 704
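Translation spark-Cluster Mode Overview study notes
A brief introduction to how Spark runs in cluster mode (application submission guide). Components: SparkContext can connect to several cluster managers (Spark’s own standalone cluster manager, Mesos or YARN). The cluster manager allocates resources; once connected, Spark acquires executors on the nodes, and the executors, for the ap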
2017-05-24 21:06:46 521
Original SparkSQL-2.0-new features
Starting Point: SparkSession. The entry point into all functionality in Spark is the SparkSession class. To create a basic SparkSession, just use SparkSession.builder(): import org.apache.s
2017-05-24 18:23:34 1069
Translation storm-[7]-Trident State study notes
Trident State: Trident has first-class abstractions for reading from and writing to stateful sources. The state can either be internal to the topology, e.g., kept in-memory and backed by HDFS
2017-05-18 21:19:28 445
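Translation storm-[6]-Trident API
Official documentation: Trident API Overview. The core data model in Trident is the "stream", which is processed as a series of batches. A stream is partitioned across the nodes of the cluster, and operations on different partitions run in parallel. Trident has the following five kinds of operations: Operations that apply locally to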
2017-05-14 16:53:04 344
Translation storm-[5]-Trident example
Translated study notes on the book 《storm分布式实时计算模式》: detecting disease outbreaks in a region. 1-Topology: the topology of processing functions in OutbreakDetectionTopology: [image]
2017-05-12 20:32:34 1024
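Original Installing spark2.1.0 on hadoop2.7.3_submitting YARN jobs
hadoop2.7.2 is already installed; install spark2.1.0. Let f1 be the master and f2 through f5 the workers. 1-Download and install scala-2.11.8: https://www.scala-lang.org/download/ tar -zxvf scala-2.11.8.tgz into /data. Configure environment variables: vi /etc/profile export SCA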
2017-05-11 22:39:49 1248
Repost storm-[4] -java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout
Solution (source URL). Exception in thread "main" java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout. Missing this class? Import into your dependency manager. Maven dependency>
2017-05-10 21:18:07 1686
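Translation storm-[3]-Trident Tutorial and tuning
Source document: http://storm.apache.org/releases/1.1.0/Trident-tutorial.html Trident is a high-level abstraction for real-time computation on top of Storm. It seamlessly handles high throughput of millions of messages per second and provides low-latency, highly available distributed querying. It offers joins, aggregations, grouping, functions, and filters; in addition, on top of any database or persis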
2017-05-10 18:43:00 346
Original storm-[1]-Basics of Storm study notes
Documentation · nathanmarz/storm Wiki: https://github.com/nathanmarz/storm/wiki/Documentation Storm, distributed and fault-tolerant realtime computation: http://storm-project.net/ http://www.slideshare.
2017-05-08 20:58:05 357
Repost spark-streaming-[10]-Saving offsets in zookeeper and reusing them in Spark Streaming
Reposted from: Saving offsets in zookeeper and reusing them in Spark Streaming. Thanks for sharing. There are two ways to consume Kafka data in Spark Streaming: 1) the Receiver-based createStream method and 2) the Direct Approach (No Receivers) createDirectStream method. Details
2017-05-08 11:56:59 1755
Original Kafka-[3]-KafkaStream
Step 8: Use Kafka Streams to process data. Kafka Streams is a client library of Kafka for real-time stream processing and analyzing data stored in Kafka brokers. /** * Licensed to the Apache So
2017-05-07 15:53:22 2107
Original spark-streaming-[9]-SparkStreaming consuming Kafka-Direct Approach
As noted in spark-streaming-[8]-Spark Streaming + Kafka Integration Guide0.8.2.1 study notes: There are two approaches to this - the old approach using Receivers and Kafka’s high-level API, and a new approach (introd
2017-05-07 15:34:23 1992
Original spark-streaming-[8]-Spark Streaming + Kafka Integration Guide0.8.2.1 study notes
Spark Streaming + Kafka Integration Guide (Kafka broker version 0.8.2.1 or higher). Here we explain how to configure Spark Streaming to receive data from Kafka. There are two approaches to this - th
2017-05-07 11:52:15 893
Original Kafka-[2]-Documentation-single-node QuickStart
1.3 Quick Start. Step 1: Download the code. Download the 0.10.2.0 release and un-tar it. > tar -xzf kafka_2.11-0.10.2.0.tgz > cd kafka_2.11-0.10.2.0 Step 2: Start the server. Kafka uses Zoo
2017-05-05 20:21:35 546
Original Kafka-[1]-Documentation-overview
Original link: http://kafka.apache.org/documentation.html#consumerapi 1.1 Introduction. First a few concepts: Kafka is run as a cluster on one or more servers. The Kafka cluster stores streams of recor
2017-05-05 19:37:00 366
Original spark-streaming-[7]-Output Operations on DStreams-writing to Mysql with foreachRDD
foreachRDD(func): The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as
2017-05-04 15:38:03 433
Original spark-streaming-[6]-KafkaWordCount and KafkaWordCountProducer (Receiver-based Approach)
Studying the official GitHub code for KafkaWordCount and KafkaWordCountProducer in Spark Streaming. Reference articles: 徽沪一郎, Apache Spark in Practice 1 -- KafkaWordCount (thanks for sharing); Two ways for Spark Streaming to get Kafka data: Receiver and Direct; Setting up a Kafka cluster
2017-05-03 21:21:48 1170
Repost spark-streaming-[5]-Design Patterns for using foreachRDD
References: Integrating Kafka into Spark Streaming: code examples and challenges; GitHub Kafka examples; writing to mysql with foreachRDD in Spark Streaming. To be continued...
2017-05-02 20:01:53 339
Original spark-streaming-[4]-Window Operations
Window Operations: As shown in the figure, every time the window slides over a source DStream, the source RDDs that fall within the window are combined and operated upo
2017-05-02 10:34:58 587
Original spark-streaming-[3]-Transform
Transform Operation: Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream. Th
2017-05-01 22:00:12 343