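Reposted from: Spark Streaming Use Cases: Overview of Kafka Data Reading Approaches. Spark Streaming can read from many real-time input sources, including Kafka, Flume, and socket streams. Real-time sources other than Kafka are not part of our business scenario and are not discussed here. This article focuses on our current business scenario and covers only how Spark Streaming reads data from Kafka. Spark St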

Translation storm-[8]-Reliable Spout-Reliable versus Unreliable Messages

You’ll create a spout that sends 100 random transaction IDs, and a bolt that fails for 80% of tuples received. Imagine you are processing bank transactions, and you have the following requirements:

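Original storm-Custom Grouping

Custom Grouping: implement the backtype.storm.grouping.CustomStreamGrouping interface to define your own grouping. For example, in a word count, each tuple is assigned to a task by taking the first letter of the first word modulo the number of tasks. package CostumerGroup; import backtype.storm.grouping.CustomStreamGrou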
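The first-letter rule described above can be sketched in plain Python. This is only a model of the routing arithmetic, not the Storm `CustomStreamGrouping` API; the names `choose_task` and `route` are hypothetical, and sending empty words to task 0 is an assumption.

```python
def choose_task(word, num_tasks):
    """Return the index of the task that should receive `word`."""
    if not word:
        return 0  # assumption: empty words go to the first task
    # Character code of the first letter, modulo the number of tasks.
    return ord(word[0]) % num_tasks


def route(words, num_tasks):
    """Group words by the task index chosen for each of them."""
    buckets = {task: [] for task in range(num_tasks)}
    for word in words:
        buckets[choose_task(word, num_tasks)].append(word)
    return buckets


if __name__ == "__main__":
    print(route(["apple", "avocado", "banana"], 4))
```

Words that share a first letter always land on the same task, so per-word counts stay local to one task. In a real topology the grouping would pick from the list of target task IDs rather than from a zero-based index.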

Translation storm-All grouping use case 1-sending signal commands

All Grouping: All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. For example, if you need to refresh a
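A minimal sketch of the broadcast behaviour, in plain Python rather than the Storm API. The `Bolt` class, the `"refresh"` signal name, and the `refresh_count` field are hypothetical stand-ins chosen for illustration.

```python
class Bolt:
    """Stand-in for one instance (task) of the receiving bolt."""

    def __init__(self, name):
        self.name = name
        self.received = []
        self.refresh_count = 0  # hypothetical state that a signal updates

    def execute(self, tup):
        self.received.append(tup)
        if tup == "refresh":
            self.refresh_count += 1


def all_grouping(tup, bolts):
    """Deliver one copy of the tuple to every bolt instance."""
    for bolt in bolts:
        bolt.execute(tup)


if __name__ == "__main__":
    instances = [Bolt("bolt-%d" % i) for i in range(3)]
    all_grouping("refresh", instances)
    print([(b.name, b.refresh_count) for b in instances])
```

Because every instance sees the signal, none of them is left holding stale state, which is the property that makes this grouping suitable for control messages.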

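Translation spark-Tuning spark

Source: Tuning Spark. Data Serialization: serialization plays an important role in the performance of distributed applications. Spark provides two serialization libraries: Java serialization: By default, Spark serializes objects using Java’s ObjectOutputStream framework, and can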

Translation spark-Spark Configuration

Source: Spark Configuration. Spark provides three locations to configure the system: Spark properties control most application parameters and can be set by using a SparkConf object, or through Jav

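Translation spark-Cluster Mode Overview study notes

A brief introduction to how Spark runs in cluster mode (see the application submission guide). Components: SparkContext can connect to several types of cluster managers (Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources. Once connected, it acquires executors on nodes in the cluster; the executors are ap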

Original SparkSQL-2.0-New Features

Starting Point: SparkSession. The entry point into all functionality in Spark is the SparkSession class. To create a basic SparkSession, just use SparkSession.builder(): import org.apache.s

Translation storm-[7]-Trident State study notes

Trident State: Trident has first-class abstractions for reading from and writing to stateful sources. The state can either be internal to the topology – e.g., kept in-memory and backed by HDFS

Repost java-Basics-Primitive Data Types


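Translation storm-[6]-Trident API

Official documentation: Trident API Overview. The core data model in Trident is the "stream", processed as a series of batches. A stream is partitioned among the nodes in the cluster, and operations on different partitions run in parallel. Trident has five kinds of operations: Operations that apply locally to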

Translation storm-[5]-Trident Example

Translation and study notes from "Storm Blueprints: Patterns for Distributed Real-time Computation": detecting a disease outbreak in a region. 1-Topology: the topology of processing functions in OutbreakDetectionTopology:

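Original Installing spark2.1.0 on hadoop2.7.3_submitting YARN jobs

With hadoop2.7.2 already installed, install spark2.1.0. Let f1 be the master and f2 through f5 the workers. 1-Download and install scala-2.11.8: https://www.scala-lang.org/download/ Run tar -zxvf scala-2.11.8.tgz and extract it under /data. Configure the environment variables: vi /etc/profile export SCA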

Repost storm-[4]-java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout

Solution (source URL): Exception in thread "main" java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout. Missing this class? Import into your dependency manager. Maven dependency

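Translation storm-[3]-Trident Tutorial and Tuning

Source document: http://storm.apache.org/releases/1.1.0/Trident-tutorial.html Trident is a high-level abstraction for real-time computation on top of Storm. It seamlessly handles high throughput (millions of messages per second) and provides low-latency, highly available distributed querying. It offers joins, aggregations, grouping, functions, and filters. In addition, on top of any database or persi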

Translation storm-[2]-Programming Storm's Basic Modules


Original storm-[1]-Basics of Storm study notes

Documentation · nathanmarz/storm Wiki: https://github.com/nathanmarz/storm/wiki/Documentation Storm, distributed and fault-tolerant realtime computation: http://storm-project.net/ http://www.slideshare.

Repost spark-streaming-[10]-Saving and reusing offsets with ZooKeeper in Spark Streaming

Reposted from: Saving and reusing offsets with ZooKeeper in Spark Streaming. Thanks for sharing. There are two ways to consume Kafka data in Spark Streaming: 1) the Receiver-based createStream method and 2) the Direct Approach (No Receivers) createDirectStream method. For details

Original Kafka-[3]-KafkaStream

Step 8: Use Kafka Streams to process data. Kafka Streams is a client library of Kafka for real-time stream processing and analyzing data stored in Kafka brokers. /** * Licensed to the Apache So

Original spark-streaming-[9]-Consuming Kafka with Spark Streaming-Direct Approach

As noted in spark-streaming-[8]-Spark Streaming + Kafka Integration Guide 0.8.2.1 study notes: There are two approaches to this - the old approach using Receivers and Kafka’s high-level API, and a new approach (introd

Original spark-streaming-[8]-Spark Streaming + Kafka Integration Guide 0.8.2.1 study notes

Spark Streaming + Kafka Integration Guide (Kafka broker version or higher). Here we explain how to configure Spark Streaming to receive data from Kafka. There are two approaches to this - th

Original Kafka-[2]-Documentation-Single-Node QuickStart

1.3 Quick Start. Step 1: Download the code. Download the release and un-tar it. > tar -xzf kafka_2.11- > cd kafka_2.11- 2: Start the server. Kafka uses Zoo

Original Kafka-[1]-Documentation-Overview

Source link: http://kafka.apache.org/documentation.html#consumerapi 1.1 Introduction. First a few concepts: Kafka is run as a cluster on one or more servers. The Kafka cluster stores streams of recor

Original spark-streaming-[7]-Output Operations on DStreams-Writing to MySQL with foreachRDD

foreachRDD(func): The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as

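Original spark-streaming-[6]-KafkaWordCount and KafkaWordCountProducer (Receiver-based Approach)

Study notes on the official GitHub code for KafkaWordCount and KafkaWordCountProducer in Spark Streaming. Reference articles: 徽沪一郎, "Apache Spark in Practice, Part 1 -- KafkaWordCount" (thanks for sharing); "Two ways for Spark Streaming to read Kafka data: the Receiver and Direct approaches"; "Setting up a Kafka cluster"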

Repost spark-streaming-[5]-Design Patterns for using foreachRDD

References: "Integrating Kafka into Spark Streaming: Code Examples and Challenges"; GitHub Kafka examples; "Spark Streaming: writing to MySQL with foreachRDD". To be continued...

Original spark-streaming-[4]-Window Operations

Window Operations: As shown in the figure, every time the window slides over a source DStream, the source RDDs that fall within the window are combined and operated upo
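The sliding behaviour can be sketched with plain Python lists standing in for the batches (RDDs) of a DStream. This is an illustration only, not the Spark API: the function name `windowed` is hypothetical, and window length and slide interval are counted in batches here rather than in seconds.

```python
def windowed(batches, window_length, slide_interval, combine):
    """Apply `combine` to the batches covered by each window position."""
    results = []
    # The window is evaluated once every `slide_interval` batches.
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        # Merge every batch that falls inside the current window.
        window = [item for batch in batches[start:end] for item in batch]
        results.append(combine(window))
    return results


if __name__ == "__main__":
    source = [[1], [2], [3], [4], [5], [6]]
    # Window of 3 batches, sliding by 2 batches, summing the contents.
    print(windowed(source, 3, 2, sum))
```

With a window of 3 batches and a slide of 2, consecutive windows overlap by one batch, so that batch contributes to two results.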

Original spark-streaming-[3]-Transform

Transform Operation: Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream. Th
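As a rough model of the idea, the sketch below treats a DStream as a list of batches and applies a batch-to-batch function to each one. It is plain Python rather than the Spark API, and the names `transform`, `drop_blocked`, and `BLOCKLIST` are hypothetical.

```python
BLOCKLIST = {"spam"}  # hypothetical reference data used by the per-batch function


def drop_blocked(batch):
    """Batch-to-batch function: remove records found in the blocklist."""
    return [record for record in batch if record not in BLOCKLIST]


def transform(batches, func):
    """Return a new sequence of batches with func applied to every batch."""
    return [func(batch) for batch in batches]


if __name__ == "__main__":
    stream = [["a", "spam"], ["spam"], ["b"]]
    print(transform(stream, drop_blocked))
```

Because the function receives a whole batch at a time, it can perform any batch-level operation, such as filtering against reference data as done here.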

