May 2017_hjw199089

Repost: Spark Streaming Use Cases - How Kafka Data Is Read

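Reposted from: Spark Streaming Use Cases - How Kafka Data Is Read. Overview: Spark Streaming can read from many real-time input sources, including Kafka, Flume, and socket streams. Real-time sources other than Kafka are not part of our business scenario and are not discussed here. This article focuses on our current business scenario and covers only how Spark Streaming reads data from Kafka. Spark St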

2017-05-31 19:05:26 3722

Translation: storm-[8]-Reliable Spouts-Reliable versus Unreliable Messages

You’ll create a spout that sends 100 random transaction IDs, and a bolt that fails for 80% of tuples received. Imagine you are processing bank transactions, and you have the following requirements:

2017-05-29 15:29:45 265

Original: storm-Custom Grouping

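Custom Grouping: implement the backtype.storm.grouping.CustomStreamGrouping interface to define a custom grouping. For example, in a word count, tuples are assigned by the first letter of the first word modulo the number of tasks. package CostumerGroup; import backtype.storm.grouping.CustomStreamGrou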

2017-05-26 21:47:02 743

Translation: storm-All grouping Use Case 1-Sending Signal Commands

All Grouping: All Grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. For example, if you need to refresh a

2017-05-26 21:03:47 460

Translation: spark-Tuning spark

Source: Tuning Spark. Data Serialization: Serialization plays an important role in the performance of distributed applications. Spark provides two serialization options: Java serialization: By default, Spark serializes objects using Java’s ObjectOutputStream framework, and can

2017-05-25 10:38:16 404

Translation: spark-Spark Configuration

Source: spark configuration. Spark provides three locations to configure the system: Spark properties control most application parameters and can be set by using a SparkConf object, or through Jav

2017-05-24 21:27:48 704

Translation: spark-Cluster Mode Overview Study Notes

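A brief introduction to how Spark runs in cluster mode (application submission guide). Components: SparkContext can connect to several types of cluster managers (Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources. Once connected, it acquires executors on the nodes. The executors, for the ap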

2017-05-24 21:06:46 521

Original: SparkSQL-2.0-New Features

Starting Point: SparkSession. The entry point into all functionality in Spark is the SparkSession class. To create a basic SparkSession, just use SparkSession.builder(): import org.apache.s

2017-05-24 18:23:34 1069

Translation: storm-[7]-Trident State Study Notes

Trident State: Trident has first-class abstractions for reading from and writing to stateful sources. The state can either be internal to the topology – e.g., kept in-memory and backed by HDFS

2017-05-18 21:19:28 445

Repost: java-Basics-Primitive Data Types

Differences between StringBuffer and StringBuilder

2017-05-16 21:36:26 297

Translation: storm-[6]-Trident API

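Official documentation: Trident API Overview. The core data model in Trident is the “stream”, processed as a series of batches. A stream is partitioned among the nodes in the cluster, and operations on different partitions run in parallel. Trident has the following five kinds of operations: Operations that apply locally to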

2017-05-14 16:53:04 344

Translation: storm-[5]-Trident Example

Translation and study notes on the book "Storm Blueprints: Patterns for Distributed Real-time Computation" (《storm分布式实时计算模式》). Detecting disease outbreaks in a region. 1-Topology: topology of the processing functions in OutbreakDetectionTopology: [image]

2017-05-12 20:32:34 1024

Original: Installing spark2.1.0 on hadoop2.7.3_Submitting YARN Jobs

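With hadoop2.7.2 already installed, install spark2.1.0. Let f1 be the master and f2 through f5 the workers. 1-Download and install scala-2.11.8: https://www.scala-lang.org/download/ Extract with tar -zxvf scala-2.11.8.tgz into /data. Configure the environment variables: vi /etc/profile export SCA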

2017-05-11 22:39:49 1248

Repost: storm-[4]-java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout

Source URL of the solution. Exception in thread "main" java.lang.NoClassDefFoundError: storm/trident/spout/ITridentSpout. Missing this class? Import into your dependency manager. Maven dependency>

2017-05-10 21:18:07 1686

Translation: storm-[3]-Trident Tutorial and Tuning

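Source document: http://storm.apache.org/releases/1.1.0/Trident-tutorial.html Trident is a high-level abstraction for real-time computation on top of Storm. It seamlessly handles high throughput of millions of messages per second and provides low-latency, highly available distributed querying. It offers joins, aggregations, grouping, functions, and filters; in addition, on top of any database or pers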

2017-05-10 18:43:00 346

Original: storm-[1]-Basics of Storm Study Notes

Documentation · nathanmarz/storm Wiki https://github.com/nathanmarz/storm/wiki/Documentation Storm, distributed and fault-tolerant realtime computation: http://storm-project.net/ http://www.slideshare.

2017-05-08 20:58:05 357

Repost: spark-streaming-[10]-Saving and Reusing Offsets with zookeeper in Spark Streaming

Reposted from: Saving and Reusing Offsets with zookeeper in Spark Streaming. Thanks for sharing. There are two ways to consume Kafka data in Spark Streaming: 1) the Receiver-based createStream method and 2) the Direct Approach (No Receivers) createDirectStream method; for details

2017-05-08 11:56:59 1755

Original: Kafka-[3]-KafkaStream

Step 8: Use Kafka Streams to process data. Kafka Streams is a client library of Kafka for real-time stream processing and analyzing data stored in Kafka brokers. /** * Licensed to the Apache So

2017-05-07 15:53:22 2107

Original: spark-streaming-[9]-Consuming Kafka with SparkStreaming-Direct Approach

As noted in spark-streaming-[8]-Spark Streaming + Kafka Integration Guide 0.8.2.1 Study Notes: There are two approaches to this - the old approach using Receivers and Kafka’s high-level API, and a new approach (introd

2017-05-07 15:34:23 1992

Original: spark-streaming-[8]-Spark Streaming + Kafka Integration Guide 0.8.2.1 Study Notes

Spark Streaming + Kafka Integration Guide (Kafka broker version 0.8.2.1 or higher). Here we explain how to configure Spark Streaming to receive data from Kafka. There are two approaches to this - th

2017-05-07 11:52:15 893

Original: Kafka-[2]-Documentation-Single-Node QuickStart

1.3 Quick Start. Step 1: Download the code. Download the 0.10.2.0 release and un-tar it. > tar -xzf kafka_2.11-0.10.2.0.tgz > cd kafka_2.11-0.10.2.0 Step 2: Start the server. Kafka uses Zoo

2017-05-05 20:21:35 546

Original: Kafka-[1]-Documentation-Overview

Source link: http://kafka.apache.org/documentation.html#consumerapi 1.1 Introduction. First a few concepts: Kafka is run as a cluster on one or more servers. The Kafka cluster stores streams of recor

2017-05-05 19:37:00 366

Original: spark-streaming-[7]-Output Operations on DStreams-Writing to Mysql with foreachRDD

foreachRDD(func): The most generic output operator that applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as

2017-05-04 15:38:03 433

Original: spark-streaming-[6]-KafkaWordCount and KafkaWordCountProducer (Receiver-based Approach)

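Studying the official github code for KafkaWordCount and KafkaWordCountProducer in spark streaming. Reference articles: 徽沪一郎, "Apache Spark in Practice, Part 1 -- KafkaWordCount" (thanks for sharing); "Two Ways for Spark-Streaming to Get kafka Data: Receiver and Direct"; "Setting Up a Kafka Cluster"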

2017-05-03 21:21:48 1170

Repost: spark-streaming-[5]-Design Patterns for using foreachRDD

References: "Integrating Kafka into Spark Streaming: Code Examples and Challenges"; github kafka examples; "SparkStreaming: Writing to mysql with foreachRDD". To be continued...

2017-05-02 20:01:53 339

Original: spark-streaming-[4]-Window Operations

Window Operations: As shown in the figure, every time the window slides over a source DStream, the source RDDs that fall within the window are combined and operated upo

2017-05-02 10:34:58 587

Original: spark-streaming-[3]-Transform

Transform Operation: Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream. Th

2017-05-01 22:00:12 343

hjw199089's blog