April 2016_breeze_lsw


[Original] Mesos shuffle service unusable in Spark 1.6

Error message: WARN TaskSetManager: Lost task 132.0 in stage 2.0 (TID 5951, spark047207): java.io.FileNotFoundException: /data1/spark/tmp/blockmgr-5363024d-29a4-4f6f-bf87-127b95669c7c/1c/temp_shuffle_7dad1a33-28

2016-04-25 22:13:09 834

[Original] Spark Shuffle FetchFailedException: Solutions

This is a fairly common error in large-scale data processing. Error messages: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0; org.apache.spark.shuffle.FetchFailedException: Failed to connect to hostname/192

2016-04-21 22:25:30 53013 9

[Original] Spark error: driver did not authorize commit

With Spark speculative execution enabled, running jobs sometimes produce the following message: WARN TaskSetManager: Lost task 55.0 in stage 15.0 (TID 20815, spark047216) org.apache.spark.executor.CommitDeniedException: attempt_201604191557_0015_m_000055_0: Not

2016-04-19 17:05:16 6402
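With speculation enabled, Spark may launch a second copy of a slow task, so two attempts of the same partition can both try to commit output. Spark's driver-side OutputCommitCoordinator is documented as following a "first committer wins" policy, and a refused attempt fails with a CommitDeniedException like the one quoted above. The toy model below only illustrates that policy. `ToyCommitCoordinator`, `CommitDenied` and `commit_task_output` are hypothetical names, and the model omits what Spark does when the authorized attempt itself fails.

```python
from typing import Dict, Tuple


class CommitDenied(Exception):
    """Hypothetical stand-in for Spark's CommitDeniedException."""


class ToyCommitCoordinator:
    """Hypothetical, simplified "first committer wins" coordinator (not Spark's code)."""

    def __init__(self) -> None:
        # (stage, partition) -> attempt number that was authorized to commit
        self._winners: Dict[Tuple[int, int], int] = {}

    def can_commit(self, stage: int, partition: int, attempt: int) -> bool:
        key = (stage, partition)
        if key not in self._winners:
            # The first attempt to ask for this partition wins the right to commit.
            self._winners[key] = attempt
            return True
        # Any other attempt (e.g. a speculative copy) is refused.
        return self._winners[key] == attempt


def commit_task_output(coord: ToyCommitCoordinator, stage: int, partition: int, attempt: int) -> str:
    if not coord.can_commit(stage, partition, attempt):
        raise CommitDenied(
            f"stage {stage}, partition {partition}, attempt {attempt}: "
            "driver did not authorize commit"
        )
    return f"attempt {attempt} committed partition {partition}"


coord = ToyCommitCoordinator()
# The original attempt and a speculative copy of task 55 in stage 15.
print(commit_task_output(coord, 15, 55, 0))
try:
    commit_task_output(coord, 15, 55, 1)
except CommitDenied as e:
    print("denied:", e)
```

Because the loser only fails at commit time, its work is discarded rather than written twice, which is why this warning is usually harmless when speculation is on.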

[Original] Error when Spark writes a large number of partitions to HDFS

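After the analysis, the DataFrame had 20,000 partitions (1.7 million records), and writing it out with parquet put a large number of small files into a single HDFS output path at the same time. The messages (irrelevant output omitted): WARN TaskSetManager: Lost task: org.apache.spark.SparkException: Task failed while writing rows. WARN TaskSetManager: Lost ta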

2016-04-14 11:52:56 13533 1
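The "small files" in the post above follow directly from its numbers. Assuming rows are spread evenly and each partition writes one file, 1.7 million records over 20,000 partitions is only about 85 rows per file. The sketch below does that arithmetic and derives a smaller partition count from a target rows-per-file value. The helper names and the 500,000-row target are hypothetical tuning choices, not Spark settings or the original post's fix.

```python
import math


def records_per_partition(total_records: int, num_partitions: int) -> float:
    """Average rows per partition (and per output file), assuming an even spread."""
    return total_records / num_partitions


def partitions_for_target(total_records: int, target_rows_per_file: int) -> int:
    """Partition count giving roughly target_rows_per_file rows per output file.

    target_rows_per_file is a hypothetical knob chosen by the user.
    """
    return max(1, math.ceil(total_records / target_rows_per_file))


total = 1_700_000    # 1.7 million records, from the post
partitions = 20_000  # 20,000 partitions, from the post

print(records_per_partition(total, partitions))  # 85.0
print(partitions_for_target(total, 500_000))     # 4
```

In Spark, a number like this could be passed to `DataFrame.coalesce(n)` before writing parquet, which is one common way to reduce file count. The right target depends on row width and the HDFS block size.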

[Original] Several ways to achieve zero data loss with Spark Streaming and Kafka

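Definitions: before getting to the problem, here are a few stream-processing concepts. At most once - each record is processed at most once (0 or 1 times). At least once - each record is processed at least once (1 or more times). Exactly once - each record is processed exactly once (no data is lost and no data is processed more than once). High-level API: without fault tolerance, data will be lost, because the receiver keeps receiving data, and while it has not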

2016-04-12 14:31:40 6133 2
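The three guarantees defined above differ mainly in when a consumer records its progress (its offset) relative to processing a record. The toy simulation below uses no Spark or Kafka API. It crashes a consumer once between the two steps. Committing first skips the record (at most once), processing first replays it (at least once), and making the sink idempotent by offset turns the replay into a no-op, which gives an exactly-once effect. The function `run`, its mode names, and the single-crash model are hypothetical, and the idempotent sink is only one technique, not necessarily any of the approaches in the original post.

```python
from typing import List, Set


def run(records: List[str], mode: str, crash_at: int) -> List[str]:
    """Simulate one crash-and-restart; return what the sink received.

    mode: "commit_first"  -> commit offset, then process   (at most once)
          "process_first" -> process, then commit offset   (at least once)
          "idempotent"    -> process_first, sink dedups by offset (exactly-once effect)
    crash_at: offset at which the consumer crashes between the two steps.
    """
    sink: List[str] = []
    seen: Set[int] = set()  # offsets already written (idempotent mode only)
    committed = 0           # offset a restarted consumer resumes from
    crashed = False
    offset = 0

    def write(off: int) -> None:
        if mode == "idempotent":
            if off in seen:
                return      # replayed record: skip the duplicate write
            seen.add(off)
        sink.append(records[off])

    while offset < len(records):
        crash_now = offset == crash_at and not crashed
        if mode == "commit_first":
            committed = offset + 1  # 1) record progress
            if crash_now:           # crash before processing
                crashed = True
                offset = committed  # restart: this record is skipped
                continue
            write(offset)           # 2) process
        else:
            write(offset)           # 1) process
            if crash_now:           # crash before recording progress
                crashed = True
                offset = committed  # restart: this record is replayed
                continue
            committed = offset + 1  # 2) record progress
        offset += 1
    return sink


recs = ["a", "b", "c"]
print(run(recs, "commit_first", 1))   # ['a', 'c']
print(run(recs, "process_first", 1))  # ['a', 'b', 'b', 'c']
print(run(recs, "idempotent", 1))     # ['a', 'b', 'c']
```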

spark_prometheus_metrics.json

Blog: https://blog.csdn.net/lsshlsw/article/details/82670508 spark_prometheus_metrics.json

2018-09-13

scala for spark

Since Spark is written in Scala, this is a short summary to make reading the Spark source code easier.

2014-09-28
