Spark Source Code
zhixingheyi_tian
Intel Big Data. Spark
Various ways to build Spark from source
spark build · Original · 2022-12-02 09:28:02 · 535 views · 0 comments
Spark: Plan
Spark Plan · Original · 2022-11-14 20:07:41 · 634 views · 0 comments
Spark Decode parquet
Spark Parquet Decode · Original · 2022-08-17 20:21:02 · 988 views · 0 comments
Parquet metadata and size
Parquet Meta · Original · 2022-08-16 10:12:37 · 451 views · 0 comments
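Spark UT troubleshooting notes
checkAnswer series: pivot with null and aggregate type not supported by PivotFirst returns correct result *** FAILED ***. A null value came back as 0. The plan had picked up several c2r/r2c (columnar-to-row / row-to-columnar) conversions, and the root cause was a read/write bug in ArrowWritableColumnVector. · Original · 2022-05-10 18:21:00 · 305 views · 0 comments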
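The failure mode above (null turning into 0 after a columnar round trip) can be reproduced in miniature. The sketch below is hypothetical and is not ArrowWritableColumnVector: it assumes a column that stores values and a separate null mask, and shows that a copy which never consults the null mask hands back the zero sitting in the value slot.

```java
// Hypothetical, minimal int column: values plus a separate null mask.
// Not Spark's ArrowWritableColumnVector; it only models the value/null split.
final class IntVector {
    final int[] values;
    final boolean[] nulls;

    IntVector(int capacity) {
        values = new int[capacity];
        nulls = new boolean[capacity];
    }

    void putNull(int row) { nulls[row] = true; }   // value slot stays 0
    void putInt(int row, int v) { values[row] = v; nulls[row] = false; }
    boolean isNullAt(int row) { return nulls[row]; }
    int getInt(int row) { return values[row]; }

    // Box the column so null rows show up as Java null.
    Integer[] toBoxed() {
        Integer[] out = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = isNullAt(i) ? null : values[i];
        }
        return out;
    }
}

final class RoundTrip {
    // Buggy copy: reads every slot with getInt and never checks isNullAt,
    // so a null row is written back as the 0 stored in its value slot.
    static IntVector copyIgnoringNulls(IntVector src) {
        IntVector dst = new IntVector(src.values.length);
        for (int i = 0; i < src.values.length; i++) {
            dst.putInt(i, src.getInt(i));
        }
        return dst;
    }

    // Correct copy: checks the null mask before reading the value.
    static IntVector copyPreservingNulls(IntVector src) {
        IntVector dst = new IntVector(src.values.length);
        for (int i = 0; i < src.values.length; i++) {
            if (src.isNullAt(i)) {
                dst.putNull(i);
            } else {
                dst.putInt(i, src.getInt(i));
            }
        }
        return dst;
    }
}
```

Each extra c2r/r2c pair in a plan is one more place where a copy like `copyIgnoringNulls` can lose the null flag.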
Spark: OnHeapColumnVector
allocateColumns: /** * Allocates columns to store elements of each field of the schema on heap. * Capacity is the initial capacity of the vector and it will grow as necessary. Capacity is * in number of elements, not number of bytes. */ publi · Original · 2022-03-26 16:26:46 · 2245 views · 0 comments
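The Javadoc quoted above makes two points: one vector is allocated per schema field, and capacity counts elements and grows on demand. Here is a minimal sketch of both ideas. It assumes an int-only column and a field count in place of a `StructType`, and every name in it is hypothetical. In Spark, growth goes through `WritableColumnVector.reserve`. Here `putInt`/`putNull` call a private `reserve` directly for brevity.

```java
import java.util.Arrays;

// Hypothetical on-heap int column: data lives in plain Java arrays,
// and capacity counts elements (rows), not bytes.
final class OnHeapIntColumn {
    private int[] intData;
    private byte[] nulls;      // 1 = null, 0 = not null
    private int capacity;

    OnHeapIntColumn(int initialCapacity) {
        capacity = initialCapacity;
        intData = new int[initialCapacity];
        nulls = new byte[initialCapacity];
    }

    int capacity() { return capacity; }

    // Grow so at least `required` elements fit; doubling keeps appends amortized O(1).
    private void reserve(int required) {
        if (required <= capacity) {
            return;
        }
        int newCapacity = Math.max(required, capacity * 2);
        intData = Arrays.copyOf(intData, newCapacity);
        nulls = Arrays.copyOf(nulls, newCapacity);
        capacity = newCapacity;
    }

    void putInt(int rowId, int value) {
        reserve(rowId + 1);
        intData[rowId] = value;
        nulls[rowId] = 0;
    }

    void putNull(int rowId) {
        reserve(rowId + 1);
        nulls[rowId] = 1;
    }

    boolean isNullAt(int rowId) { return nulls[rowId] == 1; }
    int getInt(int rowId) { return intData[rowId]; }

    // Same idea as allocateColumns: one vector per field, all with the same initial capacity.
    static OnHeapIntColumn[] allocateColumns(int capacity, int numFields) {
        OnHeapIntColumn[] columns = new OnHeapIntColumn[numFields];
        for (int i = 0; i < numFields; i++) {
            columns[i] = new OnHeapIntColumn(capacity);
        }
        return columns;
    }
}
```

For example, `allocateColumns(2, 3)` yields three columns of capacity 2. Writing row 4 into one of them grows that column to 5 elements.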
Spark 3.0 Data Source v2
Taking Parquet as the example, the basic interface chain is DataSourceV2 => Table => ScanBuilder => Scan => PartitionReaderFactory, implemented here by ParquetPartitionReaderFactory. ParquetPartitionReaderFactory wraps VectorizedParquetRecordReader... · Original · 2020-09-11 10:47:02 · 764 views · 0 comments
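To make the hand-off between layers concrete, here is a simplified, hypothetical mirror of that chain. These are not Spark's interfaces. The real ones live in the `org.apache.spark.sql.connector` packages in Spark 3.x, take options, and add a `Batch` step between `Scan` and `PartitionReaderFactory`. An in-memory table of int arrays stands in for Parquet files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirror of the DSv2 read path; only the layering is modeled.
final class MiniDsv2 {
    interface PartitionReader { boolean next(); int get(); }
    interface PartitionReaderFactory { PartitionReader createReader(int[] partition); }
    interface Scan { List<int[]> planInputPartitions(); PartitionReaderFactory createReaderFactory(); }
    interface ScanBuilder { Scan build(); }
    interface Table { ScanBuilder newScanBuilder(); }

    // An in-memory "table" whose partitions are int arrays (stand-ins for Parquet files).
    static Table inMemoryTable(List<int[]> partitions) {
        // Plays the role of ParquetPartitionReaderFactory; in Spark that factory wraps
        // VectorizedParquetRecordReader, here the reader just walks an array.
        PartitionReaderFactory factory = partition -> new PartitionReader() {
            private int pos = -1;
            public boolean next() { return ++pos < partition.length; }
            public int get() { return partition[pos]; }
        };
        Scan scan = new Scan() {
            public List<int[]> planInputPartitions() { return partitions; }
            public PartitionReaderFactory createReaderFactory() { return factory; }
        };
        return () -> () -> scan;   // Table -> ScanBuilder -> Scan
    }

    // Planning (driver side in Spark) and reading (executor side) run in one process here.
    static List<Integer> readAll(Table table) {
        Scan scan = table.newScanBuilder().build();
        PartitionReaderFactory factory = scan.createReaderFactory();
        List<Integer> rows = new ArrayList<>();
        for (int[] partition : scan.planInputPartitions()) {
            PartitionReader reader = factory.createReader(partition);
            while (reader.next()) {
                rows.add(reader.get());
            }
        }
        return rows;
    }
}
```

The reader factory is the piece that gets shipped to executors in Spark. That is why the per-format decoding logic (the vectorized Parquet reader) sits behind it rather than in `Scan`.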
Spark: ListenerBus
ListenerBus is a trait that accepts events and posts each event to the corresponding listeners: private[spark] trait ListenerBus[L <: AnyRef, E] extends Logging { · Original · 2020-02-27 11:58:12 · 176 views · 0 comments
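Below is a Java sketch of the same shape as the trait above. It has a listener type `L`, an event type `E`, and a subclass hook that routes an event to the right callback. The method names `addListener`, `postToAll` and `doPostEvent` echo Spark's trait. Spark's version also times listeners and does more error handling. `JobListener`, `JobEvent` and `JobListenerBus` are hypothetical, used only to exercise the bus.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of the ListenerBus idea, not Spark's implementation.
abstract class ListenerBus<L, E> {
    // Copy-on-write so listeners can be added while another thread is posting.
    private final List<L> listeners = new CopyOnWriteArrayList<>();

    final void addListener(L listener) { listeners.add(listener); }
    final void removeListener(L listener) { listeners.remove(listener); }

    // Deliver one event to every listener; one failing listener must not block the rest.
    final void postToAll(E event) {
        for (L listener : listeners) {
            try {
                doPostEvent(listener, event);
            } catch (RuntimeException e) {
                System.err.println("Listener " + listener + " threw " + e);
            }
        }
    }

    // Subclasses map an event to the matching listener callback.
    protected abstract void doPostEvent(L listener, E event);
}

// Hypothetical listener and event types.
interface JobListener {
    void onJobStart(int jobId);
    void onJobEnd(int jobId);
}

final class JobEvent {
    final boolean start;
    final int jobId;

    JobEvent(boolean start, int jobId) {
        this.start = start;
        this.jobId = jobId;
    }
}

final class JobListenerBus extends ListenerBus<JobListener, JobEvent> {
    @Override
    protected void doPostEvent(JobListener listener, JobEvent event) {
        if (event.start) {
            listener.onJobStart(event.jobId);
        } else {
            listener.onJobEnd(event.jobId);
        }
    }
}
```

Keeping dispatch in `doPostEvent` lets one generic bus serve several listener/event families, each subclass only deciding which callback an event maps to.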
Spark: org.apache.spark.network.util.JavaUtils
Spark's recursive directory deletion tries two approaches: if the first, deleteRecursivelyUsingUnixNative, fails, it immediately falls back to the second. // org.apache.spark.network.util.JavaUtils.java /** * Delete a file or directory and its contents recursively. * Don't fo... · Original · 2019-10-22 10:06:44 · 920 views · 0 comments
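A minimal sketch of the "native first, Java fallback" strategy described above. The method names echo Spark's JavaUtils, but the bodies are simplified assumptions rather than Spark's code. The OS check is a plain `os.name` test. A non-zero `rm -rf` exit, or a file that still exists, counts as failure.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: names echo Spark's JavaUtils, bodies are simplified.
final class RecursiveDelete {

    static void deleteRecursively(File file) throws IOException {
        if (!file.exists()) {
            return;
        }
        boolean isUnix = !System.getProperty("os.name").toLowerCase().startsWith("windows");
        if (isUnix) {
            try {
                deleteRecursivelyUsingUnixNative(file);
                return;                       // native delete succeeded
            } catch (IOException e) {
                // Native delete failed (e.g. no `rm` on PATH): fall back to plain Java below.
            }
        }
        deleteRecursivelyUsingJavaIO(file.toPath());
    }

    // Shell out to `rm -rf`; a non-zero exit code or a surviving file is treated as failure.
    static void deleteRecursivelyUsingUnixNative(File file) throws IOException {
        Process process = new ProcessBuilder("rm", "-rf", file.getAbsolutePath())
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        try {
            int exitCode = process.waitFor();
            if (exitCode != 0 || file.exists()) {
                throw new IOException("rm -rf failed with exit code " + exitCode + " for " + file);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while deleting " + file, e);
        }
    }

    // Pure-Java fallback: Files.walk does not follow symlinks unless FOLLOW_LINKS is passed,
    // and reverse lexicographic order puts every child before its parent directory.
    static void deleteRecursivelyUsingJavaIO(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Shelling out to `rm` is typically faster for very large trees. The Java walk keeps deletion working where `rm` is unavailable.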
Spark: InternalRow
InternalRow: Abstract Binary Row Format. InternalRow is also called Catalyst row or Spark SQL row. abstract class InternalRow extends SpecializedGetters with Serializable {} UnsafeRow: UnsafeRow is a... · Original · 2019-04-01 14:32:18 · 858 views · 0 comments
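As a companion to the binary-row-format description above, here is a heavily simplified, hypothetical picture of UnsafeRow's fixed-length layout: a null bit set rounded up to whole 8-byte words, followed by one 8-byte slot per field. It is not Spark's class. The real UnsafeRow writes through Platform/Unsafe and has a variable-length region for strings, binary and nested data. Only 64-bit values are modeled here.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the fixed-length part of an UnsafeRow-style layout:
//   [null bit set, rounded up to 8-byte words][one 8-byte slot per field]
final class TinyUnsafeRow {
    private final int numFields;
    private final int bitSetWidthInBytes;
    private final ByteBuffer buf;

    TinyUnsafeRow(int numFields) {
        this.numFields = numFields;
        // One 64-bit word of null bits per 64 fields.
        this.bitSetWidthInBytes = ((numFields + 63) / 64) * 8;
        this.buf = ByteBuffer.allocate(bitSetWidthInBytes + 8 * numFields);
    }

    int sizeInBytes() { return buf.capacity(); }

    private void check(int ordinal) {
        if (ordinal < 0 || ordinal >= numFields) {
            throw new IndexOutOfBoundsException("ordinal " + ordinal);
        }
    }

    private static int wordOffset(int ordinal) { return (ordinal >> 6) * 8; }
    private static long bit(int ordinal) { return 1L << (ordinal & 63); }
    private int fieldOffset(int ordinal) { return bitSetWidthInBytes + ordinal * 8; }

    boolean isNullAt(int ordinal) {
        check(ordinal);
        return (buf.getLong(wordOffset(ordinal)) & bit(ordinal)) != 0;
    }

    void setNullAt(int ordinal) {
        check(ordinal);
        buf.putLong(wordOffset(ordinal), buf.getLong(wordOffset(ordinal)) | bit(ordinal));
        buf.putLong(fieldOffset(ordinal), 0L);   // zero the slot so equal rows stay byte-equal
    }

    void setLong(int ordinal, long value) {
        check(ordinal);
        buf.putLong(wordOffset(ordinal), buf.getLong(wordOffset(ordinal)) & ~bit(ordinal));
        buf.putLong(fieldOffset(ordinal), value);
    }

    long getLong(int ordinal) {
        check(ordinal);
        return buf.getLong(fieldOffset(ordinal));
    }
}
```

For 3 fields the row is 8 bytes of null bits plus 24 bytes of slots. With 65 fields the bit set needs a second word, giving 16 + 520 bytes.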
Engine-level code analysis of VectorizedParquet in Spark
VectorizedParquetRecordReader.java { /** * The number of rows that have been returned. */ private long rowsReturned; /** * The number of rows that have been reading, including the current... · Original · 2019-02-25 11:39:27 · 583 views · 1 comment
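The two counters in the excerpt drive the batch loop: `rowsReturned` tracks rows handed out, and a "loaded so far" counter tracks rows in every row group opened up to now. Here is a hypothetical sketch of that bookkeeping. It borrows the field names `rowsReturned`, `totalCountLoadedSoFar` and `totalRowCount`, but models row groups as plain row counts and only computes batch sizes. No decoding happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of vectorized-reader batch bookkeeping; no Parquet decoding happens.
final class BatchCounter {
    private final long[] rowGroupSizes;       // rows per row group, in file order
    private final long totalRowCount;
    private final int capacity;               // max rows per batch

    private int nextRowGroup = 0;
    private long rowsReturned = 0;            // rows handed out so far
    private long totalCountLoadedSoFar = 0;   // rows in all row groups opened so far

    BatchCounter(long[] rowGroupSizes, int capacity) {
        this.rowGroupSizes = rowGroupSizes;
        this.capacity = capacity;
        long total = 0;
        for (long n : rowGroupSizes) {
            total += n;
        }
        this.totalRowCount = total;
    }

    // Size of the next batch, or -1 once every row has been returned.
    int nextBatch() {
        if (rowsReturned >= totalRowCount) {
            return -1;
        }
        checkEndOfRowGroup();
        int num = (int) Math.min(capacity, totalCountLoadedSoFar - rowsReturned);
        rowsReturned += num;
        return num;
    }

    // Open the next row group only once the current one is exhausted,
    // so a batch never spans two row groups.
    private void checkEndOfRowGroup() {
        if (rowsReturned != totalCountLoadedSoFar) {
            return;
        }
        totalCountLoadedSoFar += rowGroupSizes[nextRowGroup++];
    }

    static List<Integer> allBatchSizes(long[] rowGroupSizes, int capacity) {
        BatchCounter reader = new BatchCounter(rowGroupSizes, capacity);
        List<Integer> sizes = new ArrayList<>();
        for (int n = reader.nextBatch(); n >= 0; n = reader.nextBatch()) {
            sizes.add(n);
        }
        return sizes;
    }
}
```

With row groups of 5 and 3 rows and a batch capacity of 4, the batches come out as 4, 1, 3: the short middle batch drains the first row group before the second is opened.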
Physical Query Operator
BinaryExecNode: binary physical operator with two children, the left and right physical operators. LeafExecNode: leaf physical operator with no children. By default, the set of all attributes that are produce... · Original · 2019-01-12 15:31:56 · 240 views · 0 comments
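The leaf/binary distinction above is just a constraint on how many children a node has. Here is a hypothetical Java sketch of those node shapes. Spark's real `LeafExecNode` and `BinaryExecNode` are Scala traits over `SparkPlan` and carry much more. `ScanExec` and `JoinExec` here are made-up operators for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of physical-operator tree shapes, not Spark's classes.
abstract class ExecNode {
    abstract List<ExecNode> children();
    abstract String name();

    // Pre-order rendering, one indented line per node.
    String treeString() {
        StringBuilder sb = new StringBuilder();
        appendTo(sb, 0);
        return sb.toString();
    }

    private void appendTo(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth)).append(name()).append('\n');
        for (ExecNode child : children()) {
            child.appendTo(sb, depth + 1);
        }
    }
}

// Leaf operator: no children (e.g. a scan).
abstract class LeafExecNode extends ExecNode {
    @Override
    final List<ExecNode> children() { return Collections.emptyList(); }
}

// Binary operator: exactly two children, left and right (e.g. a join).
abstract class BinaryExecNode extends ExecNode {
    final ExecNode left;
    final ExecNode right;

    BinaryExecNode(ExecNode left, ExecNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    final List<ExecNode> children() { return Arrays.asList(left, right); }
}

// Hypothetical concrete operators.
final class ScanExec extends LeafExecNode {
    private final String table;
    ScanExec(String table) { this.table = table; }
    @Override String name() { return "Scan " + table; }
}

final class JoinExec extends BinaryExecNode {
    JoinExec(ExecNode left, ExecNode right) { super(left, right); }
    @Override String name() { return "Join"; }
}
```

Making `children()` final in the shape classes means concrete operators only declare what they are, not how many inputs they have.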
Spark: Strategy
package object sql { /** * Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting * with the query planner and is not designed to be stable across spark ... · Original · 2019-01-05 21:21:08 · 461 views · 0 comments
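The contract quoted above is "logical plan in, zero or more physical plans out". Here is a hypothetical Java sketch of how a planner can use strategies with that contract: ask each strategy in order, concatenate the candidates, and take the first. The plan types are reduced to labeled strings. Spark's real pieces (`SparkStrategy`, `QueryPlanner`) are Scala and work on plan trees. The strategy labels in the usage check are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of strategy-based planning; plans are reduced to strings.
final class MiniPlanner {
    // A strategy returns an empty list when it does not apply to the given logical plan.
    interface Strategy {
        List<String> apply(String logicalPlan);
    }

    private final List<Strategy> strategies;

    MiniPlanner(List<Strategy> strategies) {
        this.strategies = strategies;
    }

    // All candidates, in strategy order.
    List<String> candidates(String logicalPlan) {
        List<String> out = new ArrayList<>();
        for (Strategy s : strategies) {
            out.addAll(s.apply(logicalPlan));
        }
        return out;
    }

    // Take the first candidate; fail if no strategy produced one.
    String plan(String logicalPlan) {
        List<String> c = candidates(logicalPlan);
        if (c.isEmpty()) {
            throw new IllegalStateException("No plan for " + logicalPlan);
        }
        return c.get(0);
    }
}
```

Because a strategy may return nothing, strategy order doubles as priority: earlier strategies get the first chance to claim a plan.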