Hive Study Notes (2): Reading the Window Function Source Code, Part 2

假如我有一口缸

Article link: https://blog.csdn.net/m0_46180014/article/details/128891483

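Preface

The previous post covered the first part of how Hive executes a window function:
Hive Study Notes (1): Reading the Window Function Source Code
1. In Hive, a window function is a special kind of function (a PTF: table in, table out).
2. PTFOperator accumulates the rows of the current partition and decides when to start and when to close that accumulation.
3. A PTFPartition object holds all rows of the current partition.
4. TableFunctionEvaluator is responsible for starting the execution of the window function.

This post picks up where that one left off and follows the rest of the execution flow.

Source Code Walkthrough

WindowingTableFunction overrides the execute method of TableFunctionEvaluator to start the conversion from the input partition to the output partition.
The method contains the line, mentioned in the previous post, that checks the boundaries of the window frame: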

boolean processWindow = processWindow(wFn.getWindowFrame());

private static boolean processWindow(WindowFrameDef frame) {
    if ( frame == null ) {
      return false;
    }
    if ( frame.getStart().getAmt() == BoundarySpec.UNBOUNDED_AMOUNT &&
        frame.getEnd().getAmt() == BoundarySpec.UNBOUNDED_AMOUNT ) {
      return false;
    }
    return true;
  }

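processWindow inspects the frame definition of the window function that is about to run. It returns false when no frame is defined or when the start and the end of the frame are both unbounded (UNBOUNDED_AMOUNT), and true otherwise.
The frame information comes from a WindowFrameDef object, which in turn is obtained from the WindowFunctionDef object.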
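To make the check concrete, here is a minimal, self-contained sketch. The `Frame` class and the `UNBOUNDED` constant are simplified stand-ins made up for illustration, and modelling the "unbounded" sentinel as `Integer.MAX_VALUE` is an assumption. Only the decision logic mirrors the method quoted above.

```java
// Simplified stand-in for the frame check; names here are hypothetical, not Hive's.
final class ProcessWindowSketch {
    // Assumption: the "unbounded" sentinel is modelled as Integer.MAX_VALUE.
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Minimal frame: only the start and end amounts matter for this check.
    static final class Frame {
        final int startAmt;
        final int endAmt;

        Frame(int startAmt, int endAmt) {
            this.startAmt = startAmt;
            this.endAmt = endAmt;
        }
    }

    // Same decision as the quoted processWindow: a missing frame, or a frame
    // unbounded on BOTH sides, does not need per-row window processing.
    static boolean processWindow(Frame frame) {
        if (frame == null) {
            return false;
        }
        return !(frame.startAmt == UNBOUNDED && frame.endAmt == UNBOUNDED);
    }

    public static void main(String[] args) {
        System.out.println(processWindow(null));                            // false
        System.out.println(processWindow(new Frame(UNBOUNDED, UNBOUNDED))); // false
        System.out.println(processWindow(new Frame(UNBOUNDED, 0)));         // true
        System.out.println(processWindow(new Frame(2, 1)));                 // true
    }
}
```

A frame that is unbounded on only one side still returns true, which is why the choice of frame decides which branch of execute runs.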

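I. WindowFunctionDef: the top-level window function definition class

This class wraps the WindowFrameDef (the window frame definition), the GenericUDAFEvaluator (the window function itself) and more. It can be seen as the top-level definition of a window function, covering everything the user writes in the window function part of the SQL.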

public class WindowFunctionDef extends WindowExpressionDef {
    String name;
    boolean isStar;
    boolean isDistinct;
    List<PTFExpressionDef> args;
    WindowFrameDef windowFrame;
    GenericUDAFEvaluator wFnEval;
    boolean pivotResult;
    boolean respectNulls = true;
}

Let's look at WindowFrameDef, the window frame definition, first.

WindowFrameDef: the window frame definition

The member variables are what matter most:

public class WindowFrameDef {
    private WindowType windowType; // ROWS, RANGE
    private BoundaryDef start;     // start boundary of the frame
    private BoundaryDef end;       // end boundary of the frame
    private final int windowSize;  // number of rows in the window
    private OrderDef orderDef;     // Order expressions which will only get set and used for RANGE windowing type

    public WindowFrameDef(WindowType windowType, BoundaryDef start, BoundaryDef end) {
        this.windowType = windowType;
        this.start = start;
        this.end = end;

        // Calculate window size
        if (start.getDirection() == end.getDirection()) {
            windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1;
        } else {
            windowSize = end.getAmt() + start.getAmt() + 1;
        }
    }

    public BoundaryDef getStart() {
        return start;
    }

    public BoundaryDef getEnd() {
        return end;
    }

    public WindowType getWindowType() {
        return windowType;
    }

    public void setOrderDef(OrderDef orderDef) {
        this.orderDef = orderDef;
    }

    public OrderDef getOrderDef() throws HiveException {
        if (this.windowType != WindowType.RANGE) {
            throw new HiveException("Order expressions should only be used for RANGE windowing type");
        }
        return orderDef;
    }

    public boolean isStartUnbounded() {
        return start.isUnbounded();
    }

    public boolean isEndUnbounded() {
        return end.isUnbounded();
    }

    public int getWindowSize() {
        return windowSize;
    }
}

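Now let's go through the member variables of WindowFrameDef one by one.

WindowType windowType

An enum that states whether the window is defined by ROWS (a range of rows) or by RANGE (a range of values):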

// The types for ROWS BETWEEN or RANGE BETWEEN windowing spec
    public static enum WindowType {
        ROWS, RANGE
    }

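BoundaryDef start and BoundaryDef end

BoundaryDef defines the direction in which the window extends and how far it extends:

public class BoundaryDef {
    Direction direction; // direction in which the window extends
    private int amt;     // how far it extends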
    private final int relativeOffset;

    public BoundaryDef(Direction direction, int amt) {
        this.direction = direction;
        this.amt = amt;

        // Calculate relative offset
        switch (this.direction) {
            case PRECEDING:
                relativeOffset = -amt;
                break;
            case FOLLOWING:
                relativeOffset = amt;
                break;
            default:
                relativeOffset = 0;
        }
    }
}

Direction

An enum with the three directions in which a window can extend, matching the three SQL keywords:
PRECEDING, CURRENT, FOLLOWING

public static enum Direction {
    PRECEDING, CURRENT, FOLLOWING
}

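amt (Amount)

The distance by which the window extends in one direction.

windowSize

The length of the window. The calculation is straightforward:

// Calculate window size
if (start.getDirection() == end.getDirection()) { // start and end extend in the same direction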
    windowSize = Math.abs(end.getAmt() - start.getAmt()) + 1;
} else {
    windowSize = end.getAmt() + start.getAmt() + 1;
}
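A small worked example may help here. The sketch below repeats the same arithmetic in a standalone class; `WindowSizeSketch` and its `windowSize` helper are names made up for illustration, and treating CURRENT ROW as a boundary with an amount of 0 is an assumption. It only covers bounded frames.

```java
// Standalone re-statement of the window size arithmetic (hypothetical helper class).
final class WindowSizeSketch {
    enum Direction { PRECEDING, CURRENT, FOLLOWING }

    // Mirrors the calculation in the WindowFrameDef constructor quoted above.
    static int windowSize(Direction startDir, int startAmt, Direction endDir, int endAmt) {
        if (startDir == endDir) {
            // Both boundaries lie on the same side of the current row.
            return Math.abs(endAmt - startAmt) + 1;
        }
        // Boundaries lie on different sides: rows before + rows after + current row.
        return endAmt + startAmt + 1;
    }

    public static void main(String[] args) {
        // rows between 2 preceding and 1 following: 1 + 2 + 1 = 4
        System.out.println(windowSize(Direction.PRECEDING, 2, Direction.FOLLOWING, 1)); // 4
        // rows between 3 preceding and 1 preceding: |1 - 3| + 1 = 3
        System.out.println(windowSize(Direction.PRECEDING, 3, Direction.PRECEDING, 1)); // 3
        // rows between current row and 2 following (assuming CURRENT has amt 0): 2 + 0 + 1 = 3
        System.out.println(windowSize(Direction.CURRENT, 0, Direction.FOLLOWING, 2));   // 3
    }
}
```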

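OrderDef orderDef

Note: this field is not where the query's ORDER BY sorting is carried out in general. According to the comment in the source, these order expressions are only set and used for the RANGE window type, so I won't discuss it further here.

GenericUDAFEvaluator: the window function part

This is the evaluator for Hive's generic user-defined aggregate functions (GenericUDAF, the so-called advanced UDAF). The class defines the methods a UDAF is expected to provide: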

public abstract class GenericUDAFEvaluator implements Closeable {
	// Get a new aggregation object.
	public abstract AggregationBuffer getNewAggregationBuffer() throws HiveException;
	// Reset the aggregation. This is useful if we want to reuse the same aggregation.
	public abstract void reset(AggregationBuffer agg) throws HiveException;
	// Iterate through original data.
	public abstract void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException;
	// Get partial aggregation result.
	public abstract Object terminatePartial(AggregationBuffer agg) throws HiveException;
	// Merge with partial aggregation result. NOTE: null might be passed in case there is no input data.
	public abstract void merge(AggregationBuffer agg, Object partial) throws HiveException;
	// Get final aggregation result.
	public abstract Object terminate(AggregationBuffer agg) throws HiveException;
}

That is enough about GenericUDAFEvaluator for now. Later we will read the implementation that backs the row_number() function: GenericUDAFAbstractRowNumberEvaluator.
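The call order of these methods is easier to see with a toy example. The sketch below does not use Hive's classes: `Evaluator`, `SumBuffer` and `SumEvaluator` are hypothetical, heavily simplified stand-ins that keep only the buffer, iterate and terminate steps, and a sum is used purely as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the aggregation lifecycle; all names are made up for illustration.
final class UdafLifecycleSketch {
    // Simplified counterpart of an aggregate evaluator: buffer -> iterate -> terminate.
    interface Evaluator<B, R> {
        B newBuffer();
        void iterate(B buffer, Object[] parameters);
        R terminate(B buffer);
    }

    // The buffer holds the intermediate state of one aggregation.
    static final class SumBuffer {
        long sum;
    }

    static final class SumEvaluator implements Evaluator<SumBuffer, Long> {
        @Override
        public SumBuffer newBuffer() {
            return new SumBuffer();
        }

        @Override
        public void iterate(SumBuffer buffer, Object[] parameters) {
            // parameters[0] is the value of the single argument for this row.
            buffer.sum += ((Number) parameters[0]).longValue();
        }

        @Override
        public Long terminate(SumBuffer buffer) {
            return buffer.sum;
        }
    }

    // Drives one aggregation: create a buffer, feed every row, read the result.
    static <B, R> R run(Evaluator<B, R> evaluator, List<Object[]> rows) {
        B buffer = evaluator.newBuffer();
        for (Object[] row : rows) {
            evaluator.iterate(buffer, row);
        }
        return evaluator.terminate(buffer);
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1});
        rows.add(new Object[] {2});
        rows.add(new Object[] {3});
        System.out.println(run(new SumEvaluator(), rows)); // 6
    }
}
```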

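II. Building the BasePartitionEvaluator that manages the computation (preparation)

Let's follow the execution logic using the SQL below as an example:

row_number() over(partition by userid order by time)

For this walkthrough I treat the call above as if it had been written with an explicit frame:

row_number() over(partition by userid order by time rows between current row and unbounded following)

That is a ROWS frame that starts at the current row and is unbounded at the end. Note that this equivalence is my working assumption and is not verified in the code quoted here.
Since only one side of that frame is unbounded, processWindow returns true and the branch below runs. If Hive instead assigned a frame that is unbounded on both sides, processWindow would return false and the other branch would run.

// The element type of oColumns is List; each list is the result returned by one window function,
// so oColumns keeps the results of all window functions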
ArrayList<List<?>> oColumns = new ArrayList<List<?>>();
PTFPartition iPart = pItr.getPartition();

WindowTableFunctionDef wTFnDef = (WindowTableFunctionDef) getTableDef();
for (WindowFunctionDef wFn : wTFnDef.getWindowFunctions()) {
    boolean processWindow = processWindow(wFn.getWindowFrame());
    pItr.reset();
    if (!processWindow) {
        ...
    } else {
        // this branch is taken
        oColumns.add(executeFnwithWindow(wFn, iPart));
    }
}

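The element type of oColumns is List: each list is the result returned by one window function, so oColumns keeps the results of all window functions.
It follows that executeFnwithWindow returns the result of a single window function.

The executeFnwithWindow method

This method computes the function result for every row in the partition, so each row gets its own result value.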

// Evaluate the function result for each row in the partition
ArrayList<Object> executeFnwithWindow(
        WindowFunctionDef wFnDef,
        PTFPartition iPart)
        throws HiveException {
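    // vals holds the result value for every row in the partition
    ArrayList<Object> vals = new ArrayList<Object>();
    for (int i = 0; i < iPart.size(); i++) { // loop over every row in the partition
        // arguments: 1. the function definition, 2. the index of the row to compute, 3. the whole partition (PTFPartition)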
        Object out = evaluateWindowFunction(wFnDef, i, iPart);
        vals.add(out);
    }
    return vals;
}

The vals object holds the result value for every row in the partition.
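The shape of this per-row loop can be illustrated with a toy example. The sketch below is not Hive code: `evaluateAt` is a hypothetical stand-in for evaluateWindowFunction that sums a frame of "1 preceding to 1 following", chosen only to make the per-row results visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy per-row evaluation: one result value is produced for every row of the partition.
final class PerRowEvalSketch {
    // Hypothetical stand-in for evaluateWindowFunction: sums the values in the
    // window [row - 1, row + 1], clipped to the bounds of the partition.
    static long evaluateAt(List<Integer> partition, int row) {
        int start = Math.max(0, row - 1);
        int end = Math.min(partition.size(), row + 2); // exclusive upper bound
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += partition.get(i);
        }
        return sum;
    }

    // Same loop shape as executeFnwithWindow: evaluate each row and collect the results.
    static List<Object> executeWithWindow(List<Integer> partition) {
        List<Object> vals = new ArrayList<>();
        for (int i = 0; i < partition.size(); i++) {
            vals.add(evaluateAt(partition, i));
        }
        return vals;
    }

    public static void main(String[] args) {
        System.out.println(executeWithWindow(Arrays.asList(10, 20, 30, 40))); // [30, 60, 90, 70]
    }
}
```

Each row triggers a fresh evaluation over its own window, so the cost grows with both the partition size and the window size.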

The evaluateWindowFunction method

Given the partition and the index of the row to process, it builds a BasePartitionEvaluator and computes the result value:

// Evaluate the result given a partition and the row number to process
private Object evaluateWindowFunction(WindowFunctionDef wFn, int rowToProcess, PTFPartition partition)
        throws HiveException {
    GenericUDAFEvaluator wFnEval = wFn.getWFnEval();
    BasePartitionEvaluator partitionEval = wFnEval
            .getPartitionWindowingEvaluator(wFn.getWindowFrame(), partition, wFn.getArgs(), wFn.getOI(), nullsLast);
    return partitionEval.iterate(rowToProcess, ptfDesc.getLlInfo());
}

The method takes the GenericUDAFEvaluator out of the WindowFunctionDef and calls getPartitionWindowingEvaluator on it to obtain a BasePartitionEvaluator. Let's look at getPartitionWindowingEvaluator.

The getPartitionWindowingEvaluator method

public final BasePartitionEvaluator getPartitionWindowingEvaluator(
        WindowFrameDef winFrame,
        PTFPartition partition,
        List<PTFExpressionDef> parameters,
        ObjectInspector outputOI, boolean nullsLast) {
    if (partitionEvaluator == null) {
        partitionEvaluator = createPartitionEvaluator(winFrame, partition, parameters, outputOI,
                nullsLast);
    }
    return partitionEvaluator;
}

protected BasePartitionEvaluator createPartitionEvaluator(
        WindowFrameDef winFrame,
        PTFPartition partition,
        List<PTFExpressionDef> parameters,
        ObjectInspector outputOI,
        boolean nullsLast) {
    return new BasePartitionEvaluator(this, winFrame, partition, parameters, outputOI, nullsLast);
}

The call returns a BasePartitionEvaluator object. Here is its constructor:

public BasePartitionEvaluator(
        GenericUDAFEvaluator wrappedEvaluator,  // the window function implementation
        WindowFrameDef winFrame,                // the window frame definition
        PTFPartition partition,                 // all rows of the partition
        List<PTFExpressionDef> parameters,      // argument expressions of the window function
        ObjectInspector outputOI,
        boolean nullsLast) {
    this.wrappedEvaluator = wrappedEvaluator;
    this.winFrame = winFrame;
    this.partition = partition;
    this.parameters = parameters;
    this.outputOI = outputOI;
    this.nullsLast = nullsLast;
    this.isCountEvaluator = wrappedEvaluator instanceof GenericUDAFCount.GenericUDAFCountEvaluator;
}

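So this step hands the complete window definition and all the rows of the partition to the BasePartitionEvaluator object, which then coordinates the computation that follows.
Finally, BasePartitionEvaluator's iterate(int currentRow, LeadLagInfo leadLagInfo) method is called to aggregate the window for the row at the current index and return the result value.

III. How the row at the current index is actually aggregated

BasePartitionEvaluator's iterate method builds a Range object (the row-index range of the window) and turns the matching part of the input PTFPartition into an iterator: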

public Object iterate(int currentRow, LeadLagInfo leadLagInfo) throws HiveException {
    Range range = PTFRangeUtil.getRange(winFrame, currentRow, partition, nullsLast);
    PTFPartitionIterator<Object> pItr = range.iterator();
    return calcFunctionValue(pItr, leadLagInfo);
}

Building the Range object

To locate the first and last rows of the current row's window within the partition, a Range object is built up front:

public static Range getRange(WindowFrameDef winFrame, int currRow, PTFPartition p,
                             boolean nullsLast) throws HiveException {
    BoundaryDef startB = winFrame.getStart();
    BoundaryDef endB = winFrame.getEnd();

    int start, end;
    if (winFrame.getWindowType() == WindowType.ROWS) {
        start = getRowBoundaryStart(startB, currRow);
        end = getRowBoundaryEnd(endB, currRow, p);
    } else {
        ValueBoundaryScanner vbs = ValueBoundaryScanner.getScanner(winFrame, nullsLast);
        vbs.handleCache(currRow, p);
        start = vbs.computeStart(currRow, p);
        end = vbs.computeEnd(currRow, p);
    }
    start = start < 0 ? 0 : start;
    end = end > p.size() ? p.size() : end;
    return new Range(start, end, p);
}

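The Range object stores the start and end row indexes of the window, together with the PTFPartition that holds the partition's rows:

public class Range {
    int start;       // row index where the window starts
    int end;         // row index where the window ends
    PTFPartition p;  // the partition data
}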
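For the ROWS case, the way a frame turns into a pair of row indexes can be sketched as follows. The bodies of getRowBoundaryStart and getRowBoundaryEnd are not quoted in this post, so the `start` and `end` helpers below are my own guess at the arithmetic, not Hive's code. The sketch assumes a half-open range [start, end), which is consistent with the clamp against p.size() above, and it models "unbounded" as `Integer.MAX_VALUE`.

```java
// Hypothetical sketch of how a ROWS frame maps to a half-open row range [start, end).
final class RowRangeSketch {
    enum Direction { PRECEDING, CURRENT, FOLLOWING }

    // Assumption: the "unbounded" sentinel is modelled as Integer.MAX_VALUE.
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Guessed start index (inclusive) for the given start boundary.
    static int start(Direction d, int amt, int currRow) {
        switch (d) {
            case PRECEDING:
                return amt == UNBOUNDED ? 0 : currRow - amt;
            case FOLLOWING:
                return currRow + amt;
            default:
                return currRow;
        }
    }

    // Guessed end index (exclusive) for the given end boundary.
    static int end(Direction d, int amt, int currRow, int size) {
        switch (d) {
            case PRECEDING:
                return currRow - amt + 1;
            case FOLLOWING:
                return amt == UNBOUNDED ? size : currRow + amt + 1;
            default:
                return currRow + 1;
        }
    }

    // Computes [start, end) and clamps it to the partition, as the quoted getRange does.
    static int[] range(Direction sd, int sAmt, Direction ed, int eAmt, int currRow, int size) {
        int s = start(sd, sAmt, currRow);
        int e = end(ed, eAmt, currRow, size);
        s = s < 0 ? 0 : s;
        e = e > size ? size : e;
        return new int[] {s, e};
    }

    public static void main(String[] args) {
        // Partition of 5 rows, frame "rows between 1 preceding and 1 following".
        for (int row = 0; row < 5; row++) {
            int[] r = range(Direction.PRECEDING, 1, Direction.FOLLOWING, 1, row, 5);
            System.out.println("row " + row + " -> [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

With these assumptions, row 0 gets [0, 2), row 2 gets [1, 4) and row 4 gets [3, 5): the window shrinks at the edges of the partition because of the clamping.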

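Building the PTFPartitionIterator

Next, in order to walk over the rows of the window, the PTFPartition is turned into an iterator.

PTFPartitionIterator<Object> pItr = range.iterator();

The conversion relies on the range method defined in PTFPartition (presumably what range.iterator() calls). It turns only the rows between the given start and end into an iterator, so the whole partition does not have to be converted: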

public class PTFPartition {
    ...
    public PTFPartitionIterator<Object> range(int start, int end) {
        assert (start >= 0);
        assert (end <= size());
        assert (start <= end);
        return new PItr(start, end); // PItr is an inner class of PTFPartition, returned by a PTFPartition member method
    }

    public PTFPartitionIterator<Object> range(int start, int end, boolean optimisedIteration) {
        return (optimisedIteration) ? new OptimisedPItr(start, end) : range(start, end);
    }
    ...
    class PItr implements PTFPartitionIterator<Object> {
        int idx;         // cursor position during iteration
        final int start; // start of the window
        final int end;   // end of the window

        PItr(int start, int end) {
            this.idx = start;
            this.start = start;
            this.end = end;
            createTimeSz = PTFPartition.this.size();
        }

        @Override
        public boolean hasNext() {
            checkForComodification();
            return idx < end;
        }

        @Override
        public Object next() {
            checkForComodification();
            try {
                return PTFPartition.this.getAt(idx++);
            } catch (HiveException e) {
                throw new RuntimeException(e);
            }
        }
        ...
    }
}

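As shown, PItr is a non-static inner class of PTFPartition that implements PTFPartitionIterator, so its instances are built and handed out by a PTFPartition object. A PItr records the start and end positions of the current window, plus the cursor position during iteration.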
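The same pattern, a container that hands out a range iterator implemented as a non-static inner class, can be sketched without Hive. `PartitionSketch` and `RangeItr` below are hypothetical names, and the comodification check from the original is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy container that exposes only a sub-range of its rows through an iterator.
final class PartitionSketch {
    private final List<Object> rows = new ArrayList<>();

    void append(Object row) {
        rows.add(row);
    }

    int size() {
        return rows.size();
    }

    // Returns an iterator over rows [start, end) without copying the partition.
    Iterator<Object> range(int start, int end) {
        if (start < 0 || end > size() || start > end) {
            throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
        }
        return new RangeItr(start, end);
    }

    // Non-static inner class: every instance is bound to the enclosing
    // PartitionSketch and reads its rows directly.
    private final class RangeItr implements Iterator<Object> {
        private int idx;       // cursor position during iteration
        private final int end; // exclusive end of the range

        RangeItr(int start, int end) {
            this.idx = start;
            this.end = end;
        }

        @Override
        public boolean hasNext() {
            return idx < end;
        }

        @Override
        public Object next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return rows.get(idx++);
        }
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch();
        for (String s : new String[] {"a", "b", "c", "d"}) {
            p.append(s);
        }
        Iterator<Object> it = p.range(1, 3);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints b, then c
        }
    }
}
```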

Iterating over the PTFPartitionIterator

Next we enter the calcFunctionValue method:

protected Object calcFunctionValue(PTFPartitionIterator<Object> pItr, LeadLagInfo leadLagInfo)
        throws HiveException {
    // To handle the case like SUM(LAG(f)) over(), aggregation function includes
    // LAG/LEAD call
    PTFOperator.connectLeadLagFunctionsToPartition(leadLagInfo, pItr);
    // Create a RowNumberBuffer to hold the result of this computation
    AggregationBuffer aggBuffer = wrappedEvaluator.getNewAggregationBuffer();
    if (isCountEvaluator && parameters == null) {
        // count(*) specific optimisation, where record count would be equal to itr count
        // No need to iterate through entire iterator and read rowContainer again
        return ObjectInspectorUtils.copyToStandardObject(new LongWritable(pItr.count()), outputOI);
    }
    // Build an array whose length is the number of arguments passed to the window function
    Object[] argValues = new Object[parameters == null ? 0 : parameters.size()];
    while (pItr.hasNext()) { // start iterating
        Object row = pItr.next(); // fetch one row inside the window
        int i = 0;
        if (parameters != null) {
            for (PTFExpressionDef param : parameters) { // loop over every argument expression
                argValues[i++] = param.getExprEvaluator().evaluate(row); // evaluate the expression against this row
            }
        }
        wrappedEvaluator.aggregate(aggBuffer, argValues);
    }

    // The object is reused during evaluating, make a copy here
    return ObjectInspectorUtils.copyToStandardObject(wrappedEvaluator.evaluate(aggBuffer), outputOI);
}

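AggregationBuffer -> RowNumberBuffer

AggregationBuffer is an interface used to store the aggregation result while aggregating. In this case the implementation is RowNumberBuffer, which looks like this: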

static class RowNumberBuffer implements AggregationBuffer {
    ArrayList<IntWritable> rowNums;
    int nextRow;
    boolean supportsStreaming;

    void init() {
        rowNums = new ArrayList<IntWritable>();
        nextRow = 1;
        if (supportsStreaming) {
            rowNums.add(null);
        }
    }

    RowNumberBuffer(boolean supportsStreaming) {
        this.supportsStreaming = supportsStreaming;
        init();
    }

    void incr() {
        if (supportsStreaming) {
            rowNums.set(0, new IntWritable(nextRow++));
        } else {
            rowNums.add(new IntWritable(nextRow++));
        }
    }
}

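As shown, the data is stored in an ArrayList whose element type is Hadoop's serializable IntWritable, so RowNumberBuffer can be thought of as an ArrayList of ints.
Then iteration starts. For every row visited, the values of the window function's arguments are extracted and passed, together with the RowNumberBuffer, to GenericUDAFEvaluator's aggregate method: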

Object[] argValues = new Object[parameters == null ? 0 : parameters.size()];
while (pItr.hasNext()) {
    Object row = pItr.next();
    int i = 0;
    if (parameters != null) {
        for (PTFExpressionDef param : parameters) {
            argValues[i++] = param.getExprEvaluator().evaluate(row);
        }
    }
    wrappedEvaluator.aggregate(aggBuffer, argValues);
}

GenericUDAFRowNumber

aggregate

Inside GenericUDAFEvaluator's aggregate method, the following branch is taken:

public void aggregate(AggregationBuffer agg, Object[] parameters) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
        iterate(agg, parameters);
    } else {
    	...
    }
}

public abstract void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException;

iterate

iterate is an abstract method. Its implementation is in GenericUDAFRowNumber:

	@Override
	public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
	    ((RowNumberBuffer) agg).incr(); // calls the incr() method of the RowNumberBuffer shown above
	}
...
static class RowNumberBuffer implements AggregationBuffer {
    ArrayList<IntWritable> rowNums;
    int nextRow = 1;
	...
    void incr() {
        if (supportsStreaming) {
            rowNums.set(0, new IntWritable(nextRow++));
        } else {
            // the core logic: append a monotonically increasing int to the buffer
            rowNums.add(new IntWritable(nextRow++));
        }
    }
}

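So the logic simply appends monotonically increasing ints to the buffer, and the buffer's ArrayList ends up holding a run of increasing IntWritable objects. Note that at this point the RowNumberBuffer belongs to the current row (currentRow) only. It is the finest granularity, so don't confuse it with a partition-level result.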
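The difference between the two branches of incr() is easy to miss, so here is a toy version that can be run on its own. `RowNumberBufferSketch` is a made-up class that uses plain Integer values instead of IntWritable; apart from that it follows the init() and incr() logic quoted earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the row number buffer, with Integer in place of IntWritable.
final class RowNumberBufferSketch {
    final List<Integer> rowNums = new ArrayList<>();
    private final boolean supportsStreaming;
    private int nextRow = 1;

    RowNumberBufferSketch(boolean supportsStreaming) {
        this.supportsStreaming = supportsStreaming;
        if (supportsStreaming) {
            // Streaming mode keeps a single slot that is overwritten on every call.
            rowNums.add(null);
        }
    }

    void incr() {
        if (supportsStreaming) {
            rowNums.set(0, nextRow++); // only the latest row number is kept
        } else {
            rowNums.add(nextRow++);    // one entry per processed row
        }
    }

    public static void main(String[] args) {
        RowNumberBufferSketch batch = new RowNumberBufferSketch(false);
        RowNumberBufferSketch streaming = new RowNumberBufferSketch(true);
        for (int i = 0; i < 3; i++) {
            batch.incr();
            streaming.incr();
        }
        System.out.println(batch.rowNums);     // [1, 2, 3]
        System.out.println(streaming.rowNums); // [3]
    }
}
```

In the non-streaming mode the list grows by one entry per call, which matches the observation that the buffer ends up holding a whole run of increasing numbers.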

evaluate

Next, GenericUDAFEvaluator's evaluate method runs and takes the following branch to produce the final result value for the current row:

public Object evaluate(AggregationBuffer agg) throws HiveException {
    if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
        ...
    } else {
        return terminate(agg);
    }
}

terminate is an abstract method that returns the final aggregation result. Its implementation is in GenericUDAFRowNumber:

@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
    ArrayList<IntWritable> rowNums = ((RowNumberBuffer) agg).rowNums;
    return rowNums;
}

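As shown, the return value is an ArrayList. Finally this ArrayList is deep-copied and returned:

return ObjectInspectorUtils.copyToStandardObject(wrappedEvaluator.evaluate(aggBuffer), outputOI);

In the end this ArrayList is put into the vals list mentioned above.

IV. Returning the final result

executeFnwithWindow keeps looping over every row in the partition: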

// Evaluate the function result for each row in the partition
ArrayList<Object> executeFnwithWindow(
        WindowFunctionDef wFnDef,
        PTFPartition iPart)
        throws HiveException {
    ArrayList<Object> vals = new ArrayList<Object>();
    for (int i = 0; i < iPart.size(); i++) {
        Object out = evaluateWindowFunction(wFnDef, i, iPart);
        vals.add(out);
    }
    return vals;
}

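When the loop finishes, vals holds the window function result for every row of the current partition, which is the final result of this window function.
Once the vals of every window function have been added to oColumns, the output PTFPartition can be built.

Building the output PTFPartition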

/*
 * Output Columns in the following order
 * - the columns representing the output from Window Fns
 * - the input Rows columns
 */
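for (int i = 0; i < iPart.size(); i++) { // loop over the input PTFPartition
    ArrayList oRow = new ArrayList();
    Object iRow = iPart.getAt(i); // take one input row

    for (int j = 0; j < oColumns.size(); j++) {
        // oColumns.size() is the number of window functions.
        // Take the full result of the j-th window function, pick the value that belongs to row i, and append it to oRow.
        // Inside this loop the row index stays fixed; what changes is the window function.
        oRow.add(oColumns.get(j).get(i));
    }

    for (StructField f : inputOI.getAllStructFieldRefs()) {
        oRow.add(inputOI.getStructFieldData(iRow, f));
    }

    outP.append(oRow);
}

The loop walks the input PTFPartition and, for each row, picks the matching values out of the window function results, as described in the code comments. The window function values come first, followed by the columns of the input row, and the complete output row is then appended to the output PTFPartition.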
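The column-to-row pivot in this loop can be tried out in isolation. In the sketch below, `OutputRowsSketch.assemble` is a hypothetical helper, input rows are plain lists instead of Hive structs, and the sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy version of the output assembly: per-function result columns are
// pivoted into rows and the input columns are appended after them.
final class OutputRowsSketch {
    static List<List<Object>> assemble(List<List<?>> oColumns, List<List<Object>> inputRows) {
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < inputRows.size(); i++) {
            List<Object> oRow = new ArrayList<>();
            // First the value of every window function for row i ...
            for (List<?> column : oColumns) {
                oRow.add(column.get(i));
            }
            // ... then the original columns of the input row.
            oRow.addAll(inputRows.get(i));
            out.add(oRow);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<?>> oColumns = new ArrayList<>();
        oColumns.add(Arrays.asList(1, 2));     // results of a first window function
        oColumns.add(Arrays.asList(10L, 30L)); // results of a second window function

        List<List<Object>> inputRows = new ArrayList<>();
        inputRows.add(Arrays.<Object>asList("u1", "09:00"));
        inputRows.add(Arrays.<Object>asList("u1", "09:05"));

        System.out.println(assemble(oColumns, inputRows));
        // [[1, 10, u1, 09:00], [2, 30, u1, 09:05]]
    }
}
```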

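Passing the result on to the next function

Finally, control returns to the place where execute was called, inside finishPartition, the method in PTFOperator that is triggered when a partition ends: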

void finishPartition() throws HiveException {
    if (isStreaming()) {
        handleOutputRows(tabFn.finishPartition());
    } else {
        if (tabFn.canIterateOutput()) {
            outputPartRowsItr = inputPart == null ? null :
                    tabFn.iterator(inputPart.iterator());
        } else {
            outputPart = inputPart == null ? null : tabFn.execute(inputPart); // execution returns here
            outputPartRowsItr = outputPart == null ? null : outputPart.iterator(); // turn the output PTFPartition into a PTFPartitionIterator<Object>
        }
        if (next != null) {
            if (!next.isStreaming() && !isOutputIterator()) {
                next.inputPart = outputPart;
            } else {
                if (outputPartRowsItr != null) {
                    while (outputPartRowsItr.hasNext()) {
                        next.processRow(outputPartRowsItr.next());
                    }
                }
            }
        }
    }

    if (next != null) {
        next.finishPartition();
    } else {
        if (!isStreaming()) {
            if (outputPartRowsItr != null) {
                while (outputPartRowsItr.hasNext()) {
                    // walk the output PTFPartition and forward the rows downstream one by one
                    forward(outputPartRowsItr.next(), outputObjInspector);
                }
            }
        }
    }
}

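In the end, the rows of the output PTFPartition are passed on one by one to the next function, and the execution flow of the window function is complete.

Conclusion

That is a rough pass through the execution flow of a window function, but several questions remain open, for example:
1. I expected GenericUDAFRowNumber's terminate() method to return a single LongWritable value, yet it returns an ArrayList.
Next I plan to go back and read Spark's window function source code and compare the two.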