[Software Engineering Practice] Pig Project 12: Source Analysis of the data Directory, Other Tuples (2)

Article link: https://blog.csdn.net/Aulic/article/details/121940233

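This post analyzes several tuple types in the Pig project: AppendableSchemaTuple, SchemaTuple, BinSedesTuple, NonWritableTuple, TargetedTuple, and TimestampedTuple. Each serves a specific purpose: AppendableSchemaTuple supports dynamic extension, SchemaTuple is a fast, type-aware tuple, BinSedesTuple optimizes serialization, NonWritableTuple is a tuple that is never written out, TargetedTuple stores operators, and TimestampedTuple carries timestamp information. These tuples play an important role in the implementation of Pig Latin.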


2021SC@SDUSC

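The previous post covered AmendableTuple; this post continues with the other tuples.

Other tuples

AppendableSchemaTuple

AppendableSchemaTuple is an abstract class. Its UML diagram is as follows:

Inheritance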

public abstract class AppendableSchemaTuple<T extends AppendableSchemaTuple<T>> extends SchemaTuple<T>

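It extends SchemaTuple, which we have not covered yet. The class has no comments and no tests, and it declares no explicit constructor, so all we can do is skim its public methods.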

@Override
public void append(Object val) {
    if (appendedFields == null) {
        appendedFields = mTupleFactory.newTuple();
    }

    appendedFields.append(val);
}

public SchemaTuple<T> set(List<Object> l) throws ExecException {
    int listSize = l.size();
    int schemaSize = schemaSize();

    if (listSize < schemaSize) {
        throw new ExecException("Given list of objects has too few fields ("+l.size()+" vs "+schemaSize()+")");
    }

    Iterator<Object> it = l.iterator();

    generatedCodeSetIterator(it);

    resetAppendedFields();

    while (it.hasNext()) {
        append(it.next());
    }

    return this;
}

public void set(int fieldNum, Object val) throws ExecException {
    int diff = fieldNum - schemaSize();
    if (diff >= 0 && diff < appendedFieldsSize()) {
        setAppendedField(diff, val);
    } else {
        super.set(fieldNum, val);
    }
}

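set(List<Object>) loads a list of objects into the tuple: it rejects lists shorter than schemaSize(), hands the iterator to generatedCodeSetIterator (which presumably fills the schema fields), resets the appended fields, and appends whatever values remain. set(int, Object) writes to an appended field when the index falls past the schema fields, and otherwise defers to the parent class.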
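The index routing in the two set methods can be illustrated with a minimal, self-contained sketch. It does not use Pig's classes: `AppendableSketch`, its fixed `schemaFields` array, and the `appended` list are hypothetical stand-ins for the generated schema fields and `appendedFields`, and the error type is simplified to `IllegalArgumentException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a schema tuple with appendable overflow fields.
class AppendableSketch {
    private final Object[] schemaFields;                      // fixed-size "schema" part
    private final List<Object> appended = new ArrayList<>();  // overflow part

    AppendableSketch(int schemaSize) {
        schemaFields = new Object[schemaSize];
    }

    int schemaSize() { return schemaFields.length; }

    void append(Object val) { appended.add(val); }

    // Mirrors set(List<Object>): fill the schema fields first, append the rest.
    AppendableSketch set(List<Object> values) {
        if (values.size() < schemaSize()) {
            throw new IllegalArgumentException("too few fields ("
                    + values.size() + " vs " + schemaSize() + ")");
        }
        Iterator<Object> it = values.iterator();
        for (int i = 0; i < schemaFields.length; i++) {
            schemaFields[i] = it.next();
        }
        appended.clear();
        while (it.hasNext()) {
            append(it.next());
        }
        return this;
    }

    // Mirrors set(int, Object): indexes past the schema go to the appended part.
    void set(int fieldNum, Object val) {
        int diff = fieldNum - schemaSize();
        if (diff >= 0 && diff < appended.size()) {
            appended.set(diff, val);
        } else {
            schemaFields[fieldNum] = val;   // throws if the index is out of range
        }
    }

    Object get(int fieldNum) {
        int diff = fieldNum - schemaSize();
        return diff >= 0 ? appended.get(diff) : schemaFields[fieldNum];
    }

    int size() { return schemaFields.length + appended.size(); }

    public static void main(String[] args) {
        AppendableSketch t = new AppendableSketch(2);
        t.set(Arrays.<Object>asList("a", 1, "extra"));
        t.set(2, "changed");                  // lands in the appended part
        System.out.println(t.size() + " " + t.get(2));   // 3 changed
    }
}
```

The point of the split is that the first schemaSize() positions can live in typed, generated fields, while anything beyond them falls back to an ordinary growable list.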

Next, let's look at its parent class.

SchemaTuple

Inheritance

public abstract class SchemaTuple<T extends SchemaTuple<T>> extends AbstractTuple implements TypeAwareTuple {

UML

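The class is large, so only part of the diagram can be shown.

Some comments

/**
 * A SchemaTuple is a type-aware tuple that is much faster and more memory efficient.
 * In our implementation, given a Schema, code generation is used to extend this class.
 * This class provides a broad range of functionality that minimizes the complexity of
 * the code that must be generated. The odd-looking generic signature allows for certain
 * optimizations, such as "setSpecific(T t)", which allows us to do much faster sets and
 * comparisons when types match (since the code is generated, there is no other way to know).
 */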

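This class is interesting. Judging from the comment, it is the base class that generated code extends, presumably as part of how Pig Latin is implemented. Many of its methods exist to support that code generation, which frankly goes beyond data-structure analysis and into system implementation. Some of them are abstract, such as the ones AppendableSchemaTuple implemented above.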
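The "odd-looking generic signature" is a self-bounded type parameter. Here is a minimal sketch, independent of Pig, of why that shape helps. `SelfBoundedSketch`, `TypedBase`, `IntPair`, `setGeneric`, and all method bodies are hypothetical; only the name `setSpecific` comes from the comment above.

```java
class SelfBoundedSketch {
    // Hypothetical base class with the same generic shape as SchemaTuple<T>.
    abstract static class TypedBase<T extends TypedBase<T>> {
        // Fast path: the argument is statically known to be the same concrete type.
        abstract void setSpecific(T other);

        // Slow path: generic, boxed field access.
        abstract void setGeneric(Object[] fields);
    }

    // Stand-in for a generated subclass with two int fields.
    static final class IntPair extends TypedBase<IntPair> {
        int a;
        int b;

        @Override
        void setSpecific(IntPair other) {
            // Direct primitive copies, with no casts or boxing.
            this.a = other.a;
            this.b = other.b;
        }

        @Override
        void setGeneric(Object[] fields) {
            // Each field must be cast and unboxed.
            this.a = (Integer) fields[0];
            this.b = (Integer) fields[1];
        }
    }

    public static void main(String[] args) {
        IntPair src = new IntPair();
        src.a = 1;
        src.b = 2;
        IntPair dst = new IntPair();
        dst.setSpecific(src);
        System.out.println(dst.a + "," + dst.b);   // 1,2
    }
}
```

Because `IntPair` extends `TypedBase<IntPair>`, the compiler knows that `setSpecific` receives another `IntPair`, so the subclass can copy its own fields directly.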

To understand it better, let's look directly at the interface it implements, TypeAwareTuple. The code is as follows:

public interface TypeAwareTuple extends Tuple {
    public void setInt(int idx, int val) throws ExecException;
    public void setFloat(int idx, float val) throws ExecException;
    public void setDouble(int idx, double val) throws ExecException;
    public void setLong(int idx, long val) throws ExecException;
    public void setString(int idx, String val) throws ExecException;
    public void setBoolean(int idx, boolean val) throws ExecException;
    public void setBigInteger(int idx, BigInteger val) throws ExecException;
    public void setBigDecimal(int idx, BigDecimal val) throws ExecException;
    public void setBytes(int idx, byte[] val) throws ExecException;
    public void setTuple(int idx, Tuple val) throws ExecException;
    public void setDataBag(int idx, DataBag val) throws ExecException;
    public void setMap(int idx, Map<String,Object> val) throws ExecException;
    public void setDateTime(int idx, DateTime val) throws ExecException;

    public int getInt(int idx) throws ExecException, FieldIsNullException;
    public float getFloat(int idx) throws ExecException, FieldIsNullException;
    public double getDouble(int idx) throws ExecException, FieldIsNullException;
    public long getLong(int idx) throws ExecException, FieldIsNullException;
    public String getString(int idx) throws ExecException, FieldIsNullException;
    public boolean getBoolean(int idx) throws ExecException, FieldIsNullException;
    public BigInteger getBigInteger(int idx) throws ExecException;
    public BigDecimal getBigDecimal(int idx) throws ExecException;
    public byte[] getBytes(int idx) throws ExecException, FieldIsNullException;
    public Tuple getTuple(int idx) throws ExecException;
    public DataBag getDataBag(int idx) throws ExecException, FieldIsNullException;
    public Map<String,Object> getMap(int idx) throws ExecException, FieldIsNullException;
    public DateTime getDateTime(int idx) throws ExecException, FieldIsNullException;

    public Schema getSchema();
}

It declares a typed setter and getter for each supported data type, plus getSchema().
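One detail worth noticing is that the primitive getters declare FieldIsNullException: a method that returns `int` cannot return `null`, so a null field has to be signalled some other way. Below is a minimal sketch of that idea, not Pig's code. `TypedAccessSketch`, its single int slot, and the unchecked `NullFieldException` are hypothetical.

```java
class TypedAccessSketch {
    // Hypothetical unchecked stand-in for a "field is null" signal.
    static class NullFieldException extends RuntimeException {
        NullFieldException(String msg) { super(msg); }
    }

    private int value;               // stored unboxed
    private boolean isNull = true;   // tracks whether the field has a value

    void setInt(int idx, int val) {
        checkIndex(idx);
        value = val;
        isNull = false;
    }

    // Generic access boxes the value and can represent null directly.
    Object get(int idx) {
        checkIndex(idx);
        return isNull ? null : Integer.valueOf(value);
    }

    // Typed access returns a primitive, so null must be reported by exception.
    int getInt(int idx) {
        checkIndex(idx);
        if (isNull) {
            throw new NullFieldException("field " + idx + " is null");
        }
        return value;
    }

    private static void checkIndex(int idx) {
        if (idx != 0) {
            throw new IndexOutOfBoundsException("only field 0 exists, got " + idx);
        }
    }

    public static void main(String[] args) {
        TypedAccessSketch t = new TypedAccessSketch();
        System.out.println(t.get(0));       // null: the field is not set yet
        t.setInt(0, 42);
        System.out.println(t.getInt(0));    // 42
    }
}
```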

BinSedesTuple

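Some comments

/**
 * This tuple has a faster (de)serialization mechanism. It is used for storing intermediate
 * data between Map and Reduce and between MR jobs. This is for internal Pig use only.
 * The serialization format can change, so do not use it for storing any persistent data
 * (i.e. in load/store functions).
 */

Inheritance (finally not an abstract class)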

public class BinSedesTuple extends DefaultTuple

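The constructors are blunt: each one just calls super().

BinSedesTuple() {
    super();
}

/**
 * Construct a tuple with a known number of fields. Package level so that callers cannot directly invoke it.
 * @param size Number of fields to allocate in the tuple.
 */
BinSedesTuple(int size) {
    super(size);
}

/**
 * Construct a tuple from an existing list of objects. Package level so that callers cannot directly invoke it.
 * @param c List of objects to turn into a tuple.
 */
BinSedesTuple(List<Object> c) {
    super(c);
}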

The only place it differs from its parent class is the following:

private static final InterSedes sedes = InterSedesFactory.getInterSedesInstance();

public static Class<? extends TupleRawComparator> getComparatorClass() {
    return InterSedesFactory.getInterSedesInstance().getTupleRawComparatorClass();
}

It can be understood as an ordinary tuple plus an InterSedes instance.
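The "ordinary tuple plus an InterSedes" description can be sketched as a tuple class that delegates its serialization to a single shared serializer object. This is an illustration only: `Sedes`, `SimpleSedes`, `SedesTupleSketch`, and the wire format are hypothetical. They are not Pig's InterSedes API or its actual binary format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SedesTupleSketch {
    // Hypothetical serializer interface, playing the role of the shared sedes.
    interface Sedes {
        void write(DataOutput out, List<String> fields) throws IOException;
        List<String> read(DataInput in) throws IOException;
    }

    // Toy format: the field count, then each field as a UTF string.
    static class SimpleSedes implements Sedes {
        public void write(DataOutput out, List<String> fields) throws IOException {
            out.writeInt(fields.size());
            for (String f : fields) {
                out.writeUTF(f);
            }
        }

        public List<String> read(DataInput in) throws IOException {
            int n = in.readInt();
            List<String> fields = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                fields.add(in.readUTF());
            }
            return fields;
        }
    }

    // One serializer shared by every tuple instance.
    private static final Sedes SEDES = new SimpleSedes();

    List<String> fields = new ArrayList<>();

    void write(DataOutput out) throws IOException { SEDES.write(out, fields); }

    void readFields(DataInput in) throws IOException { fields = SEDES.read(in); }

    public static void main(String[] args) throws IOException {
        SedesTupleSketch t = new SedesTupleSketch();
        t.fields.add("x");
        t.fields.add("y");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        t.write(new DataOutputStream(buf));

        SedesTupleSketch copy = new SedesTupleSketch();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.fields);   // [x, y]
    }
}
```

Keeping the format inside the serializer rather than the tuple is what lets the format change without touching the tuple class, which fits the warning above about not using it for persistent data.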

Finally, the UML diagram:

NonWritableTuple

Some comments

/**
 * A singleton Tuple type which is not picked up for writing by PigRecordWriter
 */

Inheritance

public class NonWritableTuple extends AbstractTuple {

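It is worth noting that DefaultTuple also extends AbstractTuple.

UML diagram

The constructor is simply empty (a tuple that truly is never written). Evidently this is a tuple used in the middle of processing, the kind that is thrown away once it has served its purpose.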
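How the writer recognizes and skips the singleton is not shown in this post. As a rough sketch under that caveat, one way a "never written" singleton could work is an identity check in the writer. `NonWritableSketch`, `Marker`, and `writeAll` are hypothetical and are not taken from PigRecordWriter.

```java
import java.util.ArrayList;
import java.util.List;

class NonWritableSketch {
    // Hypothetical marker: a single shared instance that carries no data.
    static final class Marker {
        static final Marker INSTANCE = new Marker();

        private Marker() { }   // empty constructor, nothing to initialize
    }

    // Hypothetical writer that drops the marker and keeps everything else.
    static List<Object> writeAll(List<Object> records) {
        List<Object> written = new ArrayList<>();
        for (Object r : records) {
            if (r == Marker.INSTANCE) {
                continue;   // never written to the output
            }
            written.add(r);
        }
        return written;
    }

    public static void main(String[] args) {
        List<Object> in = new ArrayList<>();
        in.add("row1");
        in.add(Marker.INSTANCE);
        in.add("row2");
        System.out.println(writeAll(in));   // [row1, row2]
    }
}
```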

TargetedTuple

Some comments

/**
 * A tuple composed with the operators to which
 * it needs be attached
 */

Inheritance

public class TargetedTuple extends AbstractTuple

UML

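From the description, this tuple holds a data structure that stores operators, targetOps. My guess is that it works the way a computer evaluates arithmetic: push the numbers and operators onto a stack, then pop and process them one by one. That is only a guess, though; the comment itself says no more than that these are the operators the tuple needs to be attached to.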
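The Javadoc above says only that the tuple is "composed with the operators to which it needs be attached". A different reading from the stack analogy, offered as an assumption rather than as what Pig does, is that the tuple simply travels together with a list of operator identifiers, and a dispatcher hands it to each of them. In the sketch below, `TargetedSketch`, `Targeted`, `dispatch`, and the use of strings as operator identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TargetedSketch {
    // Hypothetical pairing of a payload with the operators it must be attached to.
    static final class Targeted {
        final List<Object> payload;
        final List<String> targetOps;   // operator identifiers, simplified to strings

        Targeted(List<Object> payload, List<String> targetOps) {
            this.payload = payload;
            this.targetOps = targetOps;
        }
    }

    // Attach the payload to the input list of every targeted operator.
    static void dispatch(Targeted t, Map<String, List<List<Object>>> operatorInputs) {
        for (String op : t.targetOps) {
            operatorInputs.computeIfAbsent(op, k -> new ArrayList<>()).add(t.payload);
        }
    }

    public static void main(String[] args) {
        Map<String, List<List<Object>>> inputs = new LinkedHashMap<>();
        List<Object> payload = new ArrayList<>();
        payload.add("a");
        payload.add(1);
        List<String> targets = new ArrayList<>();
        targets.add("filter-1");
        targets.add("join-2");
        dispatch(new Targeted(payload, targets), inputs);
        System.out.println(inputs);
    }
}
```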

TimestampedTuple

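Inheritance

public class TimestampedTuple extends DefaultTuple {

UML diagram

It has two important members, timestamp and heartbeat. The class itself carries no explanatory comment, so its overall use is unclear, but the field comments describe each member:

protected double timestamp = 0; // timestamp of this tuple
protected boolean heartbeat = false; // true iff this is a heartbeat (i.e. its purpose is just to convey a new timestamp; it carries no data)

Constructors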

public TimestampedTuple(int numFields) {
    super(numFields);
}

public TimestampedTuple(String textLine, String delimiter, int timestampColumn,
                        SimpleDateFormat dateFormat) {
    if (delimiter == null) {
        delimiter = defaultDelimiter;
    }
    String[] splitString = textLine.split(delimiter, -1);
    mFields = new ArrayList<Object>(splitString.length - 1);
    for (int i = 0; i < splitString.length; i++) {
        if (i == timestampColumn) {
            try {
                timestamp = dateFormat.parse(splitString[i]).getTime() / 1000.0;
            } catch (ParseException e) {
                log.error("Could not parse timestamp " + splitString[i]);
            }
        } else {
            mFields.add(splitString[i]);
        }
    }
}
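The second constructor splits a text line, pulls one column out as the timestamp (converted from milliseconds to seconds), and keeps the remaining columns as fields. A standalone sketch of the same logic follows, with no Pig dependency. `TimestampParseSketch` is a hypothetical name, the logger is replaced by `System.err`, and the default-delimiter handling is left out.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.TimeZone;

class TimestampParseSketch {
    double timestamp = 0;                            // seconds since the epoch
    final List<Object> fields = new ArrayList<>();   // every column except the timestamp

    TimestampParseSketch(String line, String delimiter, int timestampColumn,
                         SimpleDateFormat format) {
        // String.split treats the delimiter as a regular expression;
        // the limit -1 keeps trailing empty fields.
        String[] parts = line.split(delimiter, -1);
        for (int i = 0; i < parts.length; i++) {
            if (i == timestampColumn) {
                try {
                    timestamp = format.parse(parts[i]).getTime() / 1000.0;
                } catch (ParseException e) {
                    // On failure the timestamp stays 0 and the column is dropped.
                    System.err.println("Could not parse timestamp " + parts[i]);
                }
            } else {
                fields.add(parts[i]);
            }
        }
    }

    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));   // fixed zone for a stable result
        TimestampParseSketch t =
                new TimestampParseSketch("a,1970-01-01 00:01:00,b", ",", 1, fmt);
        System.out.println(t.timestamp + " " + t.fields);   // 60.0 [a, b]
    }
}
```

Because the delimiter is passed straight to `String.split`, a delimiter such as "|" or "." would need escaping to be matched literally.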