Spark: UnsafeRow

UnsafeRow

UnsafeRow is an InternalRow that is backed by raw memory instead of Java objects.
UnsafeRow has three parts: [null bit set] [values] [variable length portion]


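  • 64-bit (8-byte) alignment: the memory layout is not compact, but aligned access improves memory-access performance.
  • Little-endian storage (the native byte order on x86), so storing a narrower type in a wider slot (for example, an int in a 64-bit slot) needs no extra encoding.
  • Every column, whatever its type, occupies one 64-bit slot; variable-length content is appended after the fixed-length slots.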
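The three properties above determine where each field lives inside a row. The following is a minimal sketch of that arithmetic, assuming one null bit per field rounded up to whole 8-byte words and one 8-byte slot per field. The class and method names are hypothetical and are not Spark's API.

```java
/** Hypothetical helper illustrating the [null bit set] [values] layout arithmetic. */
final class UnsafeRowLayoutSketch {
    static final int WORD_SIZE = 8;

    /** Bytes used by the null bit set: one bit per field, rounded up to whole 8-byte words. */
    static int bitSetWidthInBytes(int numFields) {
        return ((numFields + 63) / 64) * WORD_SIZE;
    }

    /** Byte offset, relative to the row start, of the 8-byte slot for the given field. */
    static long fieldOffset(int numFields, int ordinal) {
        return bitSetWidthInBytes(numFields) + (long) ordinal * WORD_SIZE;
    }

    /** Size of the fixed-length part: null bit set plus one 8-byte slot per field. */
    static long fixedSizeInBytes(int numFields) {
        return bitSetWidthInBytes(numFields) + (long) numFields * WORD_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(bitSetWidthInBytes(3));  // 8
        System.out.println(fieldOffset(3, 2));      // 24
        System.out.println(fixedSizeInBytes(65));   // 536
    }
}
```

A row with 3 fields therefore spends 8 bytes on null bits, and the variable-length portion would start at byte 32 under these assumptions.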
Array in UnsafeRow
/**
 * An Unsafe implementation of Array which is backed by raw memory instead of Java objects.
 *
 * Each array has four parts:
 *   [numElements][null bits][values or offset&length][variable length portion]
 *
 * The `numElements` is 8 bytes storing the number of elements of this array.
 *
 * In the `null bits` region, we store 1 bit per element, represents whether an element is null
 * Its total size is ceil(numElements / 8) bytes, and it is aligned to 8-byte boundaries.
 *
 * In the `values or offset&length` region, we store the content of elements. For fields that hold
 * fixed-length primitive types, such as long, double, or int, we store the value directly
 * in the field. The whole fixed-length portion (even for byte) is aligned to 8-byte boundaries.
 * For fields with non-primitive or variable-length values, we store a relative offset
 * (w.r.t. the base address of the array) that points to the beginning of the variable-length field
 * and length (they are combined into a long). For variable length portion, each is aligned
 * to 8-byte boundaries.
 *
 * Instances of `UnsafeArrayData` act as pointers to row data stored in this format.
 */
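The comment above describes a header made of `numElements` plus null bits, and an offset and length combined into one long. The sketch below illustrates both, assuming the null bits are rounded up to whole 8-byte words and that the offset sits in the upper 32 bits with the length in the lower 32 bits. That bit assignment is an assumption for illustration, and all names are hypothetical.

```java
/** Hypothetical helper illustrating the array header size and offset/length packing. */
final class UnsafeArrayLayoutSketch {
    /** Header = 8 bytes for numElements + null bits rounded up to whole 8-byte words. */
    static long headerInBytes(long numElements) {
        return 8 + ((numElements + 63) / 64) * 8;
    }

    /** Packs a relative offset (assumed upper 32 bits) and a length (lower 32 bits). */
    static long packOffsetAndSize(int relativeOffset, int size) {
        return ((long) relativeOffset << 32) | (size & 0xFFFFFFFFL);
    }

    /** Extracts the relative offset from a packed value. */
    static int offsetOf(long offsetAndSize) {
        return (int) (offsetAndSize >>> 32);
    }

    /** Extracts the length from a packed value. */
    static int sizeOf(long offsetAndSize) {
        return (int) offsetAndSize;
    }

    public static void main(String[] args) {
        long packed = packOffsetAndSize(24, 5);
        System.out.println(headerInBytes(65));  // 24
        System.out.println(offsetOf(packed));   // 24
        System.out.println(sizeOf(packed));     // 5
    }
}
```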
UnsafeMapData

/**
 * An Unsafe implementation of Map which is backed by raw memory instead of Java objects.
 *
 * Currently we just use 2 UnsafeArrayData to represent UnsafeMapData, with extra 8 bytes at head
 * to indicate the number of bytes of the unsafe key array.
 * [unsafe key array numBytes] [unsafe key array] [unsafe value array]
 *
 * Note that, user is responsible to guarantee that the key array does not have duplicated
 * elements, otherwise the behavior is undefined.
 */
// TODO: Use a more efficient format which doesn't depend on unsafe array.
public final class UnsafeMapData extends MapData implements Externalizable, KryoSerializable {
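Given the `[unsafe key array numBytes] [unsafe key array] [unsafe value array]` layout, the boundaries of the two arrays follow from the 8-byte header alone. Here is a rough sketch using a plain `ByteBuffer` as a stand-in for the raw memory. The class, its methods, and the dummy buffer are all hypothetical and only illustrate the offset arithmetic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper that locates the key and value arrays inside a map buffer. */
final class UnsafeMapLayoutSketch {
    /** The first 8 bytes hold the size in bytes of the key array. */
    static long keyArrayNumBytes(ByteBuffer map) {
        return map.getLong(0);
    }

    /** The key array starts right after the 8-byte header. */
    static long keyArrayOffset() {
        return 8;
    }

    /** The value array starts right after the key array. */
    static long valueArrayOffset(ByteBuffer map) {
        return 8 + keyArrayNumBytes(map);
    }

    /** Whatever remains after the header and the key array belongs to the value array. */
    static long valueArrayNumBytes(ByteBuffer map) {
        return map.limit() - valueArrayOffset(map);
    }

    /** Builds a zero-filled buffer with only the header populated, for demonstration. */
    static ByteBuffer dummyMap(int keyBytes, int valueBytes) {
        ByteBuffer buf = ByteBuffer.allocate(8 + keyBytes + valueBytes)
                .order(ByteOrder.nativeOrder());
        buf.putLong(0, keyBytes);
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer map = dummyMap(32, 48);
        System.out.println(valueArrayOffset(map));    // 40
        System.out.println(valueArrayNumBytes(map));  // 48
    }
}
```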
Checking whether a column is null in an UnsafeRow

C++ version

#include <cstdint>
#include <cstring>

bool IsNull(const uint8_t* buffer_address, int32_t index) {
  // Build a mask with only bit (index mod 64) set. The mask is unsigned so that
  // bit 63 is handled correctly (a signed shift would sign-extend it).
  const uint64_t mask = uint64_t{1} << (index & 0x3f);  // mod 64 and shift
  // index >> 6 divides by 64 to find the word holding the null bit; each word is 8 bytes.
  const int64_t wordOffset = static_cast<int64_t>(index >> 6) * 8;
  // Copy that word out of the buffer.
  uint64_t word;
  std::memcpy(&word, buffer_address + wordOffset, sizeof(word));
  // A non-zero result means the null bit is set; no if-else is needed.
  return (word & mask) != 0;
}
String 8-byte alignment

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java

 private void writeUnalignedBytes(
      int ordinal,
      Object baseObject,
      long baseOffset,
      int numBytes) {
    final int roundedSize = ByteArrayMethods.roundNumberOfBytesToNearestWord(numBytes);
    grow(roundedSize);
    zeroOutPaddingBytes(numBytes);
    Platform.copyMemory(baseObject, baseOffset, getBuffer(), cursor(), numBytes);
    setOffsetAndSize(ordinal, numBytes);
    // Advance by the rounded size so the cursor stays 8-byte aligned
    increaseCursor(roundedSize);
  }
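The rounding step in `writeUnalignedBytes` can be illustrated in isolation. The sketch below rounds a byte count up to the next multiple of 8 with a bit mask. It is a stand-alone illustration with hypothetical names, not a copy of `ByteArrayMethods`.

```java
/** Hypothetical helper illustrating rounding a byte count up to an 8-byte word boundary. */
final class WordAlignmentSketch {
    /** Rounds numBytes up to the nearest multiple of 8 (non-negative inputs assumed). */
    static int roundToNearestWord(int numBytes) {
        int remainder = numBytes & 0x07;             // numBytes mod 8
        return numBytes + ((8 - remainder) & 0x07);  // adds 0 when already aligned
    }

    /** Number of padding bytes that must be zeroed after the payload. */
    static int paddingBytes(int numBytes) {
        return roundToNearestWord(numBytes) - numBytes;
    }

    public static void main(String[] args) {
        System.out.println(roundToNearestWord(5));  // 8
        System.out.println(roundToNearestWord(8));  // 8
        System.out.println(roundToNearestWord(9));  // 16
        System.out.println(paddingBytes(5));        // 3
    }
}
```

A 5-byte string thus occupies 8 bytes in the variable-length portion, while the stored size remains 5, so readers can ignore the 3 padding bytes.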
isNullAt
 @Override
  public boolean isNullAt(int ordinal) {
    assertIndexIsValid(ordinal);
    return BitSetMethods.isSet(baseObject, baseOffset, ordinal);
  }

common/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java

/**
   * Returns {@code true} if the bit is set at the specified index.
   */
  public static boolean isSet(Object baseObject, long baseOffset, int index) {
    assert index >= 0 : "index (" + index + ") should >= 0";
    // Take index mod 64 and shift 1L left by that amount, producing a 64-bit mask
    // in which only the bit for this index is set.
    final long mask = 1L << (index & 0x3f);  // mod 64 and shift
    // index >> 6 divides by 64 to find which word holds the null bit
    final long wordOffset = baseOffset + (index >> 6) * WORD_SIZE;
    final long word = Platform.getLong(baseObject, wordOffset);
    // The bitwise test yields the result directly, with no if-else branch
    return (word & mask) != 0;
  }
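To experiment with the mask-and-word arithmetic without Spark's `Platform` class, the same idea can be sketched over a plain `long[]`. The class below is hypothetical. It also includes a shift-and-compare variant purely to show why testing the masked word against zero is the safer form: with signed 64-bit values, bit 63 sign-extends when shifted right.

```java
/** Hypothetical stand-alone bit set over long[] words, mirroring the isSet logic. */
final class NullBitSetSketch {
    /** Sets the bit for the given index. */
    static void set(long[] words, int index) {
        words[index >> 6] |= 1L << (index & 0x3f);
    }

    /** Returns true if the bit for the given index is set. */
    static boolean isSet(long[] words, int index) {
        return (words[index >> 6] & (1L << (index & 0x3f))) != 0;
    }

    /** Shift-and-compare variant: misreports bit 63 because >> sign-extends. */
    static boolean isSetShiftCompare(long[] words, int index) {
        int shift = index & 0x3f;
        long value = words[index >> 6] & (1L << shift);
        return (value >> shift) == 1;
    }

    public static void main(String[] args) {
        long[] words = new long[2];  // room for 128 null bits
        set(words, 63);
        set(words, 64);
        System.out.println(isSet(words, 63));              // true
        System.out.println(isSet(words, 64));              // true
        System.out.println(isSet(words, 0));               // false
        System.out.println(isSetShiftCompare(words, 63));  // false, although the bit is set
    }
}
```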