Tensors in TensorFlow

wyg_031113

Last modified 2022-04-30 15:46:43

First published 2022-04-30 14:23:38

Article link: https://blog.csdn.net/wyg_031113/article/details/124511745

Tensor

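As the name suggests, TensorFlow is made up of tensor + flow. This article looks at what a Tensor is and how it is implemented.

Abstractly, a tensor in TensorFlow is a triple <n-dimensional array, element type (dtype), size of each dimension (shape)>, together with a set of operations on that triple: create, delete, copy, reshape, slice, and so on. A simplified definition in C++: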

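#include <vector>

struct Tensor {
    std::vector<int> shape; // size of each dimension, e.g. shape={2,3,4} for a 3-D array
    int dtype;              // element type; determines how data is cast to a typed array
    void *data;             // contiguous memory holding all elements of the array
};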

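The shape can be changed: a 2x3 array can become 3x2, for example, as long as the number of elements stays the same.

data is a contiguous block of memory, just like the C++ array T[2][3]. If dtype is an integer type, it corresponds to int data[2][3].

data is a pointer. Cast to int *data, the pointers data, data+1, data+2, ..., data+5 address the individual elements.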

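Slicing is also possible: for example, taking index 1 along the first dimension gives data[1]. Because the array is 2x3, data[0] and data[1] each hold 3 elements. The sliced tensor still references the original tensor's memory, and reference counting ensures that the slice stays usable even after the original tensor has been destroyed.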
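The reshape and slice behavior described above can be sketched in a few lines of C++. This is a minimal illustration, not TensorFlow's implementation: `MiniTensor`, `Reshape`, and `SubSlice` here are hypothetical names, elements are fixed to `int`, and `std::shared_ptr` stands in for the reference-counted buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified tensor (int elements only); not TensorFlow's class.
struct MiniTensor {
  std::vector<int> shape;                    // size of each dimension
  std::shared_ptr<std::vector<int>> buffer;  // reference-counted storage
  std::size_t offset = 0;                    // first element of this view

  static std::size_t Count(const std::vector<int>& s) {
    std::size_t n = 1;
    for (int d : s) n *= static_cast<std::size_t>(d);
    return n;
  }

  explicit MiniTensor(std::vector<int> s)
      : shape(std::move(s)),
        buffer(std::make_shared<std::vector<int>>(Count(shape), 0)) {}

  std::size_t NumElements() const { return Count(shape); }
  int* data() const { return buffer->data() + offset; }

  // Changes the shape; allowed only if the element count is unchanged.
  bool Reshape(const std::vector<int>& s) {
    if (Count(s) != NumElements()) return false;
    shape = s;
    return true;
  }

  // Returns a view of index i along the first dimension. The view shares
  // the buffer, so no element is copied.
  MiniTensor SubSlice(int i) const {
    assert(!shape.empty() && i >= 0 && i < shape[0]);
    MiniTensor view(*this);  // copies shape and shared_ptr, not the data
    view.shape.erase(view.shape.begin());
    view.offset = offset + static_cast<std::size_t>(i) * view.NumElements();
    return view;
  }
};
```

Because the view holds its own `shared_ptr`, the storage is freed only when the last tensor or view referring to it goes away, which is the property the paragraph above attributes to reference counting.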

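Tensor implementation

In a real implementation, however, the alignment of data also has to be considered, for example 8-byte alignment. So does the way memory for data is allocated: TensorFlow defines an Allocator interface so that different allocation strategies can be plugged in. And because model parameters and checkpoints need to be saved, tensors must also support serialization, for which TensorFlow uses protobuf.

The Tensor implementation lives in: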

ls tensorflow/core/framework/tensor.*
tensor.cc     tensor.h      tensor.proto

Basic operations

Tensor members

  TensorShape shape_; // shape
  TensorBuffer* buf_; // data

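Construction

Default construction: the result is not a scalar; it is an empty tensor with shape {0} and NumElements() == 0.
Type + shape construction, which allocates memory: Tensor(DataType type, const TensorShape& shape); uses CPUAllocator by default.
Allocator + type + shape construction: Tensor(Allocator* a, DataType type, const TensorShape& shape);
Construction from an existing buffer: Tensor(DataType type, const TensorShape& shape, TensorBuffer* buf);
Construction from a constant (scalar), with many overloads, e.g. explicit Tensor(float scalar_value)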

Slicing

Slices along the first dimension without copying data. The result is not guaranteed to be aligned; check with IsAligned().

Tensor Slice(int64_t dim0_start, int64_t dim0_limit) const;
Tensor SubSlice(int64_t index) const;
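To see why a slice may lose alignment, consider a row-major float array whose base address is 64-byte aligned (64 is the kAllocatorAlignment value in the Allocator interface quoted later in this article). The sketch below is an illustration only; `IsAlignedTo` and `RowOffsetBytes` are hypothetical helpers, not TensorFlow functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
inline bool IsAlignedTo(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Byte offset of row `row` in a row-major [rows, cols] float array.
inline std::size_t RowOffsetBytes(std::size_t row, std::size_t cols) {
  return row * cols * sizeof(float);
}
```

With 4-byte floats and 3 columns, row 1 starts 12 bytes past the base, so a slice beginning there cannot sit on a 64-byte boundary. With 16 columns, every row starts at a multiple of 64 bytes and each slice stays aligned.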

Serialization

  bool FromProto(const TensorProto& other) TF_MUST_USE_RESULT;
  bool FromProto(Allocator* a, const TensorProto& other) TF_MUST_USE_RESULT;

  /// \brief Fills in `proto` with `*this` tensor's content.
  ///
  /// `AsProtoField()` fills in the repeated field for `proto.dtype()`, while
  /// `AsProtoTensorContent()` encodes the content in `proto.tensor_content()`
  /// in a compact form.
  void AsProtoField(TensorProto* proto) const;
  void AsProtoTensorContent(TensorProto* proto) const;

Copying

Both copy construction and move construction are supported.
operator= supports both copy and move assignment.

Access

/// Returns the data type.
DataType dtype() const { return shape_.data_type(); }

/// Returns the shape of the tensor.
const TensorShape& shape() const { return shape_; }

/// \brief Convenience accessor for the tensor shape.
///
/// For all shape accessors, see comments for relevant methods of
/// `TensorShape` in `tensor_shape.h`.
int dims() const { return shape().dims(); }

/// Convenience accessor for the tensor shape.
int64_t dim_size(int d) const { return shape().dim_size(d); }

/// Convenience accessor for the tensor shape.
int64_t NumElements() const { return shape().num_elements(); }

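size_t AllocatedBytes() const;
bool IsAligned() const;
bool CopyFrom(const Tensor& other,
              const TensorShape& shape);

// Each typed accessor requires a tensor of matching dtype and rank.
Tensor s(DT_FLOAT, TensorShape({}));
auto sv = s.scalar<float>();  // access as a scalar: sv()
Tensor v(DT_FLOAT, TensorShape({4}));
auto vv = v.vec<float>();     // access as a 1-D array: vv(0)
Tensor m(DT_FLOAT, TensorShape({3, 4}));
auto mv = m.matrix<float>();  // access as a matrix: mv(2, 3)

// Element-by-element access through the flat view
auto flat = m.flat<float>();
float* d = flat.data();
for (int64_t i = 0; i < m.NumElements(); i++) d[i] = 0.0f;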
  template <typename T>
  typename TTypes<T>::Flat flat() {
    return shaped<T, 1>({NumElements()});
  }

  template <typename T>
  typename TTypes<T>::UnalignedFlat unaligned_flat() {
    return unaligned_shaped<T, 1>({NumElements()});
  }

// For use with memcpy
/// REQUIRES: `DataTypeCanUseMemcpy(dtype())`.
StringPiece tensor_data() const;
void* data() const;
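The matrix-style and flat accessors above are two views of the same buffer. Assuming row-major order (the order that tensor.proto documents for the flattened representation), the mapping from a multi-dimensional index to a flat position can be sketched as follows. `FlatIndex` is a hypothetical helper written for this illustration, not part of the Tensor API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TensorFlow API): maps a multi-dimensional index
// to its position in the flat, row-major buffer.
inline std::size_t FlatIndex(const std::vector<std::size_t>& shape,
                             const std::vector<std::size_t>& index) {
  assert(shape.size() == index.size());
  std::size_t flat = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    assert(index[d] < shape[d]);
    // Each step scales the running offset by the current dimension size.
    flat = flat * shape[d] + index[d];
  }
  return flat;
}
```

For a 3x4 matrix, element (2, 3) lands at flat position 2 * 4 + 3 = 11, the last element of the buffer.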

Debug information


  std::string SummarizeValue(int64_t max_entries, bool print_v2 = false) const;
  std::string DebugString(int num_values) const;
  std::string DebugString() const { return DebugString(3); }
  std::string DeviceSafeDebugString() const;
  void FillDescription(TensorDescription* description) const;

Tensor, shape, and type are implemented in the following files

tensor.h: TensorBuffer manages the memory that holds data.

$ ls tensorflow/core/framework/tensor* 
tensorflow/core/framework/tensor.cc                 tensorflow/core/framework/tensor_shape.proto    tensorflow/core/framework/tensor_testutil.h
tensorflow/core/framework/tensor.h                  tensorflow/core/framework/tensor_shape_test.cc  tensorflow/core/framework/tensor_testutil_test.cc
tensorflow/core/framework/tensor.proto              tensorflow/core/framework/tensor_slice.cc       tensorflow/core/framework/tensor_types.h
tensorflow/core/framework/tensor_description.proto  tensorflow/core/framework/tensor_slice.h        tensorflow/core/framework/tensor_util.cc
tensorflow/core/framework/tensor_key.h              tensorflow/core/framework/tensor_slice.proto    tensorflow/core/framework/tensor_util.h
tensorflow/core/framework/tensor_reference.h        tensorflow/core/framework/tensor_slice_test.cc  tensorflow/core/framework/tensor_util_test.cc
tensorflow/core/framework/tensor_shape.cc           tensorflow/core/framework/tensor_test.cc
tensorflow/core/framework/tensor_shape.h            tensorflow/core/framework/tensor_testutil.cc

$ ls tensorflow/core/framework/shape* 
tensorflow/core/framework/shape_inference.cc  tensorflow/core/framework/shape_inference_test.cc      tensorflow/core/framework/shape_inference_testutil.h
tensorflow/core/framework/shape_inference.h   tensorflow/core/framework/shape_inference_testutil.cc  tensorflow/core/framework/shape_inference_testutil_test.cc

$ ls tensorflow/core/framework/type* 
tensorflow/core/framework/type_index.h   tensorflow/core/framework/typed_allocator.cc  tensorflow/core/framework/types.cc  tensorflow/core/framework/types.proto
tensorflow/core/framework/type_traits.h  tensorflow/core/framework/typed_allocator.h   tensorflow/core/framework/types.h   tensorflow/core/framework/types_test.cc

Data types supported by Tensor

Defined in tensorflow/core/framework/types.proto:

enum DataType {
  // Not a legal value for DataType.  Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
  DT_UINT32 = 22;
  DT_UINT64 = 23;
}
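The dtype is what tells the runtime how to interpret the untyped data pointer, including how many bytes each element occupies. The sketch below mirrors a handful of the enum values above in a hypothetical `MiniDType` and maps each to an element size; the names `MiniDType`, `ElementSize`, and `BufferBytes` are assumptions for illustration and do not come from TensorFlow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical subset of DataType; numeric values copied from the enum above.
enum class MiniDType {
  kInvalid = 0,
  kFloat = 1,
  kDouble = 2,
  kInt32 = 3,
  kUint8 = 4,
  kInt64 = 9,
};

// Bytes per element for the fixed-size types in this subset; 0 otherwise.
inline std::size_t ElementSize(MiniDType t) {
  switch (t) {
    case MiniDType::kFloat:  return sizeof(float);
    case MiniDType::kDouble: return sizeof(double);
    case MiniDType::kInt32:  return sizeof(std::int32_t);
    case MiniDType::kUint8:  return sizeof(std::uint8_t);
    case MiniDType::kInt64:  return sizeof(std::int64_t);
    default:                 return 0;
  }
}

// Size of the contiguous buffer needed for num_elements values of type t.
inline std::size_t BufferBytes(MiniDType t, std::size_t num_elements) {
  return ElementSize(t) * num_elements;
}
```

A 2x3 tensor of kInt64 therefore needs 6 * 8 = 48 bytes. Variable-size types such as DT_STRING do not fit this scheme, which is why the sketch returns 0 for anything outside the subset.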

Serialization: tensor.proto


// Protocol buffer representing a tensor.
message TensorProto {
  DataType dtype = 1;

  // Shape of the tensor.  TODO(touts): sort out the 0-rank issues.
  TensorShapeProto tensor_shape = 2;

  // Only one of the representations below is set, one of "tensor_contents" and
  // the "xxx_val" attributes.  We are not using oneof because as oneofs cannot
  // contain repeated fields it would require another extra set of messages.

  // Version number.
  //
  // In version 0, if the "repeated xxx" representations contain only one
  // element, that element is repeated to fill the shape.  This makes it easy
  // to represent a constant Tensor with a single value.
  int32 version_number = 3;

  // Serialized raw tensor content from either Tensor::AsProtoTensorContent or
  // memcpy in tensorflow::grpc::EncodeTensorToByteBuffer. This representation
  // can be used for all tensor types. The purpose of this representation is to
  // reduce serialization overhead during RPC call by avoiding serialization of
  // many repeated small items.
  bytes tensor_content = 4;

  // Type specific representations that make it easy to create tensor protos in
  // all languages.  Only the representation corresponding to "dtype" can
  // be set.  The values hold the flattened representation of the tensor in
  // row major order.

  // DT_HALF, DT_BFLOAT16. Note that since protobuf has no int16 type, we'll
  // have some pointless zero padding for each value here.
  repeated int32 half_val = 13 [packed = true];

  // DT_FLOAT.
  repeated float float_val = 5 [packed = true];

  // DT_DOUBLE.
  repeated double double_val = 6 [packed = true];

  // DT_INT32, DT_INT16, DT_UINT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 7 [packed = true];

  // DT_STRING
  repeated bytes string_val = 8;

  // DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real
  // and imaginary parts of i-th single precision complex.
  repeated float scomplex_val = 9 [packed = true];

  // DT_INT64
  repeated int64 int64_val = 10 [packed = true];

  // DT_BOOL
  repeated bool bool_val = 11 [packed = true];

  // DT_COMPLEX128. dcomplex_val(2*i) and dcomplex_val(2*i+1) are real
  // and imaginary parts of i-th double precision complex.
  repeated double dcomplex_val = 12 [packed = true];

  // DT_RESOURCE
  repeated ResourceHandleProto resource_handle_val = 14;

  // DT_VARIANT
  repeated VariantTensorDataProto variant_val = 15;

  // DT_UINT32
  repeated uint32 uint32_val = 16 [packed = true];

  // DT_UINT64
  repeated uint64 uint64_val = 17 [packed = true];
}

// Protocol buffer representing the serialization format of DT_VARIANT tensors.
message VariantTensorDataProto {
  // Name of the type of objects being serialized.
  string type_name = 1;
  // Portions of the object that are not Tensors.
  bytes metadata = 2;
  // Tensors contained within objects being serialized.
  repeated TensorProto tensors = 3;
}
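Two rules stated in the comments of TensorProto are easy to miss: in version 0 a single repeated value is repeated to fill the whole shape, and scomplex_val stores real and imaginary parts interleaved. The sketch below illustrates both on plain `std::vector` inputs. `ExpandFloatVal` and `DecodeScomplexVal` are hypothetical helpers, and returning an empty vector on a size mismatch is a choice made for this example, not TensorFlow's behavior.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Expands a float_val-style list to num_elements values. A single value is
// repeated to fill the shape; a full-length list is returned unchanged.
// Any other length yields an empty vector (assumption for this sketch).
inline std::vector<float> ExpandFloatVal(const std::vector<float>& float_val,
                                         std::size_t num_elements) {
  if (float_val.size() == 1) {
    return std::vector<float>(num_elements, float_val[0]);
  }
  if (float_val.size() == num_elements) return float_val;
  return {};
}

// Pairs up interleaved (real, imaginary) floats as in scomplex_val:
// element i is (scomplex_val[2*i], scomplex_val[2*i+1]).
inline std::vector<std::complex<float>> DecodeScomplexVal(
    const std::vector<float>& scomplex_val) {
  std::vector<std::complex<float>> out;
  for (std::size_t i = 0; i + 1 < scomplex_val.size(); i += 2) {
    out.emplace_back(scomplex_val[i], scomplex_val[i + 1]);
  }
  return out;
}
```

The fill rule is what lets a constant tensor of any shape be stored as one value plus a shape.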

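tensor_util.h and tensor_util_test.cc contain examples of using tensors

They provide the following functionality:

Deep copy of a tensor
Deep copy of a slice
Concat: concatenation
Split: splitting
ConcatSplitStrings: string concatenation
CreatesStringTensorProto: deserializes a tensor with dtype=DT_STRING from a protobuf in a file
CreatesInt32TensorProto
CreatesInt64TensorProto
CreatesUInt32TensorProto
CreatesUInt64TensorProto
... every type has a corresponding case for deserializing from a file
CompressTensorProtoInPlaceTooSmall: the CompressTensorProto* cases cover TensorProto compression
CompressTensorProtoInPlaceAllEqual
CompressTensorProtoConstantTail
CompressTensorProtoNegatizeZero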

Tensor memory allocation: Allocator

Interface

tensorflow/core/framework/allocator.h


// Allocator is an abstract interface for allocating and deallocating
// device memory.
class Allocator {
 public:
  // Align to 64 byte boundary.
  static constexpr size_t kAllocatorAlignment = 64;

  virtual ~Allocator();

  // Return a string identifying this allocator
  virtual std::string Name() = 0;

  // Return an uninitialized block of memory that is "num_bytes" bytes
  // in size.  The returned pointer is guaranteed to be aligned to a
  // multiple of "alignment" bytes.
  // REQUIRES: "alignment" is a power of 2.
  virtual void* AllocateRaw(size_t alignment, size_t num_bytes) = 0;

  // Return an uninitialized block of memory that is "num_bytes" bytes
  // in size with specified allocation attributes.  The returned pointer is
  // guaranteed to be aligned to a multiple of "alignment" bytes.
  // REQUIRES: "alignment" is a power of 2.
  virtual void* AllocateRaw(size_t alignment, size_t num_bytes,
                            const AllocationAttributes& allocation_attr) {
    // The default behavior is to use the implementation without any allocation
    // attributes.
    return AllocateRaw(alignment, num_bytes);
  }

  // Deallocate a block of memory pointer to by "ptr"
  // REQUIRES: "ptr" was previously returned by a call to AllocateRaw
  virtual void DeallocateRaw(void* ptr) = 0;

  // Returns true if this allocator tracks the sizes of allocations.
  // RequestedSize and AllocatedSize must be overridden if
  // TracksAllocationSizes is overridden to return true.
  virtual bool TracksAllocationSizes() const { return false; }

  // Returns true if this allocator allocates an opaque handle rather than the
  // requested number of bytes.
  //
  // This method returns false for most allocators, but may be used by
  // special-case allocators that track tensor usage. If this method returns
  // true, AllocateRaw() should be invoked for all values of `num_bytes`,
  // including 0.
  //
  // NOTE: It is the caller's responsibility to track whether an allocated
  // object is a buffer or an opaque handle. In particular, when this method
  // returns `true`, users of this allocator must not run any constructors or
  // destructors for complex objects, since there is no backing store for the
  // tensor in which to place their outputs.
  virtual bool AllocatesOpaqueHandle() const { return false; }

  // Returns the user-requested size of the data allocated at
  // 'ptr'.  Note that the actual buffer allocated might be larger
  // than requested, but this function returns the size requested by
  // the user.
  //
  // REQUIRES: TracksAllocationSizes() is true.
  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.
  virtual size_t RequestedSize(const void* ptr) const {
    CHECK(false) << "allocator doesn't track sizes";
    return size_t(0);
  }

  // Returns the allocated size of the buffer at 'ptr' if known,
  // otherwise returns RequestedSize(ptr). AllocatedSize(ptr) is
  // guaranteed to be >= RequestedSize(ptr).
  //
  // REQUIRES: TracksAllocationSizes() is true.
  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.
  virtual size_t AllocatedSize(const void* ptr) const {
    return RequestedSize(ptr);
  }

  // Returns either 0 or an identifier assigned to the buffer at 'ptr'
  // when the buffer was returned by AllocateRaw. If non-zero, the
  // identifier differs from every other ID assigned by this
  // allocator.
  //
  // REQUIRES: TracksAllocationSizes() is true.
  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.
  virtual int64_t AllocationId(const void* ptr) const { return 0; }

  // Returns the allocated size of the buffer at 'ptr' if known,
  // otherwise returns 0. This method can be called when
  // TracksAllocationSizes() is false, but can be extremely slow.
  //
  // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
  // allocated by this allocator.
  virtual size_t AllocatedSizeSlow(const void* ptr) const {
    if (TracksAllocationSizes()) {
      return AllocatedSize(ptr);
    }
    return 0;
  }


  virtual absl::optional<AllocatorStats> GetStats() { return absl::nullopt; }

  virtual bool ClearStats() TF_MUST_USE_RESULT { return false; }

  virtual void SetSafeFrontier(uint64 count) {}



  // Returns the type of the memory allocated by this allocator.
  virtual AllocatorMemoryType GetMemoryType() const {
    return AllocatorMemoryType::kUnknown;
  }
};

You can subclass Allocator and implement your own allocator.
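As a rough idea of what such a subclass involves, here is a self-contained sketch. It does not include TensorFlow headers: `MiniAllocator` is a hypothetical stand-in that copies only the three pure-virtual methods of the interface above, and `CountingAllocator` is an invented example that aligns memory by over-allocating with `std::malloc` and keeps a count of live allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the three pure-virtual methods of Allocator.
class MiniAllocator {
 public:
  virtual ~MiniAllocator() {}
  virtual std::string Name() = 0;
  virtual void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) = 0;
  virtual void DeallocateRaw(void* ptr) = 0;
};

// Example subclass: aligned allocation on top of malloc, plus a live counter.
class CountingAllocator : public MiniAllocator {
 public:
  std::string Name() override { return "counting"; }

  // REQUIRES: alignment is a power of 2.
  void* AllocateRaw(std::size_t alignment, std::size_t num_bytes) override {
    if (alignment < sizeof(void*)) alignment = sizeof(void*);
    // Over-allocate so an aligned address and a back-pointer always fit.
    void* raw = std::malloc(num_bytes + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    // Remember the malloc pointer just before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    ++live_allocations_;
    return reinterpret_cast<void*>(aligned);
  }

  void DeallocateRaw(void* ptr) override {
    if (ptr == nullptr) return;
    std::free(reinterpret_cast<void**>(ptr)[-1]);
    --live_allocations_;
  }

  int live_allocations() const { return live_allocations_; }

 private:
  int live_allocations_ = 0;
};
```

A real subclass would derive from tensorflow::Allocator instead and could override the optional methods (statistics, size tracking, memory type) shown in the interface.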

CPUAllocator

tensorflow/core/framework/cpu_allocator_impl.h


class CPUAllocator : public Allocator {
 public:
  CPUAllocator()
      : single_allocation_warning_count_(0),
        total_allocation_warning_count_(0) {}

  ~CPUAllocator() override {}

  string Name() override { return "cpu"; }

  void* AllocateRaw(size_t alignment, size_t num_bytes) override {
    if (num_bytes > static_cast<size_t>(LargeAllocationWarningBytes()) &&
        single_allocation_warning_count_ < kMaxSingleAllocationWarnings) {
      ++single_allocation_warning_count_;
      LOG(WARNING) << "Allocation of " << num_bytes << " exceeds "
                   << 100 * kLargeAllocationWarningThreshold
                   << "% of free system memory.";
    }

    void* p = port::AlignedMalloc(num_bytes, alignment);
    if (cpu_allocator_collect_stats) {
      const std::size_t alloc_size = port::MallocExtension_GetAllocatedSize(p);
      mutex_lock l(mu_);
      ++stats_.num_allocs;
      stats_.bytes_in_use += alloc_size;
      stats_.peak_bytes_in_use =
          std::max<int64_t>(stats_.peak_bytes_in_use, stats_.bytes_in_use);
      stats_.largest_alloc_size =
          std::max<int64_t>(stats_.largest_alloc_size, alloc_size);

      if (stats_.bytes_in_use > TotalAllocationWarningBytes() &&
          total_allocation_warning_count_ < kMaxTotalAllocationWarnings) {
        ++total_allocation_warning_count_;
        LOG(WARNING) << "Total allocated memory " << stats_.bytes_in_use
                     << "exceeds " << 100 * kTotalAllocationWarningThreshold
                     << "% of free system memory";
      }
      if (p != nullptr) {
        AddTraceMe("MemoryAllocation", p, num_bytes, alloc_size);
      }
    }
    return p;
  }

  void DeallocateRaw(void* ptr) override {
    if (cpu_allocator_collect_stats) {
      const std::size_t alloc_size =
          port::MallocExtension_GetAllocatedSize(ptr);
      mutex_lock l(mu_);
      stats_.bytes_in_use -= alloc_size;
      AddTraceMe("MemoryDeallocation", ptr, 0, alloc_size);
    }
    port::AlignedFree(ptr);
  }

  void AddTraceMe(absl::string_view traceme_name, const void* chunk_ptr,
                  std::size_t req_bytes, std::size_t alloc_bytes) {
    tensorflow::profiler::TraceMe::InstantActivity(
        [this, traceme_name, chunk_ptr, req_bytes,
         alloc_bytes]() TF_NO_THREAD_SAFETY_ANALYSIS {
          const auto& annotation =
              profiler::ScopedMemoryDebugAnnotation::CurrentAnnotation();
          return tensorflow::profiler::TraceMeEncode(
              traceme_name, {{"allocator_name", Name()},
                             {"bytes_reserved", stats_.bytes_reserved},
                             {"bytes_allocated", stats_.bytes_in_use},
                             {"peak_bytes_in_use", stats_.peak_bytes_in_use},
                             {"requested_bytes", req_bytes},
                             {"allocation_bytes", alloc_bytes},
                             {"addr", reinterpret_cast<uint64>(chunk_ptr)},
                             {"tf_op", annotation.pending_op_name},
                             {"id", annotation.pending_step_id},
                             {"region_type", annotation.pending_region_type},
                             {"data_type", annotation.pending_data_type},
                             {"shape", annotation.pending_shape_func()}});
        },
        /*level=*/profiler::TraceMeLevel::kInfo);
  }

  absl::optional<AllocatorStats> GetStats() override {
    if (!cpu_allocator_collect_stats) return absl::nullopt;
    mutex_lock l(mu_);
    return stats_;
  }

  bool ClearStats() override {
    if (!cpu_allocator_collect_stats) return false;
    mutex_lock l(mu_);
    stats_.num_allocs = 0;
    stats_.peak_bytes_in_use = stats_.bytes_in_use;
    stats_.largest_alloc_size = 0;
    return true;
  }

  size_t AllocatedSizeSlow(const void* ptr) const override {
    return port::MallocExtension_GetAllocatedSize(ptr);
  }

  AllocatorMemoryType GetMemoryType() const override {
    return AllocatorMemoryType::kHostPageable;
  }

 private:
  mutex mu_;
  AllocatorStats stats_ TF_GUARDED_BY(mu_);

  // Use <atomic> for single allocations to avoid mutex contention when
  // statistics are disabled.
  std::atomic<int> single_allocation_warning_count_;
  int total_allocation_warning_count_ TF_GUARDED_BY(mu_);

  TF_DISALLOW_COPY_AND_ASSIGN(CPUAllocator);
};

// Register the CPU allocator
REGISTER_MEM_ALLOCATOR("DefaultCPUAllocator", 100, CPUAllocatorFactory);

Allocator registration

tensorflow/core/framework/allocator_registry.h


class AllocatorFactoryRegistry {
 public:
  AllocatorFactoryRegistry() {}
  ~AllocatorFactoryRegistry() {}

  void Register(const char* source_file, int source_line, const string& name,
                int priority, AllocatorFactory* factory);

  // Returns 'best fit' Allocator.  Find the factory with the highest priority
  // and return an allocator constructed by it.  If multiple factories have
  // been registered with the same priority, picks one by unspecified criteria.
  Allocator* GetAllocator();

  // Returns 'best fit' SubAllocator.  First look for the highest priority
  // factory that is NUMA-enabled.  If none is registered, fall back to the
  // highest priority non-NUMA-enabled factory.  If NUMA-enabled, return a
  // SubAllocator specific to numa_node, otherwise return a NUMA-insensitive
  // SubAllocator.
  SubAllocator* GetSubAllocator(int numa_node);

  // Returns the singleton value.
  static AllocatorFactoryRegistry* singleton();

  ProcessStateInterface* process_state() const { return process_state_; }

 protected:
  friend class ProcessState;
  ProcessStateInterface* process_state_ = nullptr;

 private:
  mutex mu_;
  bool first_alloc_made_ = false;
  struct FactoryEntry {
    const char* source_file;
    int source_line;
    string name;
    int priority;
    std::unique_ptr<AllocatorFactory> factory;
    std::unique_ptr<Allocator> allocator;
    // Index 0 corresponds to kNUMANoAffinity, other indices are (numa_node +
    // 1).
    std::vector<std::unique_ptr<SubAllocator>> sub_allocators;
  };
  std::vector<FactoryEntry> factories_ TF_GUARDED_BY(mu_);

  // Returns any FactoryEntry registered under 'name' and 'priority',
  // or 'nullptr' if none found.
  const FactoryEntry* FindEntry(const string& name, int priority) const
      TF_EXCLUSIVE_LOCKS_REQUIRED(mu_);

  TF_DISALLOW_COPY_AND_ASSIGN(AllocatorFactoryRegistry);
};
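The selection rule documented on GetAllocator() (pick the factory with the highest priority and return an allocator constructed by it) can be sketched without any TensorFlow dependency. Everything below is hypothetical: `MiniRegistry` and `MiniAllocatorInfo` are invented for this example, locking is omitted, and ties go to the first registration, whereas the real registry leaves the tie-break unspecified.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an allocator instance.
struct MiniAllocatorInfo {
  std::string name;
};

// Hypothetical priority-based registry; single-threaded for brevity.
class MiniRegistry {
 public:
  using Factory = std::function<std::unique_ptr<MiniAllocatorInfo>()>;

  void Register(const std::string& name, int priority, Factory factory) {
    Entry e{name, priority, std::move(factory), nullptr};
    entries_.push_back(std::move(e));
  }

  // Returns the instance built by the highest-priority factory, creating it
  // on first use and caching it afterwards. Returns nullptr if empty.
  MiniAllocatorInfo* GetAllocator() {
    Entry* best = nullptr;
    for (Entry& e : entries_) {
      if (best == nullptr || e.priority > best->priority) best = &e;
    }
    if (best == nullptr) return nullptr;
    if (!best->instance) best->instance = best->factory();
    return best->instance.get();
  }

 private:
  struct Entry {
    std::string name;
    int priority;
    Factory factory;
    std::unique_ptr<MiniAllocatorInfo> instance;
  };
  std::vector<Entry> entries_;
};
```

Under this scheme, registering a factory with a priority above 100 would take precedence over the "DefaultCPUAllocator" registration shown earlier, which uses priority 100.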