psHashPartitioned

PartitioningScheme

PartitioningSchema: 1


2016-12-18 17:32:18,069 [0x7fe700e0f700] [DEBUG]: pullRedistribute: PullSG started with partitioning schema = 1, destInstanceId = 18446744073709551615



/**
 *
 * @file ArrayDistribution.h
 *
 * This file contains code needed by several components: the Operators,
 * the Arrays and Storage, the query compiler and optimizer, and probably others.
 *
 * It was factored out of Operator.cpp to support some generalization of PartitioningSchema;
 * it is not new code.
 *
 * Note that changes to this file could affect array storage formats,
 * depending on whether any of this information is stored in those formats.
 * This is currently true of the PartitioningSchema enumeration itself, which
 * for now remains in Metadata.h.
 *
 * A word about SciDB data distribution [originally written for the uniq() operator
 * by Alex, but generally applicable.]
 * <br>
 * <br>
 * The default distribution scheme that SciDB uses is called "psHashPartitioned". In reality, it is a hash of the chunk
 * coordinates, modulo the number of instances. In the one-dimensional case, if data starts at 1 with a chunk size
 * of 10 on 3 instances, then chunk 1 goes to instance 0, chunk 11 to instance 1, chunk 21 to instance 2, chunk 31 to
 * instance 0, and so on.
 *
 * <br>
 * <br>
 * In the two-plus dimensional case, the hash is not so easy to describe. For the exact definition, read
 * getInstanceForChunk() in Operator.cpp.
 * <br>
 * <br>
 * All data is currently stored with this distribution. However, operators quite often emit data in other distributions.
 * For example, operators such as cross, cross_join and some linear algebra routines output data in a completely
 * different distribution. Worse, operators such as slice, subarray and repart may emit "partially filled" or "ragged"
 * chunks, just as the uniq() algorithm this note was originally written for does.
 * <br>
 * <br>
 * Data whose distribution is "violated" in this way must be redistributed before it is stored or processed by other
 * operators that need a particular distribution. The function redistribute() is available and is sometimes called
 * directly by the operator (see PhysicalIndexLookup.cpp, for example). Other times, the operator simply tells the SciDB
 * optimizer that it requires input data in a particular distribution or outputs data in a particular distribution.
 * The optimizer then inserts the appropriate SG() operators.
 * That approach is more advantageous, because the optimizer is likely to get smarter about delaying or waiving the
 * call to redistribute(). For this purpose, the functions
 * <br> getOutputDistribution(),
 * <br> changedDistribution() and
 * <br> outputFullChunks()
 * are provided. See their use in the Operator class.
 */


/**
 * @param ps the kind of PartitioningScheme
 * @param chunkPosition the Coordinates of the chunk in the Array
 * @param dims the Dimensions of the Array
 * @param nInstancesOriginal number of instances at the time the Array was created
 *        (which is not always the current number of instances, even though it often
 *         is, so be careful: use the value that was persisted with the Array).
 * @return if non-negative, the InstanceID where the primary copy of the chunk belongs;
 *         if negative, a special value like ALL_INSTANCES_MASK which is not associated
 *         with a single instance, so it is often incorrect to compare the result
 *         to another InstanceID directly. A matching predicate should be used instead.
 */