PartitioningScheme
PartitioningSchema: 1
2016-12-18 17:32:18,069 [0x7fe700e0f700] [DEBUG]: pullRedistribute: PullSG started with partitioning schema = 1, destInstanceId = 18446744073709551615
/**
* @file ArrayDistribution.h
* This file contains code needed by several components: the Operators,
* the Arrays and Storage, the query compiler and optimizer, and probably others.
*
* It was factored out of Operator.cpp to support some generalization of PartitioningSchema;
* it is not new code.
*
* Note that changes to this file could affect array storage formats
* if any of this information is stored in those formats.
* That is currently true of the PartitioningSchema enumeration itself, which,
* for now, remains in Metadata.h.
*
* A word about SciDB data distribution [originally written for the uniq() operator
* by Alex, but generally applicable.]
* <br>
* <br>
* The default distribution scheme that SciDB uses is called "psHashPartitioned". In practice, it is a hash of the chunk
* coordinates, modulo the number of instances. In the one-dimensional case, if data starts at 1 with a chunk size
* of 10 on 3 instances, then chunk 1 goes to instance 0, chunk 11 to instance 1, chunk 21 to instance 2, chunk 31 to
* instance 0, and so on.
* <br>
* <br>
* With two or more dimensions, the hash is harder to describe. For the exact definition, read
* getInstanceForChunk() in Operator.cpp.
* <br>
* <br>
* All data is currently stored with this distribution, but operators often emit data in other distributions.
* For example, operators such as cross, cross_join and some linear algebra routines output data in a completely different
* distribution. Worse, operators such as slice, subarray and repart may emit "partially filled" or "ragged" chunks, just as
* the uniq() algorithm for which this note was originally written does.
* <br>
* <br>
* Data whose distribution has been "violated" in this way must be redistributed before it is stored or processed by
* other operators that need a particular distribution. The function redistribute() is available and is sometimes called
* directly by the operator (see PhysicalIndexLookup.cpp for an example). Other times, the operator simply tells the SciDB
* optimizer that it requires input data in a particular distribution or outputs data in a particular distribution.
* The optimizer then inserts the appropriate SG() operators.
* That approach is preferable, because the optimizer is likely to get smarter about delaying or skipping the
* call to redistribute(). For this purpose, the functions
* <br> getOutputDistribution(),
* <br> changedDistribution() and
* <br> outputFullChunks()
* are provided. See their use in the Operator class.
*/
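The one-dimensional placement rule described in the comment above can be sketched as follows. This is a minimal sketch under stated assumptions, not SciDB code: InstanceID is modeled as a 64-bit unsigned integer, and chunkIndex, instanceFor1D, instanceForND, kAllInstancesSketch and belongsTo are hypothetical names. instanceForND in particular uses an arbitrary polynomial mix of the per-dimension chunk indices; it is not the hash in getInstanceForChunk(), which remains the authoritative definition. belongsTo illustrates the kind of matching predicate recommended in the function documentation that follows, with kAllInstancesSketch standing in for a special value such as ALL_INSTANCES_MASK (its real value is not assumed here).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: instance IDs are modeled as 64-bit unsigned integers.
using InstanceID = std::uint64_t;

// Zero-based index of the chunk containing `pos` along one dimension whose
// coordinates begin at `start` and whose chunks are `chunkInterval` wide.
constexpr std::int64_t chunkIndex(std::int64_t pos, std::int64_t start,
                                  std::int64_t chunkInterval) {
    return (pos - start) / chunkInterval;
}

// One-dimensional rule from the comment: chunk index modulo the number of
// instances recorded when the array was created.
constexpr InstanceID instanceFor1D(std::int64_t chunkPos, std::int64_t start,
                                   std::int64_t chunkInterval,
                                   std::uint64_t nInstancesOriginal) {
    return static_cast<InstanceID>(chunkIndex(chunkPos, start, chunkInterval)) %
           nInstancesOriginal;
}

// Hypothetical N-dimensional variant (NOT SciDB's hash): fold the chunk
// indices into one value with a polynomial mix, then reduce modulo the
// instance count. With one dimension this equals instanceFor1D().
template <std::size_t N>
constexpr InstanceID instanceForND(const std::array<std::int64_t, N>& chunkPos,
                                   const std::array<std::int64_t, N>& start,
                                   const std::array<std::int64_t, N>& interval,
                                   std::uint64_t nInstancesOriginal) {
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // Unsigned arithmetic wraps on overflow, which is well defined.
        h = h * 31 + static_cast<std::uint64_t>(
                         chunkIndex(chunkPos[i], start[i], interval[i]));
    }
    return h % nInstancesOriginal;
}

// Hypothetical stand-in for a special "not a single instance" value such as
// ALL_INSTANCES_MASK; the real constant may differ.
constexpr InstanceID kAllInstancesSketch = ~InstanceID{0};

// Matching predicate: does a chunk placed at `target` belong on `self`?
// Comparing `target == self` alone would miss the special value.
constexpr bool belongsTo(InstanceID target, InstanceID self) {
    return target == kAllInstancesSketch || target == self;
}
```

With a start of 1, a chunk interval of 10 and 3 instances, instanceFor1D places chunks 1, 11, 21 and 31 on instances 0, 1, 2 and 0, as in the example above. Passing 4 instances instead moves chunk 31 to instance 3, which is why the instance count persisted with the array, not the current one, must be used.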
/**
* @param ps the kind of PartitioningScheme
* @param chunkPosition the Coordinates of the chunk in the Array
* @param dims the Dimensions of the Array
* @param nInstancesOriginal the number of instances at the time the Array was created.
* This is often, but not always, the current number of instances, so be careful
* to use the value that was persisted with the Array.
* @return if non-negative, the InstanceID where the primary copy of the chunk belongs;
* if negative, a special value such as ALL_INSTANCES_MASK, which is not associated
* with a single instance. It is therefore often incorrect to compare the result
* directly to another InstanceID; use a matching predicate instead.
*/