1. Preface:
This section focuses on the transform function.
2. Main text
Let's go straight to the source code:
/**
 * Binary searching in several buckets to place each data point.
 * @param splits array of split points
 * @param feature data point
 * @param keepInvalid NaN flag.
 *                    Set "true" to make an extra bucket for NaN values;
 *                    Set "false" to report an error for NaN values
 * @return bucket for each data point
 * @throws SparkException if a feature is < splits.head or > splits.last
 */
private[feature] def binarySearchForBuckets(
    splits: Array[Double],
    feature: Double,
    keepInvalid: Boolean): Double = {
  if (feature.isNaN) {
    if (keepInvalid) {
      splits.length - 1
    } else {
      throw new SparkException("Bucketizer encountered NaN value. To handle or skip NaNs," +
        " try setting Bucketizer.handleInvalid.