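sample: sampling
The sample transformation uses a given random seed to randomly select a specified fraction of the records in an RDD and creates a new RDD from them. In machine learning, it can be used for cross-validation.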
def sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.random.nextLong): RDD[T]
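- withReplacement: Boolean. true samples with replacement (a record may be selected more than once); false samples without replacement
- fraction: Double. The expected proportion of records to sample. Without replacement it must be a value between 0 and 1; the size of the result is only approximately fraction times the number of records, not an exact count
- seed: Long. The random seed; if omitted, a random value is used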
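The parameters above can be tried out with a minimal sketch like the following. Note that the transcript below calls takeSample, which is an action that returns a fixed-size Array to the driver, while sample is a transformation that returns a new RDD. The object name SampleSketch, the input range 1 to 100, the partition count, and the seed value 42 are all assumptions made for illustration; they do not come from the original session.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-alone driver; in spark-shell, `sc` already exists
// and only the body of the try block is needed.
object SampleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sample-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Illustrative input: the integers 1..100 in 4 partitions.
      val rdd = sc.parallelize(1 to 100, numSlices = 4)

      // Without replacement: each record is kept with probability 0.1,
      // so the result has roughly (not exactly) 10 records.
      val noRepl = rdd.sample(withReplacement = false, fraction = 0.1, seed = 42L)

      // With replacement: the same record may appear more than once.
      val withRepl = rdd.sample(withReplacement = true, fraction = 0.1, seed = 42L)

      // Same RDD, same parameters, same seed: the sample is reproducible.
      val again = rdd.sample(withReplacement = false, fraction = 0.1, seed = 42L)
      assert(noRepl.collect().sameElements(again.collect()))

      // Sampling without replacement never yields more records than the input
      // and, on distinct input, never yields duplicates.
      val picked = noRepl.collect()
      assert(picked.length <= 100)
      assert(picked.distinct.length == picked.length)

      println(s"without replacement: ${picked.mkString(", ")}")
      println(s"with replacement:    ${withRepl.collect().mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

Fixing the seed is useful when an experiment has to be repeatable; leaving it at its default gives a different sample on each call, which is what the takeSample calls below do by passing a fresh random seed every time.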
res13: Array[String] = Array((1,2), (1,2), (1,2), (1,2), (1,2), (1,2), (7,8), (8,9), (9,10), (13,14), (11,12), (13,14), (15,16))
scala> import java.util.Random
import java.util.Random
scala> val seed = txt.takeSample(false,2,new Random().nextLong())
seed: Array[String] = Array((1,2), (1,2))
scala> val seed1 = txt.takeSample(false,2,new Random().nextLong())
seed1: Array[String] = Array((7,8), (1,2))
scala> val seed2 = txt.takeSample(false,2,new Random().nextLong())
seed2: Array[String] = Array((1,2), (1,2))
scala> val seed3 = txt.takeSample(false,2,new Random().nextLong())
seed3: Array[String] = Array((1,2), (13,14))