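Author: Qian Yang, School of Management, Hefei University of Technology. Email: 1563178220@qq.com. The content may contain omissions; feedback is welcome.
Reproduction without the author's permission is prohibited.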
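Background
Machine learning code frequently needs to initialize parameters with random values, for example in models fitted by variational inference. A reliable random number generator is therefore important in both Java and Python.
In Python, numpy provides functions for drawing many kinds of random numbers. This article first reviews how numpy generates random numbers, then shows how to generate the same kinds of random numbers in Java with Apache Commons Math3.
Generating random numbers with numpy in Python
Generating a one-dimensional array of random numbers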
import numpy as np
a1 = np.random.sample(3)
print(a1)
The program outputs:
[0.894403 0.75327423 0.0598 ]
Generating a two-dimensional array of random numbers
import numpy as np
a2 = np.random.random([3, 5])
print(a2)
The program outputs:
[[0.74259942 0.14265614 0.39788471 0.24822603 0.70212864]
[0.24499887 0.10752136 0.87938368 0.66949099 0.60077382]
[0.93464286 0.12540026 0.23024034 0.01755745 0.1791168 ]]
Row-normalized random numbers
import numpy as np
a2 = np.random.random([3, 5])
a2 = a2 / a2.sum(1)[:, np.newaxis] # normalize
print(a2)
The program outputs:
[[0.20371704 0.13704266 0.07736116 0.07489164 0.50698749]
[0.16472605 0.10869033 0.44036911 0.10437194 0.18184257]
[0.12031567 0.04564694 0.0950007 0.33511749 0.40391919]]
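The row normalization above can also be written with keepdims=True, which keeps the summed axis so that the division broadcasts without np.newaxis. The sketch below is only an illustration: it assumes NumPy's Generator API (np.random.default_rng) and an arbitrary seed of 0, neither of which appears in the original snippet.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make the run repeatable.
rng = np.random.default_rng(0)
a2 = rng.random((3, 5))

# keepdims=True keeps the row sums as a (3, 1) column, so the division
# broadcasts across each row without an explicit np.newaxis.
row_normalized = a2 / a2.sum(axis=1, keepdims=True)

row_sums = row_normalized.sum(axis=1)
print(row_sums)  # each entry is 1 up to floating-point rounding
```

After this step every row can be read as a discrete probability distribution, which is a common way to initialize parameters such as topic or mixture weights.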
Standard normal random numbers
import numpy as np
c = np.random.normal(size=(3,4))
print(c)
The program outputs:
[[-0.3369725 -1.05351817 -0.84444184 0.43715886]
[-0.56812588 0.15303606 0.50248202 0.95384482]
[-0.63582981 0.44559096 -1.91725906 -0.70182715]]
Multivariate normal random numbers
import numpy as np
V = np.random.multivariate_normal(np.zeros(5), np.identity(5) * (5),size=3)
print(V)
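This draws three samples from a multivariate normal distribution with zero mean and covariance 5I, where I is the 5×5 identity matrix. The output is: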
[[ 2.91755746 -1.67030031 -1.0542531 0.13214101 -2.03207468]
[-1.86659205 0.14574427 -4.24525326 -3.91111677 -2.81316827]
[-1.57533411 2.54300223 -0.69052118 -3.19566595 3.21427621]]
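Because the covariance here is 5 times the identity matrix, the components are independent and each has variance 5. Under that assumption the same distribution can be sampled by scaling standard normal draws by sqrt(5). The following sketch uses NumPy's Generator API and an arbitrary seed, both chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability only

# Three draws, matching the shape of V in the snippet above.
V_alt = np.sqrt(5) * rng.standard_normal((3, 5))

# A larger sample to check the variance empirically.
big = np.sqrt(5) * rng.standard_normal((100_000, 5))
sample_var = big.var(axis=0)

print(V_alt.shape)
print(sample_var)  # each entry should be close to 5
```

This shortcut only works for a diagonal covariance; a general covariance matrix needs a matrix square root such as a Cholesky factor.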
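Generating random numbers with Math3 in Java
The Math3 generators covered in this article are CorrelatedRandomVectorGenerator, GaussianRandomGenerator, HaltonSequenceGenerator, JDKRandomGenerator, SobolSequenceGenerator, and UniformRandomGenerator.
CorrelatedRandomVectorGenerator
CorrelatedRandomVectorGenerator draws random vectors from a multivariate normal distribution. Its source code shows two constructors. The first is: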
/**
* Builds a correlated random vector generator from its mean
* vector and covariance matrix.
*
* @param mean Expected mean values for all components.
* @param covariance Covariance matrix.
* @param small Diagonal elements threshold under which column are
* considered to be dependent on previous ones and are discarded
* @param generator underlying generator for uncorrelated normalized
* components.
* @throws org.apache.commons.math3.linear.NonPositiveDefiniteMatrixException
* if the covariance matrix is not strictly positive definite.
* @throws DimensionMismatchException if the mean and covariance
* arrays dimensions do not match.
*/
public CorrelatedRandomVectorGenerator(double[] mean,
RealMatrix covariance, double small,
NormalizedRandomGenerator generator) {
int order = covariance.getRowDimension();
if (mean.length != order) {
throw new DimensionMismatchException(mean.length, order);
}
this.mean = mean.clone();
final RectangularCholeskyDecomposition decomposition =
new RectangularCholeskyDecomposition(covariance, small);
root = decomposition.getRootMatrix();
this.generator = generator;
normalized = new double[decomposition.getRank()];
}
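This constructor takes the mean array, the covariance matrix, a double threshold (small), and a NormalizedRandomGenerator instance. Its use is demonstrated in the usage example further on.
The other constructor is: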
/**
* Builds a null mean random correlated vector generator from its
* covariance matrix.
*
* @param covariance Covariance matrix.
* @param small Diagonal elements threshold under which column are
* considered to be dependent on previous ones and are discarded.
* @param generator Underlying generator for uncorrelated normalized
* components.
* @throws org.apache.commons.math3.linear.NonPositiveDefiniteMatrixException
* if the covariance matrix is not strictly positive definite.
*/
public CorrelatedRandomVectorGenerator(RealMatrix covariance, double small,
NormalizedRandomGenerator generator) {
int order = covariance.getRowDimension();
mean = new double[order];
for (int i = 0; i < order; ++i) {
mean[i] = 0;
}
final RectangularCholeskyDecomposition decomposition =
new RectangularCholeskyDecomposition(covariance, small);
root = decomposition.getRootMatrix();
this.generator = generator;
normalized = new double[decomposition.getRank()];
}
To draw a random vector from the multivariate normal distribution, call the following method:
/** Generate a correlated random vector.
* @return a random vector as an array of double. The returned array
* is created at each call, the caller can do what it wants with it.
*/
public double[] nextVector() {
// generate uncorrelated vector
for (int i = 0; i < normalized.length; ++i) {
normalized[i] = generator.nextNormalizedDouble();
}
// compute correlated vector
double[] correlated = new double[mean.length];
for (int i = 0; i < correlated.length; ++i) {
correlated[i] = mean[i];
for (int j = 0; j < root.getColumnDimension(); ++j) {
correlated[i] += root.getEntry(i, j) * normalized[j];
}
}
return correlated;
}
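The nextVector() method above computes mean + root · z, where z holds independent standard normal values and root satisfies root · rootᵀ = covariance. The NumPy sketch below illustrates the same idea. Note the substitution: it uses the ordinary Cholesky factor from np.linalg.cholesky rather than Math3's RectangularCholeskyDecomposition, so it requires a strictly positive definite covariance. The function name next_vector and the mean and cov values are made up for illustration.

```python
import numpy as np

def next_vector(mean, root, rng):
    """Return mean + root @ z, with z a vector of independent N(0, 1) draws."""
    z = rng.standard_normal(root.shape[1])
    return mean + root @ z

# Illustrative values (assumptions, not taken from the article).
mean = np.array([1.0, 2.0, 5.0])
cov = np.array([[4.0, 1.0, 0.5],
                [1.0, 3.0, 0.2],
                [0.5, 0.2, 2.0]])

# Ordinary Cholesky factor: root @ root.T == cov (needs positive definite cov).
root = np.linalg.cholesky(cov)

rng = np.random.default_rng(0)  # arbitrary seed
samples = np.array([next_vector(mean, root, rng) for _ in range(50_000)])

print(samples.mean(axis=0))           # close to mean
print(np.cov(samples, rowvar=False))  # close to cov
```

The empirical mean and covariance of the samples approach the requested values as the sample size grows, which is a quick sanity check for this kind of generator.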
Usage example
The following example shows how to use CorrelatedRandomVectorGenerator.
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.random.CorrelatedRandomVectorGenerator;
import org.apache.commons.math3.random.GaussianRandomGenerator;
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.RandomGenerator;
public class MultivariateGaussianGeneratorTest2 {
public static void main(String[] args) {
RandomGenerator rg = new JDKRandomGenerator();
rg.setSeed(17399225432L); // fixed seed for reproducibility
GaussianRandomGenerator rawGenerator = new GaussianRandomGenerator(rg);
double[] mean = {1, 2, 5};
double[][] arrA = {{1, 2, 3}, {3, 4, 5}, {4, 5, 6}};
RealMatrix matrixA = MatrixUtils.createRealMatrix(arrA);
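// build the covariance matrix as A * A^T and print it
RealMatrix covariance = matrixA.multiply(matrixA.transpose());
System.out.println(covariance);
// construct the correlated vector generator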
CorrelatedRandomVectorGenerator generator =
new CorrelatedRandomVectorGenerator(mean, covariance, 1.0e-12 * covariance.getNorm(), rawGenerator);
double[] randomVector = generator.nextVector();
for(double d : randomVector){
System.out.println(d);
}
}
}
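The program sets the mean array, builds the covariance by multiplying the matrix by its transpose, and uses both as input to draw one random vector from the multivariate normal distribution.
The program outputs: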
Array2DRowRealMatrix{{14.0,26.0,32.0},{26.0,50.0,62.0},{32.0,62.0,77.0}}
8.241122654913196
15.481679575594983
21.601958035935876
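The covariance matrix printed in the first output line can be checked by hand. The plain-Python sketch below recomputes A · Aᵀ for the matrix arrA from the Java example and also evaluates its determinant; the helper det3 is a name introduced here for illustration.

```python
# The matrix arrA from the Java example.
A = [[1, 2, 3],
     [3, 4, 5],
     [4, 5, 6]]

n = len(A)
# covariance[i][j] = sum_k A[i][k] * A[j][k], i.e. A times its transpose.
covariance = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
              for i in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(covariance)        # [[14, 26, 32], [26, 50, 62], [32, 62, 77]]
print(det3(covariance))  # 0
```

The determinant is zero, so this particular covariance is singular rather than strictly positive definite. That appears to be the situation the small threshold in the constructor's Javadoc is meant for, where columns considered dependent on previous ones are discarded.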
GaussianRandomGenerator
This class was already used in the code above. It draws a single value from the standard normal distribution. Its constructor is:
/** Create a new generator.
* @param generator underlying random generator to use
*/
public GaussianRandomGenerator(final RandomGenerator generator) {
this.generator = generator;
}
The method that generates a random number is:
/** Generate a random scalar with null mean and unit standard deviation.
* @return a random scalar with null mean and unit standard deviation
*/
public double nextNormalizedDouble() {
return generator.nextGaussian();
}
Usage example
The following example shows how it is used:
import org.apache.commons.math3.random.GaussianRandomGenerator;
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.RandomGenerator;
public class GaussianRandomTest {
public static void main(String[] args) {
RandomGenerator rg = new JDKRandomGenerator();
// rg.setSeed(17399225432L); // uncomment to fix the seed
GaussianRandomGenerator rawGenerator = new GaussianRandomGenerator(rg);
for (int i = 0; i < 10; i++) {
double g = rawGenerator.nextNormalizedDouble();
System.out.println(g);
}
}
}
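Running the program prints 10 random numbers to the console. Because the seed line is commented out, the values differ from run to run.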
HaltonSequenceGenerator
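Halton sequences are commonly used in Monte Carlo estimation. Each sequence takes a prime number as its base, such as 2 or 3, and successively subdivides the interval between 0 and 1. For example, bases 2 and 3 give: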
1⁄2, 1⁄4, 3⁄4, 1⁄8, 5⁄8, 3⁄8, 7⁄8, 1⁄16, 9⁄16,...
1⁄3, 2⁄3, 1⁄9, 4⁄9, 7⁄9, 2⁄9, 5⁄9, 8⁄9, 1⁄27,...
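The two sequences listed above can be reproduced by writing each index 1, 2, 3, ... in the chosen base and mirroring its digits around the radix point. The sketch below does this with exact fractions from Python's standard library. The function name radical_inverse is introduced here for illustration, and no scrambling is applied.

```python
from fractions import Fraction

def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` around the radix point."""
    result = Fraction(0)
    f = Fraction(1, base)
    while index > 0:
        result += f * (index % base)  # least significant digit comes first
        index //= base
        f /= base
    return result

base2 = [radical_inverse(i, 2) for i in range(1, 10)]
base3 = [radical_inverse(i, 3) for i in range(1, 10)]

print(", ".join(str(x) for x in base2))
# 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, 9/16
print(", ".join(str(x) for x in base3))
# 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27
```

The values are fully determined by the index and the base, so a Halton sequence is deterministic (quasi-random) rather than pseudo-random.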
In Math3, the HaltonSequenceGenerator class has the following constructor:
/**
* Construct a new Halton sequence generator for the given space dimension.
*
* @param dimension the space dimension
* @throws OutOfRangeException if the space dimension is outside the allowed range of [1, 40]
*/
public HaltonSequenceGenerator(final int dimension) throws OutOfRangeException {
this(dimension, PRIMES, WEIGHTS);
}
This constructor only sets the dimension of the generated vectors.
The other constructor is:
/**
* Construct a new Halton sequence generator with the given base numbers and weights for each dimension.
* The length of the bases array defines the space dimension and is required to be > 0.
*
* @param dimension the space dimension
* @param bases the base number for each dimension, entries should be (pairwise) prime, may not be null
* @param weights the weights used during scrambling, may be null in which case no scrambling will be performed
* @throws NullArgumentException if base is null
* @throws OutOfRangeException if the space dimension is outside the range [1, len], where
* len refers to the length of the bases array
* @throws DimensionMismatchException if weights is non-null and the length of the input arrays differ
*/
public HaltonSequenceGenerator(final int dimension, final int[] bases, final int[] weights)
throws NullArgumentException, OutOfRangeException, DimensionMismatchException {
MathUtils.checkNotNull(bases);
if (dimension < 1 || dimension > bases.length) {
throw new OutOfRangeException(dimension, 1, PRIMES.length);
}
if (weights != null && weights.length != bases.length) {
throw new DimensionMismatchException(weights.length, bases.length);
}
this.dimension = dimension;
this.base = bases.clone();
this.weight = weights == null ? null : weights.clone();
count = 0;
}
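This constructor takes the dimension of the generated vectors, the bases to use (an array of primes), and the weights.
The default bases in this class are: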
/** The first 40 primes. */
private static final int[] PRIMES = new int[] {
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139,
149, 151, 157, 163, 167, 173
};
The default weights are:
/** The optimal weights used for scrambling of the first 40 dimension. */
private static final int[] WEIGHTS = new int[] {
1, 2, 3, 3, 8, 11, 12, 14, 7, 18, 12, 13, 17, 18, 29, 14, 18, 43, 41,
44, 40, 30, 47, 65, 71, 28, 40, 60, 79, 89, 56, 50, 52, 61, 108, 56,
66, 63, 60, 66
};
To generate vectors, call the following two methods of this class:
/** {@inheritDoc} */
public double[] nextVector() {
final double[] v = new double[dimension];
for (int i = 0; i < dimension; i++) {
int index = count;
double f = 1.0 / base[i];
int j = 0;
while (index > 0) {
final int digit = scramble(i, j, base[i], index % base[i]);
v[i] += f * digit;
index /= base[i]; // floor( index / base )
f /= base[i];
}
}
count++;
return v;
}
/**
* Skip to the i-th point in the Halton sequence.
* <p>
* This operation can be performed in O(1).
*
* @param index the index in the sequence to skip to
* @return the i-th point in the Halton sequence
* @throws NotPositiveException if index < 0
*/
public double[] skipTo(final int index) throws NotPositiveException {
count = index;
return nextVector();
}
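The interplay between the counter, nextVector() and skipTo() can be seen in a simplified Python model. The class HaltonSketch below is a hypothetical stand-in written for this illustration, not the Math3 class: it omits scrambling, which corresponds to passing null weights, and takes the bases directly.

```python
class HaltonSketch:
    """Unscrambled Halton points; a simplified stand-in, not the Math3 class."""

    def __init__(self, bases):
        self.bases = list(bases)
        self.count = 0  # index of the next point to return

    def next_vector(self):
        point = []
        for base in self.bases:
            index, f, value = self.count, 1.0 / base, 0.0
            while index > 0:
                value += f * (index % base)
                index //= base
                f /= base
            point.append(value)
        self.count += 1
        return point

    def skip_to(self, index):
        # The point depends only on its index, so skipping just resets the counter.
        self.count = index
        return self.next_vector()

gen = HaltonSketch([2, 3])
first = gen.next_vector()   # index 0: the origin
second = gen.next_vector()  # index 1
jumped = gen.skip_to(5)     # index 5
after = gen.next_vector()   # index 6
print(first, second, jumped, after)
```

Two points follow from this model and from the Java source above: the point at index 0 is the origin, and skipTo(n) already returns the point at index n, so a subsequent nextVector() call returns the point at index n + 1.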
Usage example
The following example shows how to use HaltonSequenceGenerator.
import org.apache.commons.math3.random.HaltonSequenceGenerator;
public class HaltonSequenceTest {
public static void main(String[] args) {
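/***** First approach: default bases and weights *****/
HaltonSequenceGenerator randomVectorGenerator = new HaltonSequenceGenerator(3);
// jump to index 999999 (skipTo also returns the point at that index)
randomVectorGenerator.skipTo(999999);
// generate the next vector of the sequence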
double[] b = randomVectorGenerator.nextVector();
for (int i = 0; i < b.length; i++) {
System.out.println(b[i]);
}
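/***** Second approach: explicit bases, no scrambling weights *****/
System.out.println(".......vector from the second constructor.........");
HaltonSequenceGenerator randomVectorGenerator1 = new HaltonSequenceGenerator(4, new int[] { 3, 5, 7, 11, 13 }, null);
// jump to index 999999 (skipTo also returns the point at that index)
randomVectorGenerator1.skipTo(999999);
// generate the next vector of the sequence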
double[] b1 = randomVectorGenerator1.nextVector();
for (int i = 0; i < b1.length; i++) {
System.out.println(b1[i]);
}
}
}
Running the program prints the two generated vectors. Changing the index passed to skipTo() yields different points of the sequence.
JDKRandomGenerator
The JDKRandomGenerator class extends java.util.Random and is straightforward to use. Its main constructors are:
/**
* Create a new JDKRandomGenerator with a default seed.
*/
public JDKRandomGenerator() {
super();
}
/**
* Create a new JDKRandomGenerator with the given seed.
*
* @param seed initial seed
* @since 3.6
*/
public JDKRandomGenerator(int seed) {
It also provides two methods for setting the random seed.
The class inherits the following methods from Random: next, nextBoolean, nextBytes, nextDouble, nextFloat, nextGaussian, nextInt (two overloads), nextLong, and setSeed.
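Setting a seed makes the generated sequence reproducible. The Python sketch below shows the same idea with the standard library's random.Random. It is an analogy rather than the Java API, and the helper name draw and the seed values 42 and 43 are arbitrary choices made for illustration.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

run_a = draw(42)
run_b = draw(42)
run_c = draw(43)

print(run_a == run_b)  # True: same seed, same sequence
print(run_a == run_c)  # False: a different seed gives a different sequence
```

Fixing the seed is useful when an experiment has to be repeated exactly, for example when comparing two parameter initializations.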
Usage example
The following is a usage example:
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.RandomGenerator;
public class JDKRandomTest {
public static void main(String[] args) {
RandomGenerator rg = new JDKRandomGenerator();
for (int i = 0; i < 2; i++) {
System.out.println("double:" + rg.nextDouble());
System.out.println("boolean:" + rg.nextBoolean());
System.out.println("float:" + rg.nextFloat());
System.out.println("gaussian:" + rg.nextGaussian());
System.out.println("int:" + rg.nextInt());
System.out.println("long:" + rg.nextLong());
}
}
}
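Running the program prints two rounds of values, one per type. No seed is set, so the values differ from run to run.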
SobolSequenceGenerator
The SobolSequenceGenerator class has two constructors; the first is the one most commonly used:
/**
* Construct a new Sobol sequence generator for the given space dimension.
*
* @param dimension the space dimension
* @throws OutOfRangeException if the space dimension is outside the allowed range of [1, 1000]
*/
public SobolSequenceGenerator(final int dimension) throws OutOfRangeException {
if (dimension < 1 || dimension > MAX_DIMENSION) {
throw new OutOfRangeException(dimension, 1, MAX_DIMENSION);
}
// initialize the other dimensions with direction numbers from a resource
final InputStream is = getClass().getResourceAsStream(RESOURCE_NAME);
if (is == null) {
throw new MathInternalError();
}
this.dimension = dimension;
// init data structures
direction = new long[dimension][BITS + 1];
x = new long[dimension];
try {
initFromStream(is);
} catch (IOException e) {
// the internal resource file could not be read -> should not happen
throw new MathInternalError();
} catch (MathParseException e) {
// the internal resource file could not be parsed -> should not happen
throw new MathInternalError();
} finally {
try {
is.close();
} catch (IOException e) { // NOPMD
// ignore
}
}
}
/**
* Construct a new Sobol sequence generator for the given space dimension with
* direction vectors loaded from the given stream.
* <p>
* The expected format is identical to the files available from
* <a href="http://web.maths.unsw.edu.au/~fkuo/sobol/">Stephen Joe and Frances Kuo</a>.
* The first line will be ignored as it is assumed to contain only the column headers.
* The columns are:
* <ul>
* <li>d: the dimension</li>
* <li>s: the degree of the primitive polynomial</li>
* <li>a: the number representing the coefficients</li>
* <li>m: the list of initial direction numbers</li>
* </ul>
* Example:
* <pre>
* d s a m_i
* 2 1 0 1
* 3 2 1 1 3
* </pre>
* <p>
* The input stream <i>must</i> be an ASCII text containing one valid direction vector per line.
*
* @param dimension the space dimension
* @param is the stream to read the direction vectors from
* @throws NotStrictlyPositiveException if the space dimension is < 1
* @throws OutOfRangeException if the space dimension is outside the range [1, max], where
* max refers to the maximum dimension found in the input stream
* @throws MathParseException if the content in the stream could not be parsed successfully
* @throws IOException if an error occurs while reading from the input stream
*/
public SobolSequenceGenerator(final int dimension, final InputStream is)
throws NotStrictlyPositiveException, MathParseException, IOException {
if (dimension < 1) {
throw new NotStrictlyPositiveException(dimension);
}
this.dimension = dimension;
// init data structures
direction = new long[dimension][BITS + 1];
x = new long[dimension];
// initialize the other dimensions with direction numbers from the stream
int lastDimension = initFromStream(is);
if (lastDimension < dimension) {
throw new OutOfRangeException(dimension, 1, lastDimension);
}
}
[Figure: random points generated with this method]
Usage example
The following is a usage example:
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.random.SobolSequenceGenerator;
public class SobolSequenceTest {
public static void main(String[] args) {
// generate a single vector (test case)
SobolSequenceGenerator generator = new SobolSequenceGenerator(5);
generator.skipTo(999999); // skip ahead; without this, the first vector returned (index 0) is all zeros
System.out.println("..............................");
double[] vector = generator.nextVector();
for (int i = 0; i < vector.length; i++) {
System.out.println(vector[i]);
}
System.out.println("...........one vector from SobolSequenceGenerator.......");
// generate several vectors and store each one as a matrix
System.out.println(".............matrix of generated vectors...............");
List<RealMatrix> points = new ArrayList<RealMatrix>();
for (int i = 0; i < 3; i++) {
double[] vector1 = generator.nextVector();
RealMatrix pointMatrix = new Array2DRowRealMatrix(vector1);
points.add(pointMatrix);
}
for (int i = 0; i < points.size(); i++) {
System.out.println(points.get(i));
}
}
}
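Running the program prints one 5-dimensional vector, followed by three more vectors, each stored as a 5×1 column matrix.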
UniformRandomGenerator
This class draws random numbers from a uniform distribution. UniformRandomGenerator implements the NormalizedRandomGenerator interface. Its constructor is:
/** Create a new generator.
* @param generator underlying random generator to use
*/
public UniformRandomGenerator(RandomGenerator generator) {
this.generator = generator;
}
Part of its source code is shown below:
/** Generate a random scalar with null mean and unit standard deviation.
* <p>The number generated is uniformly distributed between -&sqrt;(3)
* and +&sqrt;(3).</p>
* @return a random scalar with null mean and unit standard deviation
*/
public double nextNormalizedDouble() {
return SQRT3 * (2 * generator.nextDouble() - 1.0);
}
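The source shows that nextNormalizedDouble() returns values in [-√3, +√3), because SQRT3 is set to √3 and generator.nextDouble() returns values in [0, 1). A uniform distribution on this interval has mean 0 and variance (2√3)²/12 = 1, which is why the result is described as normalized.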
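The scaling used by nextNormalizedDouble() can be checked numerically. The Python sketch below applies the same formula to the standard library's random.Random and estimates the mean and variance of the result. The function name next_normalized_double, the seed 10 and the sample size are assumptions made for this illustration.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def next_normalized_double(rng):
    """Map a uniform draw on [0, 1) to the interval [-sqrt(3), sqrt(3))."""
    return SQRT3 * (2.0 * rng.random() - 1.0)

rng = random.Random(10)  # arbitrary seed, for repeatability only
sample = [next_normalized_double(rng) for _ in range(200_000)]

sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / len(sample)

print(min(sample), max(sample))  # both inside [-sqrt(3), sqrt(3))
print(sample_mean, sample_var)   # close to 0 and 1
```

A generator with zero mean and unit variance can stand in for a Gaussian one wherever only the first two moments matter, which is presumably why both implement the same NormalizedRandomGenerator interface.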
Usage example
import org.apache.commons.math3.random.JDKRandomGenerator;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.UniformRandomGenerator;
public class UniformRandomTest {
public static void main(String[] args) {
RandomGenerator rg = new JDKRandomGenerator();
rg.setSeed(10);
UniformRandomGenerator generator = new UniformRandomGenerator(rg);
double[] sample = new double[10];
for (int i = 0; i < sample.length; ++i) {
sample[i] = generator.nextNormalizedDouble();
System.out.println(sample[i]);
}
}
}