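My goal is to test that a class sets one of its attributes to a random integer value. I found a chi-square test algorithm online and decided to put it to use. The results surprised me: the larger my sample size, the less likely the test seems to pass. I should say that I am by no means a statistics expert (which probably goes without saying, given that I am asking this question), so I may well be getting something wrong here.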
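Here are the test results, varying only the final int SIZE (in UserTest). Every test made 30 runs, and each result is the number of those runs that passed: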
    SIZE     avg     results
    11       25.4    26, 25, 22, 24, 30
    20       25      26, 26, 24, 22, 27
    30       24      24, 22, 24, 26, 24
    100      19.4    17, 23, 20, 18, 19
    200      16.2    15, 18, 18, 15, 15
    1000     13.2    13, 13, 14, 13, 13
    10000    10      14, 7, 8, 10, 11
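Although I don't absolutely need true randomness in this case, I'm still curious what the problem is. Is the algorithm itself flawed, am I using it incorrectly, is this a natural consequence of "making the test harder" (statistics noob, remember), or am I pushing the limits of Java's pseudo-random generator?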
Domain class:
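    import java.util.Random;

    public class User
    {
        public static final int MINIT = 20;
        public static final int MAXIT = 50;
        private int iterations;

        // Picks a random value in [MINIT, MAXIT): nextInt's upper bound is exclusive.
        public void setIterations()
        {
            Random random = new Random();
            setIterations(MINIT + random.nextInt(MAXIT - MINIT));
        }

        private void setIterations(int iterations) {
            this.iterations = iterations;
        }

        // Read by UserTest.
        public int getIterations() {
            return iterations;
        }
    }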
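As a sanity check on the range, here is a minimal sketch of the values this setter can produce. Since `random.nextInt(MAXIT - MINIT)` returns 0 through 29, `iterations` should fall in 20 through 49 and never equal `MAXIT`, which gives the 30 distinct values the test uses as `r`. The class name `RangeSketch`, the method `observedRange`, and the fixed seed are hypothetical and only there to make the sketch repeatable.

```java
import java.util.Random;

class RangeSketch {
    static final int MINIT = 20;
    static final int MAXIT = 50;

    // Returns {min, max} observed over `draws` values, using the same
    // expression as User.setIterations(): MINIT + nextInt(MAXIT - MINIT).
    // A single seeded Random is used here only for repeatability (assumption).
    static int[] observedRange(long seed, int draws) {
        Random random = new Random(seed);
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < draws; i++) {
            int value = MINIT + random.nextInt(MAXIT - MINIT);
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] range = observedRange(42L, 100_000);
        System.out.println("observed range: " + range[0] + " .. " + range[1]);
    }
}
```

The observed minimum and maximum stay inside 20 and 49, so `r = MAXIT - MINIT` matches the number of possible values.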
Test class:
    import java.util.ArrayList;
    import org.junit.Assert;
    import org.junit.Test;

    public class UserTest {
        private User user = new User();

        @Test
        public void testRandomNumbers() {
            int results = 0;
            final int TIMES = 30;
            // Count how many of the TIMES runs pass the chi-square check.
            for (int i = 0; i < TIMES; i++)
            {
                if (randomNumbersRun())
                {
                    results++;
                }
            }
            System.out.println(results);
            // Require at least 80% of the runs to pass.
            Assert.assertTrue(results >= TIMES * 80 / 100);
        }

        private boolean randomNumbersRun()
        {
            ArrayList<Integer> list = new ArrayList<>();
            int r = User.MAXIT - User.MINIT;  // number of distinct values
            final int SIZE = 11;              // average samples per distinct value
            for (int i = 0; i < r * SIZE; i++) {
                user.setIterations();
                list.add(user.getIterations());
            }
            return Statistics.isRandom(list, r);
        }
    }
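To make the role of SIZE concrete, here is a small sketch of the arithmetic behind each row of the results. The sample size is N = r * SIZE with r = 30, so SIZE is also the expected count per value, and SIZE = 11 is the smallest setting that gets past the `N > 10r` guard in `Statistics.isRandom`. The class and method names are hypothetical.

```java
class SampleSizeSketch {
    static final int R = 50 - 20; // User.MAXIT - User.MINIT

    // Number of values collected by randomNumbersRun() for a given SIZE.
    static int sampleSize(int size) {
        return R * size;
    }

    // Mirrors the guard in isRandom: a sample with N <= 10r is rejected outright.
    static boolean passesGuard(int size) {
        return sampleSize(size) > 10 * R;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 11, 20, 30, 100, 200, 1000, 10000};
        for (int size : sizes) {
            System.out.println("SIZE=" + size
                    + " N=" + sampleSize(size)
                    + " passesGuard=" + passesGuard(size));
        }
    }
}
```

With SIZE = 10 the sample has exactly 300 values, which equals 10r, so every run would return false before any statistic is computed.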
Chi-square algorithm:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Map;

    public class Statistics {

        /**
         * source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test
         * changed parameter to ArrayList for generalization
         */
        public static boolean isRandom(ArrayList<? extends Number> randomNums, int r) {
            //According to Sedgewick: "This is valid if N is greater than about 10r"
            if (randomNums.size() <= 10 * r) {
                return false;
            }
            //PART A: Get frequency of randoms
            Map<Number, Integer> ht = getFrequencies(randomNums);
            //PART B: Calculate chi-square - this approach is in Sedgewick
            //n_r is the expected count per value, N / r
            double n_r = (double) randomNums.size() / r;
            double chiSquare = 0;
            //Note: values that never occur have no map entry and are not counted here
            for (int v : ht.values()) {
                double f = v - n_r;
                chiSquare += f * f;
            }
            chiSquare /= n_r;
            //PART C: According to Sedgewick: "The statistic should be within 2(r)^1/2 of r
            //This is valid if N is greater than about 10r"
            return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
        }

        /**
         * @param nums a list of numbers
         * @return a Map, key being the number and value its frequency
         */
        private static Map<Number, Integer> getFrequencies(ArrayList<? extends Number> nums) {
            Map<Number, Integer> freqs = new HashMap<Number, Integer>();
            for (Number x : nums) {
                if (freqs.containsKey(x)) {
                    freqs.put(x, freqs.get(x) + 1);
                } else {
                    freqs.put(x, 1);
                }
            }
            return freqs;
        }
    }
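To show what the pass condition means in numbers, here is a minimal worked sketch of the same statistic computed from hand-made counts. For r = 30 the accepted band is 30 plus or minus 2 * sqrt(30), roughly 19.05 to 40.95, so a perfectly even sample (statistic 0) is rejected as well as a badly skewed one. The class `ChiSquareSketch` and its methods are hypothetical, and unlike `isRandom` the sketch takes an array with one count per possible value, zeros included.

```java
import java.util.Arrays;

class ChiSquareSketch {
    // Chi-square statistic for observed counts against a uniform expectation.
    // counts holds one entry per possible value (assumption: zeros included).
    static double chiSquare(int[] counts) {
        long n = 0;
        for (int c : counts) {
            n += c;
        }
        double expected = (double) n / counts.length;
        double sum = 0;
        for (int c : counts) {
            double diff = c - expected;
            sum += diff * diff;
        }
        return sum / expected;
    }

    // Sedgewick's rule of thumb as used in isRandom: |chiSquare - r| <= 2 * sqrt(r).
    static boolean withinBand(double chiSquare, int r) {
        return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
    }

    public static void main(String[] args) {
        int r = 30;

        // Every value seen exactly 11 times: N = 330, statistic 0.
        int[] even = new int[r];
        Arrays.fill(even, 11);

        // Half the values seen 14 times, half 8 times: N = 330, statistic 270 / 11.
        int[] spread = new int[r];
        Arrays.fill(spread, 0, 15, 14);
        Arrays.fill(spread, 15, 30, 8);

        System.out.println("even:   " + chiSquare(even) + " " + withinBand(chiSquare(even), r));
        System.out.println("spread: " + chiSquare(spread) + " " + withinBand(chiSquare(spread), r));
    }
}
```

The even sample falls outside the band while the moderately spread one falls inside it, which is the behaviour the rule of thumb is written to produce: it expects the statistic to sit near r, not near 0.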