Speeding up MATLAB code for FDR estimation

Question

I have 2 input variables:

a vector of p-values (p) with N elements (unsorted)

and an N x M matrix of p-values obtained by random permutations (pr), with M iterations. N is quite large, 10K to 100K or more; M is, say, 100.

I'm estimating the False Discovery Rate (FDR) for each element of p: it represents how many p-values from the random permutations would pass if the current p-value (from p) were used as the threshold.
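In symbols, what the code below computes for each threshold x = p(i) is (my restatement, using the notation of the question):

\[ \widehat{\mathrm{FDR}}(x) = \frac{\frac{1}{M}\,\#\{(n,m) : pr_{n,m} \le x\}}{\#\{n : p_n \le x\}} \]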

I wrote the function with ARRAYFUN, but it takes a lot of time for large N (2 min for N = 20K), comparable to a for-loop.

function pfdr = fdr_from_random_permutations(p, pr)
%# ... skipping argument checks
pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);

Any ideas how to make it faster?

Comments about statistical issues here are also welcome.

The test data can be generated as p = rand(N,1); pr = rand(N,M);.
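For reference, here is a minimal timing harness for the baseline (my sketch, not from the original question), using the built-in timeit (available since R2013b):

%# Hypothetical benchmark harness; N, M and the anonymous function
%# mirror the setup described above.
N = 20000; M = 100;
p = rand(N,1); pr = rand(N,M);
f = @() arrayfun(@(x) mean(sum(pr<=x))./sum(p<=x), p);
t = timeit(f);   %# timeit calls f several times and reports a robust average
fprintf('arrayfun baseline: %.2f s for N = %d, M = %d\n', t, N, M);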

Answer 1:

Well, the trick was indeed sorting the vectors; I give credit to @EgonGeerardyn for that. Also, there is no need to use mean: you can just divide everything by M afterwards. When p is sorted, finding the number of values less than or equal to the current x is just a running index. pr is a more interesting case: I used a running index called place to track how many of its elements are less than or equal to x.

Edit(2): Here is the fastest version I came up with:

function Speedup2()
N = 10000/4;
M = 100/4;
p = rand(N,1); pr = rand(N,M);

%# original arrayfun version, for timing and correctness comparison
tic
pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);
toc

%# sorted, loop-based version
tic
out = zeros(numel(p),1);
[p,sortIndex] = sort(p);
pr = sort(pr(:));
pr(end+1) = Inf;    %# sentinel so the while loop always terminates
place = 1;          %# running index into sorted pr
for i=1:numel(p)
    x = p(i);
    while pr(place)<=x
        place = place+1;
    end
    exp1a = place-1;   %# number of permutation p-values <= x
    exp2 = i;          %# number of original p-values <= x (p is sorted)
    out(i) = exp1a/exp2;
end
out(sortIndex) = out/M;   %# undo the sort and divide by M once
toc

disp(max(abs(pfdr-out)));   %# sanity check: should be ~eps
end

And the benchmark results for N = 10000/4; M = 100/4 (the last line is the maximum absolute difference between the two results, i.e. machine epsilon, so they agree):

Elapsed time is 0.898689 seconds.

Elapsed time is 0.007697 seconds.

2.220446049250313e-016

and for N = 10000; M = 100:

Elapsed time is 39.730695 seconds.

Elapsed time is 0.088870 seconds.

2.220446049250313e-016

Answer 2:

First of all, try to analyze this using the profiler. Profiling should ALWAYS be the first step when trying to improve performance: we can all guess at what is causing your performance drop, but the only way to be sure and to focus on the right part is to inspect the profiler report.

I didn't run the profiler on your code, as I don't want to generate test data to do so; but I have some ideas about what work is being carried out in vain. In your function mean(sum(pr<=x))./sum(p<=x), you repeatedly sum over p<=x (and likewise over pr<=x): one call involves N comparisons and N-1 summations, so for both sums you get behavior that is quadratic in N once all N values of p have been processed.

If you step through a sorted version of p, you need fewer calculations and comparisons, as you can keep track of a running count (i.e. behavior that is linear in N). I guess a similar method could be applied to the other part of the calculation.

edit:

The implementation of my idea as expressed above:

function pfdr = fdr(p,pr)
[N, M] = size(pr);
[p, idxP] = sort(p);
pr = sort(pr(:));
pfdr = NaN(N,1);
parfor iP = 1:N
    x = p(iP);
    m = sum(pr<=x)/M;     %# mean over permutations: sum(pr(:)<=x)/M
    pfdr(iP) = m/iP;      %# p is sorted, so iP equals sum(p<=x)
end
pfdr(idxP) = pfdr;

If you have access to the Parallel Computing Toolbox, the parfor loop will let you gain some performance. I used two basic ideas: mean(sum(pr<=x)) is actually equal to sum(pr(:)<=x)/M; and since p is sorted, the loop index itself equals the number of elements of p that are <= x (assuming every element is unique; otherwise you'll have to work with unique to do the fully rigorous analysis).
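For completeness, here is one way ties could be handled (my sketch, not from the answer): with p sorted, mapping every duplicate to the index of the last occurrence of its value gives the correct count of elements <= p(iP).

%# Hypothetical tie-aware replacement for the loop index iP.
[~, ~, grp] = unique(p);                              %# p is sorted; grp labels equal values
lastIdx = accumarray(grp, (1:numel(p)).', [], @max);  %# last occurrence of each unique value
nLeq = lastIdx(grp);                                  %# nLeq(iP) = sum(p <= p(iP)), ties included
%# ...then use pfdr(iP) = m/nLeq(iP) inside the loop.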

As you should already know very well from running the profiler yourself, the line m = sum(pr<=x)/M; is the main resource hog. This can be tackled in the same way as p, by making use of the sorted nature of pr.

I tested my code (both for identical results and for time consumption) against yours. For N=20e3; M=100, I get about 63 seconds to run your code and 43 seconds to run mine on my main computer (MATLAB 2011a on 64-bit Arch Linux, 8 GiB RAM, Core i7 860). For smaller values of M the gain is larger. But this gain is in part due to the parallelization.

edit2: Apparently I came to conclusions very similar to Andrey's; had I pursued the same approach, my result would have been much the same.

However, I realised that there are built-in functions that do more or less what you need: this is quite similar to determining the empirical cumulative distribution function, and that can be done by constructing a histogram:

function pfdr = fdr(p,pr)
[N, M] = size(pr);
[p, idxP] = sort(p);
count = histc(pr(:), [0; p]);   %# bin the permutation p-values by the sorted thresholds
count = cumsum(count(1:N));     %# count(i) = number of pr values below p(i)
pfdr = count./(1:N).';
pfdr(idxP) = pfdr/M;

For the same M and N as above, this code takes 228 milliseconds on my computer, and 104 milliseconds for Andrey's parameters. So on my computer it turns out a bit slower, but I think this code is far more readable than the intricate for-loops (as in both of our examples).
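As an aside (my note, not part of the original answer): from R2014b onward, histcounts is the documented replacement for histc. A minimal sketch of the same idea, assuming the continuous p-values of the test data (ties have probability zero, so the slightly different edge handling of histcounts does not matter here):

function pfdr = fdr_histcounts(p, pr)
%# Same empirical-CDF idea as above, using histcounts (R2014b+).
[N, M] = size(pr);
[p, idxP] = sort(p);
count = histcounts(pr(:), [0; p]);   %# N bins with edges [0, p(1), ..., p(N)]
count = cumsum(count(:));            %# permutation p-values up to each p(i)
pfdr = count./(1:N).';
pfdr(idxP) = pfdr/M;
end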

Answer 3:

Following the discussion between me and Andrey in this question, this very late answer is just to prove to Andrey that vectorized solutions are still faster than JIT'ed loops; they sometimes just aren't as easy to find.

I am more than willing to remove this answer if it is deemed inappropriate by the OP.

Now, on to business. Here are the original arrayfun version, the looped version by Andrey, and the vectorized version by Egon:

function test
clc
N = 10000/4;
M = 100/4;
p = rand(N,1);
pr = rand(N,M);

%% first option: original arrayfun
tic
pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);
toc

%% second option: Andrey's sorted loop
tic
out = zeros(numel(p),1);
[p2,sortIndex] = sort(p);
pr2 = sort(pr(:));
pr2(end+1) = Inf;   %# sentinel so the while loop always terminates
place = 1;
for i=1:numel(p2)
    x = p2(i);
    while pr2(place)<=x
        place = place+1;
    end
    exp1a = place-1;
    exp2 = i;
    out(i) = exp1a/exp2;
end
out(sortIndex) = out/M;
toc

%% third option: Egon's histc version
tic
[p2,sortIndex] = sort(p);
count = histc(pr2(:), [0; p2]);   %# the appended Inf falls outside the edges and is ignored
count = cumsum(count(1:N));
out = count./(1:N).';
out(sortIndex) = out/M;
toc

end

Results on my laptop:

Elapsed time is 0.916196 seconds.

Elapsed time is 0.011429 seconds.

Elapsed time is 0.007328 seconds.

and for N = 10000; M = 100:

Elapsed time is 38.082718 seconds.

Elapsed time is 0.127052 seconds.

Elapsed time is 0.042686 seconds.

So: vectorized is 2-3 times faster.

Source: https://stackoverflow.com/questions/9008259/speeding-up-matlab-code-for-fdr-estimation
