It's a call to the community to see if anyone has an idea to improve the speed of this MSD calculation implementation. It is largely based on the implementation from this blog post : http://damcb.com/mean-square-disp.html
For now the current implementation takes about 9s for a 2D trajectory of 5 000 points. It's really way too much if you need to compute a lot of trajectories...
I didn't try to parallelize it (with multiprocess or joblib) but I have the feeling that creating new processes will be too heavy for this kind of algorithm.
Here is the code :
import os
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Parameters
N = 5000
max_time = 100
dt = max_time / N
# Generate 2D brownian motion
t = np.linspace(0, max_time, N)
xy = np.cumsum(np.random.choice([-1, 0, 1], size=(N, 2)), axis=0)
traj = pd.DataFrame({'t': t, 'x': xy[:,0], 'y': xy[:,1]})
print(traj.head())
# Draw motion
ax = traj.plot(x='x', y='y', alpha=0.6, legend=False)
# Set limits
ax.set_xlim(traj['x'].min(), traj['x'].max())
ax.set_ylim(traj['y'].min(), traj['y'].max())
And the output :
t x y
0 0.000000 -1 -1
1 0.020004 -1 0
2 0.040008 -1 -1
3 0.060012 -2 -2
4 0.080016 -2 -2
def compute_msd(trajectory, t_step, coords=['x', 'y']):
tau = trajectory['t'].copy()
shifts = np.floor(tau / t_step).astype(np.int)
msds = np.zeros(shifts.size)
msds_std = np.zeros(shifts.size)
for i, shift in enumerate(shifts):
diffs = trajectory[coords] - trajectory[coords].shift(-shift)
sqdist = np.square(diffs).sum(axis=1)
msds[i] = sqdist.mean()
msds_std[i] = sqdist.std()
msds = pd.DataFrame({'msds': msds, 'tau': tau, 'msds_std': msds_std})
return msds
# Compute MSD
msd = compute_msd(traj, t_step=dt, coords=['x', 'y'])
print(msd.head())
# Plot MSD
ax = msd.plot(x="tau", y="msds", logx=True, logy=True, legend=False)
ax.fill_between(msd['tau'], msd['msds'] - msd['msds_std'], msd['msds'] + msd['msds_std'], alpha=0.2)
And the output :
msds msds_std tau
0 0.000000 0.000000 0.000000
1 1.316463 0.668169 0.020004
2 2.607243 2.078604 0.040008
3 3.891935 3.368651 0.060012
4 5.200761 4.685497 0.080016
And some profiling :
%timeit msd = compute_msd(traj, t_step=dt, coords=['x', 'y'])
Give this :
1 loops, best of 3: 8.53 s per loop
Any idea ?
解决方案
The MSD calculations mentioned so far are all O(N**2) where N is the number of time steps. Using the FFT this can be reduced to O(N*log(N)). See this question and answer for an explanation and an implementation in python.
EDIT:
A small benchmark (I also added this benchmark to this answer): Generate a trajectory with
r = np.cumsum(np.random.choice([-1., 0., 1.], size=(N, 3)), axis=0)
For N=100.000, we get
$ %timeit msd_straight_forward(r)
1 loops, best of 3: 2min 1s per loop
$ %timeit msd_fft(r)
10 loops, best of 3: 253 ms per loop