# How to accelerate the correlation caculation

(Joe) #1

Hi,

I am new to Math.Net. It’s really a fantastic tool.
Could you give me any suggestions about this situation? This takes around 20 second to get the result.
signal size: 20000
pattern size: 10000
I am trying to find the max correlation and its shift for the pattern in the signal. The approach is as following:

``````    private KeyValuePair<int, double> findMaxCorrelation(List<double> pattern, List<double> signal)
{
int MaxShift = 0;
double MaxCorrelation = .0;

int patternLength = pattern.Count;

int signalLength = signal.Count;

int compensatedSignalLength = signalLength + 2 * patternLength;

// signal                   111111
// patern                   0000
// compensatedsignal        00001111110000

// add size of the signal to ensure the signal can cover the whole range of the pattern
List<double> compensatedSignal = new List<double>(signal);
compensatedSignal.InsertRange(0, Enumerable.Repeat(0.0, patternLength));

//
for (int i = 0; i < signalLength + patternLength + 1; i++)
{
double correlation = MathNet.Numerics.Statistics.Correlation.Pearson(pattern, compensatedSignal.Where((v, idx) => (idx >= i && idx < i + patternLength)));
if (correlation > MaxCorrelation)
{
MaxCorrelation = correlation;
MaxShift = i - patternLength;
}
}

return new KeyValuePair<int, double>(MaxShift, MaxCorrelation);
}``````

(Peter Vanderwaart) #2

I don’t completely understand what you are trying to do, but I wonder whether you need to process the entire file. For example, if the signal+pattern totals 100 units, then it appears 10 times in a 1000 unit file, more than enough for you to identify it.

(Peter Vanderwaart) #3

I copied findMaxCorrelation to my computer and did some experimentation. Correlation.Pearson takes approx 1 msec to run with your signal and pattern lengths and you are doing it 30,000 times. I don’t think you are going to reduce the time much unless your just do less calculation.

I experimented with using every other data point. That cut the time by about 75%, For a signal that was not too noisy, the max correlation that was returned was from a different part of the signal, but the correlation value was the same to several decimal places. My signal was sine plus noise, and you may get different results with a different sort of signal. When the data was noisy, the smaller sample found the same shift as the original.

At any rate, I suggest you try down-sampling by using every second, or every third point in the signal, and compare with the original results to see if it works for you. And, if the smaller dataset returns a different shift with the same correlation, does it matter?

(Joe) #4

Hi Peter,
Thanks a lot for your suggestion.
Down sampling is the key word. And now I decrease the calculation from around 20s to 20ms. I can use this code in my project now. Following the modified method which calculate he max cross correlation at 100, 10 to narrow down the suspect range.

``````    public static KeyValuePair<int, double> FindMaxCrossCorrelationDeep(List<double> pattern, List<double> signal)
{
int MaxShift = 0;
double MaxCorrelation = 0.0;

int patternLength = pattern.Count;

int signalLength = signal.Count;

int gapLength = signal.Count - patternLength;

if (gapLength > 0)
{

for (int i = patternLength; i < signalLength; i++)
{
}

}
else if (gapLength < 0)
{

for (int i = patternLength; i < signalLength; i++)
{
}

}
else
{
}

// 100
var shift100 = FindMaxCrossCorrelation(
adjustedPattern.Where((x, i) => (i % 100 == 0)).ToArray(),
adjustedSignal.Where((x, i) => (i % 100 == 0)).ToArray(),
0,
0
);

MaxShift = shift100.Key * 100;

// 10
var shift10 = FindMaxCrossCorrelation(
adjustedPattern.Where((x, i) => (i % 10 == 0)).ToArray(),
adjustedSignal.Where((x, i) => (i % 10 == 0)).ToArray(),
MaxShift / 10,
100
);

MaxShift = MaxShift + shift10.Key * 10;

var shift1 = FindMaxCrossCorrelation(
MaxShift,
10
);

MaxShift += shift1.Key;
MaxCorrelation = shift1.Value;

return new KeyValuePair<int, double>(MaxShift, MaxCorrelation);

}``````

(Tobias Glaubach) #5

Hi,
the Correlation.Pearson Method actually uses a brute force calculation approach by direct summation. Its easy to implement, manage and understandable but actually pretty slow. Especially for huge data sets.

A faster implementation would be to use a FFT based implementation. I was thinking about that before but hadn’t had the time to implement something yet.

(Peter Vanderwaart) #6

The FFT version should be something like this:

`````` public double[] CorrFFT(double[] x, double[] u)
{
//create imaginary parts of x, y
double[] y = new double[x.Length];
double[] v = new double[y.Length];

// compute fft in place
Fourier.Forward(x, y);
Fourier.Forward(u, v);

// compute product of x,y with u,v conjugate
double[] p = new double[x.Length];
double[] q = new double[x.Length];

for (int i = 0; i < x.Length; i++)
{
p[i] = x[i] * u[i] + y[i] * v[i];
q[i] = y[i] * u[i] - x[i] * v[i];
}

// compute reverse fft in place
Fourier.Inverse(p, q);

// return real part. imaginary part should be zeros (or almost)
return p;
}
``````

Warning! I have not tested this extensively. It seems to work based on one trial. I used this for a source:
http://www.aip.de/groups/soe/local/numres/bookfpdf/f13-2.pdf

Edit: as shown, the program returns different results than the Pearson Method, but maybe they are only different by a constant.

(Peter Vanderwaart) #7

I’ve looked into this a bit more.

1. Signal processing (which is where you find the FFT version) is different from Statistics (which is where you find the Pearson statistics. I have not found any way to reliably go from the FFT correlation to the Pearson statistic. (The formulas I’ve found for normalizing the FFT correlation don’t do it for my examples.) I think you can use the FFT version to find the set with best correlation, but then you would have to compute the Pearson statistic for the result set using the conventional formula.

2. In checking my work, I discovered that the Excel FFT gives different results from the Numerics FFT. The Excel result values are 8.0 times larger for my data set with N=64. The resulting correlations are also different although they have similar characteristics.