How to accelerate the correlation caculation


(Joe) #1

Hi,

I am new to Math.Net. It’s really a fantastic tool.
Could you give me any suggestions about this situation? This takes around 20 second to get the result.
signal size: 20000
pattern size: 10000
I am trying to find the max correlation and its shift for the pattern in the signal. The approach is as following:

    private KeyValuePair<int, double> findMaxCorrelation(List<double> pattern, List<double> signal)
    {
        int MaxShift = 0;
        double MaxCorrelation = .0;

        int patternLength = pattern.Count;

        int signalLength = signal.Count;

        int compensatedSignalLength = signalLength + 2 * patternLength;

        // signal                   111111
        // patern                   0000
        // compensatedsignal        00001111110000

        // add size of the signal to ensure the signal can cover the whole range of the pattern
        List<double> compensatedSignal = new List<double>(signal);
        compensatedSignal.InsertRange(0, Enumerable.Repeat(0.0, patternLength));
        compensatedSignal.AddRange(Enumerable.Repeat(0.0, patternLength));

        //
        for (int i = 0; i < signalLength + patternLength + 1; i++)
        {
            double correlation = MathNet.Numerics.Statistics.Correlation.Pearson(pattern, compensatedSignal.Where((v, idx) => (idx >= i && idx < i + patternLength)));
            if (correlation > MaxCorrelation)
            {
                MaxCorrelation = correlation;
                MaxShift = i - patternLength;
            } 
        }

        return new KeyValuePair<int, double>(MaxShift, MaxCorrelation);
    }

(Peter Vanderwaart) #2

I don’t completely understand what you are trying to do, but I wonder whether you need to process the entire file. For example, if the signal+pattern totals 100 units, then it appears 10 times in a 1000 unit file, more than enough for you to identify it.


(Peter Vanderwaart) #3

I copied findMaxCorrelation to my computer and did some experimentation. Correlation.Pearson takes approx 1 msec to run with your signal and pattern lengths and you are doing it 30,000 times. I don’t think you are going to reduce the time much unless your just do less calculation.

I experimented with using every other data point. That cut the time by about 75%, For a signal that was not too noisy, the max correlation that was returned was from a different part of the signal, but the correlation value was the same to several decimal places. My signal was sine plus noise, and you may get different results with a different sort of signal. When the data was noisy, the smaller sample found the same shift as the original.

At any rate, I suggest you try down-sampling by using every second, or every third point in the signal, and compare with the original results to see if it works for you. And, if the smaller dataset returns a different shift with the same correlation, does it matter?


(Joe) #4

Hi Peter,
Thanks a lot for your suggestion.
Down sampling is the key word. And now I decrease the calculation from around 20s to 20ms. I can use this code in my project now. Following the modified method which calculate he max cross correlation at 100, 10 to narrow down the suspect range.

    public static KeyValuePair<int, double> FindMaxCrossCorrelationDeep(List<double> pattern, List<double> signal)
    {
        int MaxShift = 0;
        double MaxCorrelation = 0.0;

        int patternLength = pattern.Count;

        int signalLength = signal.Count;

        int gapLength = signal.Count - patternLength;

        double[] adjustedPattern;
        double[] adjustedSignal;

        if (gapLength > 0)
        {
            adjustedPattern = new double[signalLength];

            Array.Copy(pattern.ToArray(), adjustedPattern, patternLength);

            for (int i = patternLength; i < signalLength; i++)
            {
                adjustedPattern[i] = 0.0;
            }

            adjustedSignal = signal.ToArray();
        }
        else if (gapLength < 0)
        {
            adjustedSignal = new double[patternLength];

            Array.Copy(signal.ToArray(), adjustedSignal, signalLength);

            for (int i = patternLength; i < signalLength; i++)
            {
                adjustedSignal[i] = 0.0;
            }

            adjustedPattern = pattern.ToArray();
        }
        else
        {
            adjustedPattern = pattern.ToArray();
            adjustedSignal = signal.ToArray();
        }

        // 100
        var shift100 = FindMaxCrossCorrelation(
            adjustedPattern.Where((x, i) => (i % 100 == 0)).ToArray(),
            adjustedSignal.Where((x, i) => (i % 100 == 0)).ToArray(),
            0,
            0
            );

        MaxShift = shift100.Key * 100;

        // 10
        var shift10 = FindMaxCrossCorrelation(
            adjustedPattern.Where((x, i) => (i % 10 == 0)).ToArray(),
            adjustedSignal.Where((x, i) => (i % 10 == 0)).ToArray(),
            MaxShift / 10,
            100
            );

        MaxShift = MaxShift + shift10.Key * 10;

        var shift1 = FindMaxCrossCorrelation(
            adjustedPattern.ToArray(),
            adjustedSignal.ToArray(),
            MaxShift,
            10
            );

        MaxShift += shift1.Key;
        MaxCorrelation = shift1.Value;

        return new KeyValuePair<int, double>(MaxShift, MaxCorrelation);

    }

(Tobias Glaubach) #5

Hi,
the Correlation.Pearson Method actually uses a brute force calculation approach by direct summation. Its easy to implement, manage and understandable but actually pretty slow. Especially for huge data sets.

A faster implementation would be to use a FFT based implementation. I was thinking about that before but hadn’t had the time to implement something yet.


(Peter Vanderwaart) #6

The FFT version should be something like this:

 public double[] CorrFFT(double[] x, double[] u)
{
    //create imaginary parts of x, y
    double[] y = new double[x.Length];
    double[] v = new double[y.Length];

    // compute fft in place
    Fourier.Forward(x, y);
    Fourier.Forward(u, v);

    // compute product of x,y with u,v conjugate
    double[] p = new double[x.Length];
    double[] q = new double[x.Length];

    for (int i = 0; i < x.Length; i++)
    {
        p[i] = x[i] * u[i] + y[i] * v[i];
        q[i] = y[i] * u[i] - x[i] * v[i];
    }

    // compute reverse fft in place
    Fourier.Inverse(p, q);

    // return real part. imaginary part should be zeros (or almost)
    return p;
}

Warning! I have not tested this extensively. It seems to work based on one trial. I used this for a source:
http://www.aip.de/groups/soe/local/numres/bookfpdf/f13-2.pdf

Edit: as shown, the program returns different results than the Pearson Method, but maybe they are only different by a constant.