Vectorization performance

Hi there,

I’m working with the max drawdown indicator, and I need a very fast implementation to run in a machine learning experiment. The implementation will be called many times, but the data set used in each call is relatively small.

I put together a solution to benchmark the implementations from this question’s answers; my only contribution was a vectorized version using Math.NET.

using System.Linq;                       // for Min()
using MathNet.Numerics.LinearAlgebra;

public static class VectorizedMaxDrawDown
{
    public static double Run(double[] ccr)
    {
        double[] cumCCR = new double[ccr.Length];   // running cumulative sum
        double summedCcr = 0;
        for (int idx = 0; idx < ccr.Length; idx++)
        {
            summedCcr += ccr[idx];
            cumCCR[idx] = summedCcr;
        }
        // Outer product gives cumCCR[i] / cumCCR[j] for every pair (i, j).
        var cumulativeCcr = Vector<double>.Build.DenseOfArray(cumCCR);
        var invCumulativeCcr = 1 / cumulativeCcr;
        var cumulativeCcrMat = (Vector<double>.OuterProduct(cumulativeCcr, invCumulativeCcr) - 1);
        return cumulativeCcrMat.LowerTriangle().Enumerate().Min();
    }
}

I expected the vectorized version to be much faster than the others, but the simple for-loop version is two orders of magnitude faster!
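For context, a single-pass loop of the kind I am comparing against might look roughly like the sketch below. It is not the exact code from the linked answers: the class name LoopMaxDrawDown is made up for illustration, and it assumes the cumulative sum stays positive, so that dividing by the running peak gives the same minimum as the lower triangle of the matrix version.

```csharp
public static class LoopMaxDrawDown // hypothetical name, for illustration only
{
    public static double Run(double[] ccr)
    {
        double cum = 0;                         // running cumulative sum
        double peak = double.NegativeInfinity;  // highest cumulative sum seen so far
        double maxDrawDown = 0;                 // most negative (cum / peak - 1) seen so far

        foreach (double r in ccr)
        {
            cum += r;
            if (cum > peak) peak = cum;

            // Assumes peak > 0; a NaN from 0 / 0 fails the comparison and is ignored.
            double drawDown = cum / peak - 1;
            if (drawDown < maxDrawDown) maxDrawDown = drawDown;
        }
        return maxDrawDown;
    }
}
```

For the input { 1, 1, -1, 2 } the cumulative sums are 1, 2, 1, 3, so the worst ratio against the running peak is 1 / 2 - 1 and the method returns -0.5. This does O(n) work with no allocations, whereas the outer-product version builds an n-by-n matrix on every call.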

So the questions are:

  • Why is the simple for loop version faster?
  • Is my implementation correct?
  • How can it be improved? (e.g. a better way to compute the vector’s cumulative sum)

Any hint will be much appreciated, thanks in advance.