I'm working with the max drawdown indicator, and I need a very fast implementation for a machine learning experiment. The implementation will be called many times, but the data set used in each call is relatively small.
I put together a benchmark of the implementations from this question's answers; the only part I wrote myself is the vectorized version, implemented with Math.NET.
public static class VectorizedMaxDrawDown
{
    public static double Run(double[] ccr)
    {
        // Cumulative sum of the continuously compounded returns
        var cumCCR = new double[ccr.Length];
        double summedCcr = 0;
        for (int idx = 0; idx < ccr.Length; idx++)
        {
            summedCcr += ccr[idx];
            cumCCR[idx] = summedCcr;
        }

        var cumulativeCcr = Vector<double>.Build.DenseOfArray(cumCCR);
        var invCumulativeCcr = 1 / cumulativeCcr;
        // Entry (i, j) is cumulativeCcr[i] / cumulativeCcr[j] - 1
        var cumulativeCcrMat = Vector<double>.OuterProduct(cumulativeCcr, invCumulativeCcr) - 1;
        // ... (the max drawdown is then extracted from cumulativeCcrMat;
        // the rest of the method is not shown in this snippet)
    }
}
I expected the vectorized version to be far faster than the others, but the simple for-loop version is two orders of magnitude faster!
So the questions are:
- Why is the simple for loop version faster?
- Is my implementation correct?
- How can it be improved? (e.g., is there a better way to compute the cumulative sum as a vector operation?)
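On the performance question: the outer product materializes an n×n matrix, so the vectorized version does O(n²) work and allocation while a plain loop does O(n); for small inputs the matrix allocation alone can dominate. As one possible improvement, here is a minimal single-pass sketch (my own, not from the benchmarked answers) that fuses the cumulative sum with a running peak, using the same ratio cum[i] / cum[p] - 1 that the outer-product matrix encodes:

```csharp
using System;

public static class SinglePassMaxDrawDown
{
    // O(n) time, O(1) extra space: fold the cumulative sum and the
    // running peak into one loop instead of building an n x n matrix.
    // Drawdown at step i relative to the best earlier peak p is
    // cum[i] / cum[p] - 1, mirroring the outer-product formulation.
    public static double Run(double[] ccr)
    {
        double cum = 0.0;                       // running cumulative sum of ccr
        double peak = double.NegativeInfinity;  // running maximum of cum so far
        double worst = 0.0;                     // most negative drawdown so far
        for (int idx = 0; idx < ccr.Length; idx++)
        {
            cum += ccr[idx];
            if (cum > peak) peak = cum;
            double dd = cum / peak - 1.0;
            if (dd < worst) worst = dd;
        }
        return worst;
    }
}
```

Like the matrix version, the ratio is only well behaved when the cumulative series stays away from zero, so this sketch inherits that caveat from the original definition.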
Any hint will be much appreciated, thanks in advance.