R-squared with Multiple linear regression - GoodnessOfFit?


(Mike G) #1

I have this sample code for a multiple linear regression. Is this the correct way to get a goodness-of-fit (R-squared) value? Or is this R-squared unreliable for a multiple regression, since it is perhaps meant only for simple linear regression?

    double[][] xdata = new double[][] { new double[] { 2.2, 4.0 }, new double[] { 1.4, 6.2 }, new double[] { 3.0, 2.0 } };
    double[] ydata = new double[] { 15, 20, 10 };

    double[] p = Fit.MultiDim(xdata, ydata, intercept: true);
    double a = p[0]; // intercept
    double b = p[1]; // coefficient of the first predictor x[0]
    double c = p[2]; // coefficient of the second predictor x[1]

    var R2 = GoodnessOfFit.RSquared(xdata.Select(x => a + (b*x[0])+(c*x[1])), ydata);

Thanks


(Mike G) #2

Perhaps another way of asking this is: how do I get an R2 for any multiple regression? What if I had 5 dimensions in my X-data? Would I instead get each R2 separately and somehow average them together?


(Christoph Rüegg) #3

Since you did include an intercept, this looks correct to me.

The RSquared we compute is the square of the coefficient of multiple correlation. There is one RSquared per dependent target variable y, not per independent predictor variable x. So it does not matter whether you have 2 or 5 predictor variables (or one with 2 or 5 dimensions).
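
For illustration, the 5-predictor case looks exactly the same, just with longer rows and one more coefficient per predictor. A minimal sketch (the data below is arbitrary, only there to show the shape of the call):

    // using System.Linq; using MathNet.Numerics;
    double[][] xdata = new double[][]
    {
        new double[] { 1.1, 2.0, 0.5, 3.2, 1.0 },
        new double[] { 2.3, 1.5, 1.0, 2.8, 0.7 },
        new double[] { 0.9, 3.1, 2.2, 1.4, 1.9 },
        new double[] { 1.8, 2.6, 0.8, 2.1, 2.4 },
        new double[] { 3.0, 1.2, 1.7, 0.9, 1.1 },
        new double[] { 2.5, 0.8, 2.9, 1.6, 0.6 },
        new double[] { 1.4, 2.2, 1.3, 2.5, 1.8 },
        new double[] { 0.7, 1.9, 2.4, 1.1, 2.0 }
    };
    double[] ydata = new double[] { 12.1, 13.4, 10.8, 12.9, 11.5, 11.0, 12.6, 10.2 };

    // One fit, one coefficient vector (p[0] = intercept, p[1]..p[5] = slopes), one RSquared.
    double[] p = Fit.MultiDim(xdata, ydata, intercept: true);
    var predicted = xdata.Select(x => p[0] + p[1]*x[0] + p[2]*x[1] + p[3]*x[2] + p[4]*x[3] + p[5]*x[4]);
    double r2 = GoodnessOfFit.RSquared(predicted, ydata);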


(Christoph Rüegg) #4

But this is really an area we could improve, so that RSquared and other measures are available out of the box.


(Yiping Ruan) #5

It should be very easy.

    // ydataEstimated: the model's predicted values, e.g. xdata.Select(x => a + b*x[0] + c*x[1])
    double mean = ydata.Average();
    // The total sum of squares (proportional to the variance of the data):
    double SSTotal = ydata.Sum(y => Math.Pow(y - mean, 2));
    // The residual sum of squares (sum of squared differences between predicted and observed values):
    double SSRes = ydataEstimated.Zip(ydata, (est, obs) => est - obs).Sum(r => Math.Pow(r, 2));
    double RSquared = 1 - SSRes / SSTotal;
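
For a least-squares fit that includes an intercept, this should give the same value as GoodnessOfFit.RSquared on the fitted values, since in that case the squared correlation between fitted and observed values equals 1 - SSRes / SSTotal.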

Can I contribute?