# R-squared with Multiple linear regression - GoodnessOfFit?

(Mike G) #1

I have this sample code for a multiple linear regression. Is this the correct way to approach getting a goodness-of-fit (R-squared) value? Or, for a multiple regression, is this R-squared unreliable, since it is perhaps meant only for simple linear regression?

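```
using System.Linq;
using MathNet.Numerics;

double[][] xdata = new double[][] { new double[] { 2.2, 4.0 }, new double[] { 1.4, 6.2 }, new double[] { 3.0, 2.0 } };
double[] ydata = new double[] { 15, 20, 10 };

// Least-squares fit of y = a + b*x[0] + c*x[1]
double[] p = Fit.MultiDim(xdata, ydata, intercept: true);
double a = p[0]; // intercept
double b = p[1]; // coefficient of x[0]
double c = p[2]; // coefficient of x[1]

// Compare the model's predictions against the observed values
var R2 = GoodnessOfFit.RSquared(xdata.Select(x => a + (b * x[0]) + (c * x[1])), ydata);
```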

Thanks

(Mike G) #2

Perhaps another way of asking this is how do I get an R2 with any multiple regression? What if I had 5 dimensions in my X-data? Would I instead get each R2 separately, and somehow average them together?

(Christoph Rüegg) #3

Since you did include an intercept this looks correct to me.

The `RSquared` we compute is the square of the coefficient of multiple correlation. There is one `RSquared` per dependent target variable `y`, not per independent predictor variable `x`. So it does not matter whether you have 2 or 5 predictor variables (or one with 2 or 5 dimensions).
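For reference, this definition can be written out as follows. The symbols are notation introduced here, not from the library: $y_i$ are the observed values, $\hat{y}_i$ the fitted values, and $\bar{y}$ the mean of the observations. For an ordinary least-squares fit that includes an intercept, the squared correlation between observed and fitted values equals the usual "one minus residual over total sum of squares" form, which is why the intercept matters:

$$
R^2 = \operatorname{corr}(y, \hat{y})^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

The number of predictors only affects how $\hat{y}_i$ is computed. The formula itself always compares a single vector of fitted values with a single vector of observations.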

(Christoph Rüegg) #4

But this is really an area we could improve, to provide `RSquared` and also other measures out of the box.

(Yiping Ruan) #5

It should be very easy.

//ydataEstimated holds the model's prediction for each observation in ydata (requires System.Linq).
double Mean = ydata.Average();
//The total sum of squares (proportional to the variance of the data):
double SSTotal = ydata.Sum(y => Math.Pow((y - Mean), 2));
//The sum of squares of residuals, also called the residual sum of squares:
double SSRes = ydataEstimated.Zip(ydata, (a, b) => (a - b)).Sum(r => Math.Pow(r, 2));
double RSquared = 1 - SSRes / SSTotal;

Can I contribute?
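
As a rough sketch, the calculation from post #5 could be wrapped in a small helper like the one below. The class and method names (`RSquaredSketch`, `Compute`) are hypothetical and not part of Math.NET Numerics. The helper assumes the predictions have already been computed, for example from the fitted coefficients in post #1.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not a Math.NET Numerics API.
public static class RSquaredSketch
{
    // Returns 1 - SSRes / SSTotal for paired predictions and observations.
    // If all observations are identical, SSTotal is zero and the result is not a finite number.
    public static double Compute(double[] predicted, double[] observed)
    {
        if (predicted.Length != observed.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        double mean = observed.Average();

        // Total sum of squares: spread of the observations around their mean.
        double ssTotal = observed.Sum(y => (y - mean) * (y - mean));

        // Residual sum of squares: spread of the observations around the predictions.
        double ssRes = predicted.Zip(observed, (p, o) => p - o).Sum(r => r * r);

        return 1.0 - ssRes / ssTotal;
    }
}
```

With the `ydata` from post #1, predictions that match the observations exactly give 1, and predicting the mean (15) for every point gives 0. The helper takes a single prediction vector no matter how many predictor variables the model has.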