Function approximation with regression analysis

This online calculator uses several regression models to approximate an unknown function given by a set of data points.

The function approximation problem is to select, from a well-defined class of functions, the one that most closely matches ("approximates") an unknown target function.

This calculator takes tabulated data for the target function, given as points {x, f(x)}, and builds several regression models from it: linear, quadratic, cubic, power, logarithmic, hyperbolic, ab-exponential and exponential regression. The results can be compared using the correlation coefficient, the coefficient of determination and the average relative error, as well as visually on the chart. The theory and formulas are given below the calculator, as usual.

PLANETCALC, Function approximation with regression analysis

Linear regression

Equation:
\widehat{y}=ax+b

a coefficient
a=\frac{\sum x_i \sum y_i- n\sum x_iy_i}{\left(\sum x_i\right)^2-n\sum x_i^2}

b coefficient
b=\frac{\sum x_i \sum x_iy_i-\sum x_i^2\sum y_i}{\left(\sum x_i\right)^2-n\sum x_i^2}

Linear correlation coefficient
r_{xy}=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{\sqrt{\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)\!\!\left(n\sum y_i^2-\left(\sum y_i\right)^2 \right)}}

Coefficient of determination
R^2=r_{xy}^2

Average relative error
\overline{A}=\dfrac{1}{n}\sum\left|\dfrac{y_i-\widehat{y}_i}{y_i}\right|\cdot100\%
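To make the linear-regression formulas concrete, here is a minimal Python sketch that evaluates them directly from the sums. The function name linear_regression is hypothetical (it is not part of the calculator), and the sketch assumes every y_i is non-zero, since the average relative error divides by y_i.

```python
import math


def linear_regression(xs, ys):
    """Fit y = a*x + b by least squares; return a, b, r, R^2 and average relative error (%)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Coefficients a and b, written in the same form as the formulas above.
    denom = sx ** 2 - n * sxx
    a = (sx * sy - n * sxy) / denom
    b = (sx * sxy - sxx * sy) / denom

    # Linear correlation coefficient r_xy; the coefficient of determination is r_xy^2.
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

    # Average relative error in percent; divides by y_i, so every y_i must be non-zero.
    fitted = [a * x + b for x in xs]
    avg_rel_err = sum(abs((y - f) / y) for y, f in zip(ys, fitted)) / n * 100

    return a, b, r, r ** 2, avg_rel_err
```

For example, linear_regression([1, 2, 3, 4], [3, 5, 7, 9]) returns a = 2.0, b = 1.0, r = 1.0, R² = 1.0 and an average relative error of 0.0, because those points lie exactly on y = 2x + 1.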

Quadratic regression

Equation:
\widehat{y}=ax^2+bx+c

System of equations to find a, b and c
\begin{cases}a\sum x_i^2+b\sum x_i+nc=\sum y_i\,,\\[2pt] a\sum x_i^3+b\sum x_i^2+c\sum x_i=\sum x_iy_i\,,\\[2pt] a\sum x_i^4+b\sum x_i^3+c\sum x_i^2=\sum x_i^2y_i\,;\end{cases}

Correlation coefficient
R= \sqrt{1-\frac{\sum(y_i-\widehat{y}_i)^2}{\sum(y_i-\overline{y})^2}},
where
\overline{y}= \dfrac{1}{n}\sum y_i

Coefficient of determination
R^2

Average relative error
\overline{A}=\dfrac{1}{n}\sum\left|\dfrac{y_i-\widehat{y}_i}{y_i}\right|\cdot100\%
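A minimal Python sketch of quadratic regression: it builds the three equations above from power sums, solves them with plain Gaussian elimination (a standard solver chosen here for illustration, not necessarily what the calculator uses), and then evaluates R, R² and the average relative error. Both function names are hypothetical, and all y_i are assumed to be non-zero.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    # Work on an augmented copy so the caller's lists are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, size), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(aug[row][c] * x[c] for c in range(row + 1, size))
        x[row] = (aug[row][size] - known) / aug[row][row]
    return x


def quadratic_regression(xs, ys):
    """Fit y = a*x^2 + b*x + c; return a, b, c, R, R^2 and average relative error (%)."""
    n = len(xs)
    s = [sum(x ** k for x in xs) for k in range(5)]  # s[k] = sum of x_i^k, so s[0] = n
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x_i^k * y_i
    # One row per equation of the system above, unknowns ordered (a, b, c).
    matrix = [
        [s[2], s[1], s[0]],
        [s[3], s[2], s[1]],
        [s[4], s[3], s[2]],
    ]
    a, b, c = solve_linear_system(matrix, t)

    fitted = [a * x * x + b * x + c for x in xs]
    y_mean = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    R = math.sqrt(max(0.0, 1 - ss_res / ss_tot))  # clamp tiny negative values caused by rounding
    avg_rel_err = sum(abs((y - f) / y) for y, f in zip(ys, fitted)) / n * 100
    return a, b, c, R, R ** 2, avg_rel_err
```

For points that lie exactly on y = x² + 2x + 3, the sketch recovers a = 1, b = 2, c = 3 up to rounding, with R close to 1.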

Cubic regression

Equation:
\widehat{y}=ax^3+bx^2+cx+d

System of equations to find a, b, c and d
\begin{cases}a\sum x_i^3+b\sum x_i^2+c\sum x_i+nd=\sum y_i\,,\\[2pt] a\sum x_i^4+b\sum x_i^3+c\sum x_i^2+d\sum x_i=\sum x_iy_i\,,\\[2pt] a\sum x_i^5+b\sum x_i^4+c\sum x_i^3+d\sum x_i^2=\sum x_i^2y_i\,,\\[2pt] a\sum x_i^6+b\sum x_i^5+c\sum x_i^4+d\sum x_i^3=\sum x_i^3y_i\,;\end{cases}

Correlation coefficient, coefficient of determination and average relative error: the same formulas as for quadratic regression.
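The quadratic and cubic systems follow a single pattern. Writing a polynomial of degree m with coefficients c_k (notation introduced here only for this illustration; for the cubic, c_3 = a, c_2 = b, c_1 = c, c_0 = d), the system of equations is:

```latex
\widehat{y}=\sum_{k=0}^{m}c_kx^k,\qquad \sum_{k=0}^{m}c_k\sum_i x_i^{\,j+k}=\sum_i x_i^{\,j}y_i,\quad j=0,1,\dots,m
```

With m = 3 and j = 0, 1, 2, 3 this reproduces the four cubic equations above row by row; with m = 2 it gives the quadratic system.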

Power regression

Equation:
\widehat{y}=a\cdot x^b

b coefficient
b=\dfrac{n\sum(\ln x_i\cdot\ln y_i)-\sum\ln x_i\cdot\sum\ln y_i }{n\sum\ln^2x_i-\left(\sum\ln x_i\right)^2 }

a coefficient
a=\exp\!\left(\dfrac{1}{n}\sum\ln y_i-\dfrac{b}{n}\sum\ln x_i\right)

Correlation coefficient, coefficient of determination and average relative error: the same formulas as above.
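A minimal Python sketch of the power-regression formulas, assuming all x_i and y_i are positive so their logarithms exist. The function name power_regression is hypothetical. The sketch evaluates R and the average relative error with the fitted values a·x_i^b on the original scale; that reading of "the same formulas as above" is an assumption about how the calculator computes them.

```python
import math


def power_regression(xs, ys):
    """Fit y = a * x^b via the log-log formulas above (assumes every x_i > 0 and y_i > 0)."""
    n = len(xs)
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    b = (n * sum(u * v for u, v in zip(lx, ly)) - sum(lx) * sum(ly)) / (
        n * sum(u * u for u in lx) - sum(lx) ** 2
    )
    a = math.exp(sum(ly) / n - b / n * sum(lx))

    # Assumption: quality measures use fitted values on the original (not logarithmic) scale.
    fitted = [a * x ** b for x in xs]
    y_mean = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    R = math.sqrt(max(0.0, 1 - ss_res / ss_tot))
    avg_rel_err = sum(abs((y - f) / y) for y, f in zip(ys, fitted)) / n * 100
    return a, b, R, avg_rel_err
```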

ab-Exponential regression

Equation:
\widehat{y}=a\cdot b^x

b coefficient
b=\exp\dfrac{n\sum x_i\ln y_i-\sum x_i\cdot\sum\ln y_i }{n\sum x_i^2-\left(\sum x_i\right)^2 }

a coefficient
a=\exp\!\left(\dfrac{1}{n}\sum\ln y_i-\dfrac{\ln b}{n}\sum x_i\right)

Correlation coefficient, coefficient of determination and average relative error: the same formulas as above.
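A minimal Python sketch of the ab-exponential formulas, assuming all y_i are positive so that ln y_i exists. The function name is hypothetical. It computes ln b first and reuses it in the formula for a, exactly as the two formulas above are written.

```python
import math


def ab_exponential_regression(xs, ys):
    """Fit y = a * b^x via ln y = ln a + x*ln b (assumes every y_i > 0)."""
    n = len(xs)
    ly = [math.log(y) for y in ys]
    sx = sum(xs)
    # Exponent of the formula for b, i.e. ln b.
    ln_b = (n * sum(x * v for x, v in zip(xs, ly)) - sx * sum(ly)) / (
        n * sum(x * x for x in xs) - sx ** 2
    )
    b = math.exp(ln_b)
    a = math.exp(sum(ly) / n - ln_b / n * sx)
    return a, b
```

A fitted value at any x is then a * b ** x.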

Hyperbolic regression

Equation:
\widehat{y}=a + \frac{b}{x}

b coefficient
b=\dfrac{n\sum\dfrac{y_i}{x_i}-\sum\dfrac{1}{x_i}\sum y_i }{n\sum\dfrac{1}{x_i^2}-\left(\sum\dfrac{1}{x_i}\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum y_i-\dfrac{b}{n}\sum\dfrac{1}{x_i}

Correlation coefficient, coefficient of determination and average relative error: the same formulas as above.
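The hyperbolic formulas are ordinary linear regression applied to the substituted variable u_i = 1/x_i. A minimal Python sketch under that reading, assuming no x_i is zero (the function name is hypothetical):

```python
def hyperbolic_regression(xs, ys):
    """Fit y = a + b/x as a linear regression of y on u = 1/x (assumes every x_i != 0)."""
    n = len(xs)
    us = [1 / x for x in xs]
    su, sy = sum(us), sum(ys)
    b = (n * sum(u * y for u, y in zip(us, ys)) - su * sy) / (
        n * sum(u * u for u in us) - su ** 2
    )
    a = sy / n - b / n * su
    return a, b
```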

Logarithmic regression

Equation:
\widehat{y}=a + b\ln x

b coefficient
b=\dfrac{n\sum(y_i\ln x_i)-\sum\ln x_i\cdot \sum y_i }{n\sum\ln^2x_i-\left(\sum\ln x_i\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum y_i-\dfrac{b}{n}\sum\ln x_i

Correlation coefficient, coefficient of determination and average relative error: the same formulas as above.
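Logarithmic regression is likewise linear regression of y on the substituted variable u_i = ln x_i. A minimal Python sketch, assuming every x_i is positive (the function name is hypothetical):

```python
import math


def logarithmic_regression(xs, ys):
    """Fit y = a + b*ln(x) as a linear regression of y on u = ln x (assumes every x_i > 0)."""
    n = len(xs)
    lx = [math.log(x) for x in xs]
    slx, sy = sum(lx), sum(ys)
    b = (n * sum(u * y for u, y in zip(lx, ys)) - slx * sy) / (
        n * sum(u * u for u in lx) - slx ** 2
    )
    a = sy / n - b / n * slx
    return a, b
```

Predictions for a new x follow from a + b * math.log(x), again only for x > 0.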

Exponential regression

Equation:
\widehat{y}=e^{a+bx}

b coefficient
b=\dfrac{n\sum x_i\ln y_i-\sum x_i\cdot\sum\ln y_i }{n\sum x_i^2-\left(\sum x_i\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum\ln y_i-\dfrac{b}{n}\sum x_i

Correlation coefficient, coefficient of determination and average relative error: the same formulas as above.
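The exponential and ab-exponential models describe the same family of curves. Rewriting one as the other shows how their coefficients correspond (the subscripts "exp" and "ab" are labels added here to tell the two sets of coefficients apart):

```latex
e^{a_{\mathrm{exp}}+b_{\mathrm{exp}}x}=e^{a_{\mathrm{exp}}}\left(e^{b_{\mathrm{exp}}}\right)^x
\quad\Rightarrow\quad
a_{\mathrm{ab}}=e^{a_{\mathrm{exp}}},\qquad b_{\mathrm{ab}}=e^{b_{\mathrm{exp}}}
```

Comparing the formulas, the exponent in the ab-exponential formula for b is exactly the exponential-regression b, and the logarithm of the ab-exponential a equals the exponential-regression a. On the same data, both models therefore produce the same fitted curve and the same quality measures.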

Derivation of formulas

Let's start from the problem:
We have an unknown function y=f(x), given as tabulated data (for example, values obtained from experiments).
We need to find a function y=F(x) of a known type (linear, quadratic, etc.) whose values are as close as possible to the tabulated values at the same points. In practice, the type of function is chosen by visually comparing the plotted table points with graphs of known functions.

As a result, we obtain a formula y=F(x), called the empirical formula (regression equation, function approximation), which lets us calculate y for values of x not present in the table. In this way, the empirical formula "smooths" the y values.

We use the least squares method to obtain the parameters of F that give the best fit. The best fit in the least-squares sense minimizes the sum of squared residuals, where a residual is the difference between an observed value and the fitted value provided by the model.

Thus, we need to find a function F for which the sum of squared residuals S is minimal:
S=\sum\limits_i(y_i-F(x_i))^2\rightarrow \min

Let us illustrate the solution using linear regression, F=ax+b, as an example.
We need the best-fit values of the coefficients a and b, so S is treated as a function of a and b. To find its minimum, we look for the point where both partial derivatives of S are equal to zero.

Applying the chain rule to the partial derivatives of S and dropping the common factor -2, we get the following equations:
\begin{cases} \sum [y_i - F(x_i, a, b)]\cdot F^\prime_a(x_i, a, b)=0 \\ \sum [y_i - F(x_i, a, b)]\cdot F^\prime_b(x_i, a, b)=0 \end{cases}

For the function F(x,a,b)=ax+b, the partial derivatives are
F^\prime_a=x,
F^\prime_b=1

Substituting these partial derivatives into the system above, we get:
\begin{cases} \sum (y_i - ax_i-b)\cdot x_i=0 \\ \sum (y_i - ax_i-b)=0 \end{cases}

Expanding the brackets gives:
\begin{cases} \sum y_ix_i - a \sum x_i^2-b\sum x_i=0 \\ \sum y_i - a\sum x_i - nb=0 \end{cases}
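These two expanded equations can be checked numerically. The minimal Python sketch below, with hypothetical function names, computes a and b from the closed-form linear-regression formulas given earlier, evaluates the left-hand sides of both equations (which should be zero up to rounding), and computes S so that one can confirm that nudging a or b away from the fitted values increases it.

```python
def fit_line(xs, ys):
    """Closed-form least-squares a, b for F = a*x + b, plus both equation left-hand sides."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = sx ** 2 - n * sxx
    a = (sx * sy - n * sxy) / denom
    b = (sx * sxy - sxx * sy) / denom
    # Left-hand sides of the two expanded equations; both should be ~0 at the minimum.
    eq1 = sxy - a * sxx - b * sx
    eq2 = sy - a * sx - n * b
    return a, b, eq1, eq2


def sum_squared_residuals(xs, ys, a, b):
    """S = sum of (y_i - (a*x_i + b))^2."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
```

Because S is a quadratic function of a and b, moving a by δ raises S by exactly δ²·Σx_i², and moving b by δ raises it by n·δ², so any small nudge from the fitted values makes S larger.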

Solving this system of two linear equations for a and b gives the same formulas as those listed above for linear regression.
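The algebra of that step can be written out with Cramer's rule. Moving the known sums to the right-hand side gives a 2×2 linear system with determinant D:

```latex
\begin{cases} a\sum x_i^2+b\sum x_i=\sum x_iy_i \\ a\sum x_i+nb=\sum y_i \end{cases}
\qquad D=n\sum x_i^2-\left(\sum x_i\right)^2
a=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{D},\qquad b=\frac{\sum x_i^2\sum y_i-\sum x_i\sum x_iy_i}{D}
```

Multiplying the numerator and denominator of each fraction by -1 turns these into the forms given for a and b in the linear regression section.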

Using the same technique, we can get formulas for all remaining regressions.
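For example, for quadratic regression F(x,a,b,c)=ax^2+bx+c, the partial derivatives are F'_a=x^2, F'_b=x and F'_c=1. Substituting them into the zero-derivative conditions and expanding the brackets gives:

```latex
\begin{cases}
\sum (y_i-ax_i^2-bx_i-c)\cdot x_i^2=0 \\
\sum (y_i-ax_i^2-bx_i-c)\cdot x_i=0 \\
\sum (y_i-ax_i^2-bx_i-c)=0
\end{cases}
\;\Rightarrow\;
\begin{cases}
a\sum x_i^4+b\sum x_i^3+c\sum x_i^2=\sum x_i^2y_i \\
a\sum x_i^3+b\sum x_i^2+c\sum x_i=\sum x_iy_i \\
a\sum x_i^2+b\sum x_i+nc=\sum y_i
\end{cases}
```

This is the quadratic-regression system given earlier, listed in reverse order.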
