
Section 10.2 Curve Fitting

Another way to estimate values of a function between data points is curve fitting. Rather than looking only at nearby data points, you take all of your data points into consideration and try to fit one function to all of them at once as well as possible. You then evaluate that function at the desired additional point. You have to decide what type of function makes the most sense for your data: for example, a line, a polynomial of a certain degree, a sinusoid, an exponential, etc. Context and information about the type of data you are collecting should help you decide which type of function is best.
Take the data points below, for example. Suppose again that we want to estimate the value of the function at the dotted green line:
We can see that a line is probably the best fit for this data. Observe the difference in the estimated data value at the dotted line when fitting a line versus fitting a high-order polynomial.
Linear approximation:
9th order polynomial:
Curve fitting is the process of finding a function of a given type (line, cubic polynomial, sinusoid, exponential, etc.) that fits the given data as well as possible. Context is incredibly important in deciding what type of curve to fit to your data. Polynomials are often used to fit data.
The method of least squares is often used to decide which function (of a given type) is the best fit for the given data. For example, in order to fit a line to the above data we’d need to find the equation \(y = ax+b\) for which the sum of the squares of the differences between the y-values on the approximating line and the given y-data-values is at a minimum:
The line of best fit does not necessarily pass through any of the data points; rather, it minimizes the total error by which the data points miss the line.
Using the notation of the above picture, we need to find the slope \(a\) and the y-intercept \(b\) that minimize the sum of the squares of the residuals for the four given data points:
\begin{equation*} R = [f(x_1)-y_1]^2 + [f(x_2)-y_2]^2 + [f(x_3)-y_3]^2 + [f(x_4)-y_4]^2 \end{equation*}
\begin{equation*} R = [ax_1 + b - y_1]^2 + [ax_2 + b - y_2]^2 + [ax_3 + b - y_3]^2 + [ax_4 + b - y_4]^2 \end{equation*}
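Setting the partial derivatives \(\partial R/\partial a\) and \(\partial R/\partial b\) to zero yields a small linear system (the "normal equations") with a closed-form solution for \(a\) and \(b\text{.}\) The sketch below, in plain Python, implements that closed form; the data points are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b minimizing the sum of squared
    residuals R = sum((a*x_i + b - y_i)^2), via the normal equations."""
    m = len(xs)
    sx = sum(xs)                               # sum of x-values
    sy = sum(ys)                               # sum of y-values
    sxx = sum(x * x for x in xs)               # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
    b = (sy - a * sx) / m
    return a, b

# Hypothetical data, roughly linear:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a, b = fit_line(xs, ys)
```

Note that the fitted line generally misses every individual point; it only makes the sum of the squared residuals as small as possible.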
Note that the method of least squares can be used just as well to find higher-order approximating polynomials.
MATLAB has a built-in function to fit an nth order polynomial to a set of data values:
p = polyfit(x,y,n)
Here, \(p\) is a row vector of coefficients of the resulting polynomial, and \(x\text{,}\) \(y\) are the data vectors. The degree of the desired approximating polynomial is \(n\text{.}\)
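Outside of MATLAB, NumPy provides an analogous function, `numpy.polyfit`, with the same argument order, and `numpy.polyval` to evaluate the resulting polynomial. The sketch below fits a line and evaluates it between data points; the data values are hypothetical.

```python
import numpy as np

# Hypothetical, roughly linear data:
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Degree-1 least-squares fit; p holds the coefficients in
# descending order of power, so p = [slope, intercept].
p = np.polyfit(x, y, 1)

# Evaluate the fitted line at a point between the data points:
estimate = np.polyval(p, 2.5)
```

As in MATLAB, passing a larger degree `n` fits a higher-order polynomial to the same data.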
The degree of the polynomial \((n)\) must be smaller than the number of data points \((m)\) minus one, i.e. \(n \lt m-1\text{,}\) since otherwise the resulting polynomial is no longer a curve fit, but rather an interpolating polynomial:
When using a polynomial of degree \(n\) to fit \(n+1\) data points, the resulting polynomial actually passes through all of the data points and thus is an interpolating polynomial, not a curve fit. You can convince yourself that this is true by thinking about two data points: you can find a line (a degree-1 polynomial) that goes exactly through the two points without any error. Similarly, you can find an \(n\)th-degree polynomial that goes exactly through \(n+1\) given data points.
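This boundary case can be checked directly: fitting a degree \(m-1\) polynomial to \(m\) points leaves essentially zero residual at every data point. A small NumPy sketch, with made-up data:

```python
import numpy as np

# Three hypothetical data points:
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])

# Degree n = m - 1 = 2: the "fit" passes through every point,
# so this is interpolation rather than curve fitting.
p = np.polyfit(x, y, len(x) - 1)

# Residuals at the data points are essentially zero:
residuals = np.polyval(p, x) - y
```

With any smaller degree, the residuals would generally be nonzero, and the result would be a genuine least-squares fit.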