INTERPOLATION/EXTRAPOLATION

 

Theory

Let us presume that we have been given a data set which defines a function, f(x), at N data points (xi = x1, x2, ... xN), but we do not have an explicite analytical expression for f(x). In the topic 'Least-Squares' we discussed how to derive as estimate of the expression using least-squares curve fitting techniques. Another task might be to estimate the value of f(x) at some arbitrary position x. If x is less than x1 or greater than xN then the process of determining f(x) is termed extrapolation. If x is greater than x1 and less than xN then the process is called interpolation. The methods of interpolation or extrapolation which are presented here assume that there is no error in the individual f(xi) measurements. If we wanted to find an analytic function that approximates the data (with noise), then 'smoothing' interpolation or 'least-squares' curve fitting are more appropriate techniques. These techniques are discussed in the topic 'Smoothed-Interpolation'.

The general problem is then how to find an interpolated value f(x) at position x given an initial data set of N values (xi, f(xi))? The first method we will consider will be linear interpolation. The first step in any interpolation is to find xi and xi+1 such that x is between them. The resulting data - xi, xi+1, f(xi), f(xi+1) - define a line segment which contains the unknown f(x)

x(i) x x(i+1)

Either of the following two equations will then linearly interpolate x, based on the known data, to find f(x).

An alternative method of interpolation would be to draw a smooth curve through some or all of the xi in order to define f(x). There are many types of smooth curves that could be fit exactly to the data, but the most used are (1) polynomials, (2) rational functions (quotients of polynomials), and (3) splines (mostly cubic splines).

Polynomials - Through any two points there is a unique line. Through any three points there is a unique binomial. Through any M data points there is a polynomial of order M-1. The form of this polynomial is

where N indicates the order of the polynomial. The best method for estimating y (=f(x)) by polynomial fit is to use a low order polynomial (say 2nd to 4th order; n = 3 to 5), where the data values are those closest to the point of interpolation.

As one goes to higher degree, the polynomial uses additional data further from the point of interpolation and with a higher exponential order. This can lead to serious errors in the interpolated value. Rational funtions are closely related to polynomials (with some better fitting characteristics) and will not be discussed here.

One major problem with low-order polynomial fits is how to properly choose the correct data for a fit to data near x. For a second order-polynomial fit wher x lies between xi and xi+1, the polynomial could be derived from data at xi-1, xi, xi+1 or from data at xi, xi+1, xi+2. The problem is that each three data points will give different polynomial equations and slightly different values for f(x). Which one is correct??? Where one has a choice, one method might be to user only odd-ordered polynomials requiring even numbers of input data. Where the data are noisy, however, this may backfire for higher-order polynomial fits have more possibilities of serious biases in the interpolated data.

One way to minimize problems associated with polynimial fits is to use splines. Splines are low-order continuous functions that have the special feature of having continuous derivitives through each known data point. (Other techniques, based on preference in fitting for the data closest to the point of interpolation, tend to change the 'closest neighbors' as the point of interpolation shifts from one side of a given data point to another. The result is a discontinuity in the total function at the defined datapoint.) The most often used spline technique involves a third-order (cubic) equation no matter how many data points are available for the interpolation.

For each of these interpolation methods, now consider what would happen if you wanted to extrapolate the available data to a position, xN+1, beyond the endpoints of the available data? In this case, the proper choice of a continuous function needed to estimate f(xN+1) is critical, however there is significantly less constraint on its correct identification! Good luck!

 

Algorithm Development

In the process of writing a program to carry out interpolation/extrapolation, there are several key features of program development which need to be considered. The first one is data entry. Under most circumstances you will have a data set that already exists and the task is to OPEN the data file and READ in the values. The best idea is to enter the data as two 1-D arrays (x(i), fx(i) for i=1, n). This will make subsequent searches through the data go more quickly. One important consideration is whether the data are sorted by x(i). It is critical that the data have the following character: x(i) < x(i+1) for all i = 1, n. This may be the default pattern of output in the data file, but the search routines will not work correctly if the data are not in order. In a large data file, you might test for order as you enter the data. For example:

.

REAL x(1000), fx(1000)

C

OPEN (unit=3, name='data.dat', type='old', access='sequential'

1recordsize=80, disp='save')

READ (3, 100) x(1), fx(1)

do 10 i=2,1000

READ (3, 100, end=20) x(i), fx(i)

if(x(i).le.x(i-1) go to 30

10 continue

20 n=i-1

type *, 'The number of input data are', n

if(n.eq.999) type *,'There are more data points than array space'

.

.

30 type *,'The data are not in ascending order'

C

100 Format(1x, f10.0, 2x, f10.0)

C

END

The next task in interpolation/extrapolation problems will be to program the logic of point selection for a polynomial fit to a selected data interval. If you are given a point of interpolation x and an nth-order polynomial to fit, remember that you need to find n+1 data points from the arrays x(i) and fx(i) and that x needs to fit near the middle of the chosen n+1 x(i). When n+1 is an odd number, remember that x can equally well fit between two different sets of x(i) and x(i+1), each of which will produce a different polynomial fit at x. Also you must decide what to do when the point of interpolation is near the ends of the data set. If x lies between x(1) and x(2) or x(n-1) and x(n) and you need to fit more than a first-order polynomial, you may not have enough data to do the job correctly. The normal procedure is to use the nearest points available even if they do not center well onthe point of interpolation.

The third task will be to calculate fx at the chosen point of interpolation, x. You know the order to the polynomial fit and you have chosen the data to calculate the fit. You can either use the methods employed in Topic 1 (for linear interpolation) or you can use a canned subroutine (see below). When using a canned subroutine, the primary problem will be to correctly pass the needed values of x(i) and f(i) to the subroutine. Look carefully at the explanatory information associated with any subroutine before writing a program to make use of it!

The final task which should be considered is goodness of fit for each point of interpolation. There is no single way to do this, but two methods are worth your consideration. The first idea is to always carry out a linear interpolation for each x no matter what order of polynomial or spline you really want to use. By subtracting the polynomial fit interpolation from the linear fit interpolation, you have a quick and easy way to look for large differences which might (not necessarily) indicate a poor interpolation. The other method, which you will use in your problem sets, would be to plot the input data together with the polynomial curves used to interpolate data. It may be very instructive to see how much 'information' is added by the interpolation process over what the original data really 'know'.

 

Canned Subroutines

NUMERICAL RECIPES uses the subroutine POLINT for polynomial interpolation. This program is listed at the end of this exercise. POLINT requires you to input an array of N nearest neighbors (x1,y1), (x2,y2), ...(xn,yn) and it calculates a polynomial of order N-1. It then calculates the value of x (the interpolation point) and an error estimate. Numerical Recipes also has a subroutine SPLINE for cubic spline interpolation. SPLINE is called once for a given data set and then subroutine SPLINT is called each time to calculate the actual interpolation value for each x. These subroutines are listed at the end of this exercise.

 

References

(1) W.H. Press et al., 1986, Numerical Recipes, The Art of Scientific Computing, Cambridge University Press, Chapter 3.