>   

lab4.mws --- Fitting Linear, Exponential, Power, and Logarithmic Functions to Data

>   

>    restart;
with( plots ):
with( stats ):
with( student ):

>   

Lab Overview

The purpose of this lab is for you to find the best linear ( y = a*x+b ), exponential ( y = a*exp(b*x) ), power ( y = a*x^b ), and logarithmic ( y = a*ln(x)+b ) function that fits a given set of data. In the process of finding these four functions, you should see that - in most instances - one of these four functions is better than the other three.

 This should provide you with some practice applying the properties of exponentials and logarithms in addition to being helpful in other courses that you are taking (or will take).

After a short review of exponential and power functions, we will look at some sample data and determine the best power or exponential fit for the data. You will then repeat this analysis on several additional sets of data. It is not necessary for you to complete the analysis of all of the data sets. Do only as much work as is needed to answer the questions. Note that some data sets will need to be adjusted prior to performing some of the analysis.

Note that this analysis is related to the uses of (semi-)log and log-log plots, but this topic will not be explored in detail here.

>   

Auxiliary Procedures (execute but do not change)

>    slope := proc( F )
  if type( F, equation ) then
    return lcoeff( rhs(F) )
  else
    return lcoeff( F )
  end if
end proc:

>    intercept := proc( F )
  if type(F,equation) then
    return tcoeff( rhs(F) )
  else
    return tcoeff( F )
  end if
end proc:

>    bestlinearfit := (X,Y)->fit[leastsquare[[x,y]]]([evalf(X),evalf(Y)]):

>    dataplot := proc( X, Y )
  local bestlinfit, opts, p1, p2, xlo, xhi;
  if nargs>2 then opts:= args[3..-1] else opts := NULL end if;
  p1 := plot( zip( (x,y)->[x,y], X, Y ), style=point );
  bestlinfit := rhs(bestlinearfit(X,Y));
  xlo := min(op(X),0);
  xhi := max(op(X),0);
  p2 := plot( bestlinfit, x=xlo..xhi, color=blue );
  return display( [p1,p2], opts )
end proc:

>    datafitplot := proc( X, Y, F )
  local bestlinfit, f, opts, p1, p2, xlo, xhi;
  if nargs>3 then opts:= args[4..-1] else opts := NULL end if;
  if type(F,equation) then f := rhs(F) else f := F end if;
  p1 := plot( zip( (x,y)->[x,y], X, Y ), style=point );
  xlo := min(op(X),0);
  xhi := max(op(X),0);
  p2 := plot( f, x=xlo..xhi, color=blue );
  return display( [p1,p2], opts )
end proc:

>   

Review of Exponential and Power Functions

We want to develop a simple (oftentimes visual) way to determine if a collection of data points is better approximated with an exponential function or a power function. Let's consider a general exponential function:

>    e1 := y=a*exp(b*x):
e1;

If the natural logarithm is applied to both sides of this equation, and the right-hand side is simplified:

>    e2 := map( ln, e1 ):
e2;
simplify( e2 ) assuming real;

The assumption is necessary because of some technical concerns that are not relevant to us. (Be sure you can do this calculation by hand!)

The conclusion that we should draw is that if the plot of the x -values vs. the (natural) logarithm of the y -values is a straight line, then the data corresponds to a power function. Moreover, the slope of the line is the growth (or decay) rate of the exponential function and the y-intercept of the line is the (natural) logarithm of the coefficient of the exponential function.

>   

Next, consider a general power function:

>    p1 := y=a*x^b:
p1;

Taking the logarithm of both sides of this equation, and simplifying the right-hand side leads to:

>    p2 := map( ln, p1 ):
p2;
simplify( p2 ) assuming positive;

Again, be sure you can do these calculations by hand.

The conclusion to be drawn from this result is that if the plot of the logarithm of the x -values vs. the logarithm of the y -values is a straight line, then the data appears to follow a power function. The slope of the line is the power of the power function and the y-intercept of the line is  the logarithm of the coefficient of the power function.

>   

Examples

Fully-Worked Example with Fictitious Data #1

>    X := [ 1.5, 2.0, 3.0, 4.5, 6.0, 8.0 ];
Y := [ 3.01, 3.45, 4.15, 5.00, 5.70, 6.50 ];

The first thing one might do is simply plot the data points. To use Maple's plot command we need to zip  together the x - and y -values to form ( x , y ) pairs.

>    XY := zip( (x,y)->[x,y], X, Y );

>    plot( XY, style=point, view=[0..7,0..6], title="Plot of the data points" );

These points almost look like they could be fit by a straight line. The best linear function for this data is

>    blf1 := bestlinearfit( X, Y ):
blf1;

I have prepared a Maple procedure, dataplot, to plot a set of data and the best linear function. Here is how it could be used for this problem:

>    dataplot( X, Y, title="Plot of x vs. y and best linear function fit" );

Take a minute to check that this linear function corresponds to the one found above. The slope and intercept of the best linear fit can be extracted as follows:

>    m = slope( blf1 );
b = intercept( blf1 );

>   

While this fit looks pretty good, could we do better with a power or exponential function? To find the best exponential fit, we need the logarithms of the y -values:

>    lnY := map( ln, Y );

Now, the plot of this data and the best linear fit to it is easy to obtain.

>    dataplot( X, lnY, title="Plot of x vs. ln(y) and best linear fit to data" );

From this picture, the linear function for the transformed data points does not do a better job of approximating the data than the linear function for the original data points.

>   

To consider the power function the logarithm of the x -values are needed:

>    lnX := map( ln, X );

>    dataplot( lnX, lnY, title="Plot ln(x) vs. ln(y) and best linear fit to data" );

Now that is a good fit! The equation of the linear function to this log-log data is

>    blf2 := bestlinearfit( lnX, lnY ):
blf2;

To obtain the best fitting power function for this data, remember that this equation is really

>    e1 := ln(y) = intercept(blf2) + slope(blf2)*ln(x):
e1;

>    e2 := isolate( e1, y );

>    simplify( e2 );

Note that the coefficient is exp(.9181) = 2.5045  and the power is 0.4591.

>    datafitplot( X, Y, e2, title="Plot of data and best fitting power function" );

This picture confirms that we have a good fit to the data.

>   

>   

Unworked Example with Fictitious Data #2

Source: http://www.ucl.ac.uk/Mathematics/geomath/logsnb/MHlogsnbloglog.html

>    X := [ 0.2, 0.4, 0.6, 0.8, 1.2 ];
Y := [ 1.18, 1.71, 2.31, 2.71, 3.45 ];

>   

General Data Processing

These commands can be used to visualize any set of data points stored in two vectors, X  and Y . Note that all work necessary to ensure the data values are all positive and are appropriately scaled or shifted are assumed to have been completed prior to executing these commands. In some cases it might be necessary to add a view=  option to some of the plot commands.

It is recommended that you copy this section to where you want to use it. The reason for this recommendation is that you will always have a good copy of these commands should you want to revert to the original form.

>    'X' = X;
'Y' = Y;

>   

Best Fit to Data by a Linear Function

>    dataplot( X, Y, title="Data points and best linear fit" );

>    bestlinearfit( X, Y );

>   

Best Fit to Data by an Exponential Function

>    lnY := map( ln, Y );

>    dataplot( X, lnY, title="x vs. ln(y) and best linear fit to transformed data" );

>    blf1 := bestlinearfit( X, lnY ):
blf1;

>    e1 :=  ln(y) = intercept(blf1)+slope(blf1)*x:
e1;

>    e2 := isolate( e1, y ):
simplify( e2 );

>    datafitplot( X, Y, e2, title="Plot of data points and best exponential function fit" );

>   

Best Fit to Data by a Power Function

>    lnX := map( ln , X );

>    dataplot( lnX, lnY, title="ln(x) vs. ln(y) and best linear fit to transformed data" );

>    blf2 := bestlinearfit( lnX, lnY ):
blf2;

>    p1 :=  ln(y) = intercept(blf2)+slope(blf2)*ln(x):
p1;

>    p2 := isolate( p1, y ):
simplify( p2 );

>    datafitplot( X, Y, p2, title="Plot of data points and best power function fit" );

>   

Best Fit to Data by a Logarithmic Function

>    dataplot( lnX, Y, title="ln(x) vs. y and best linear fit to transformed data" );

>    blf3 := bestlinearfit( lnX, Y ):
blf3;

>    L1 :=  y = intercept(blf3)+slope(blf3)*ln(x):
L1;

>    datafitplot( X, Y, L1, title="Plot of data points and best logarithmic function fit" );

>   

>   

>   

>   

>   

The following section contains the general commands needed to perform the graphical and numerical work needed to find each of the four best fitting functions. It is recommended that you put a copy of this section immediately after the data that it will be used to analyze. This was done for you in the Unworked Example above.

General Data Processing

These commands can be used to visualize any set of data points stored in two vectors, X  and Y . Note that all work necessary to ensure the data values are all positive and are appropriately scaled or shifted are assumed to have been completed prior to executing these commands. In some cases it might be necessary to add a view=  option to some of the plot commands.

It is recommended that you copy this section to where you want to use it. The reason for this recommendation is that you will always have a good copy of these commands should you want to revert to the original form.

>    'X' = X;
'Y' = Y;

>   

Best Fit to Data by a Linear Function

>    dataplot( X, Y, title="Data points and best linear fit" );

>    bestlinearfit( X, Y );

>   

Best Fit to Data by an Exponential Function

>    lnY := map( ln, Y );

>    dataplot( X, lnY, title="x vs. ln(y) and best linear fit to transformed data" );

>    blf1 := bestlinearfit( X, lnY ):
blf1;

>    e1 :=  ln(y) = intercept(blf1)+slope(blf1)*x:
e1;

>    e2 := isolate( e1, y ):
simplify( e2 );

>    datafitplot( X, Y, e2, title="Plot of data points and best exponential function fit" );

>   

Best Fit to Data by a Power Function

>    lnX := map( ln , X );

>    dataplot( lnX, lnY, title="ln(x) vs. ln(y) and best linear fit to transformed data" );

>    blf2 := bestlinearfit( lnX, lnY ):
blf2;

>    p1 :=  ln(y) = intercept(blf2)+slope(blf2)*ln(x):
p1;

>    p2 := isolate( p1, y ):
simplify( p2 );

>    datafitplot( X, Y, p2, title="Plot of data points and best power function fit" );

>   

Best Fit to Data by a Logarithmic Function

>    dataplot( lnX, Y, title="ln(x) vs. y and best linear fit to transformed data" );

>    blf3 := bestlinearfit( lnX, Y ):
blf3;

>    L1 :=  y = intercept(blf3)+slope(blf3)*ln(x):
L1;

>    datafitplot( X, Y, L1, title="Plot of data points and best logarithmic function fit" );

>   

>   

>   

Required Data Sets

Regional Drainage Area Data for Dillon, MT

X  - drainage area (mi^2)

Y  - bankfull depth (ft)

Parrett, Omang, Hull, 1983

Mean annual runoff and peak flow estimates based on channel geometry of streams in northeastern and western Montana, US Geological Survey Water-Resources Investigations Report 83-4086, 53 pp.

Source: http://www.homepage.montana.edu/~uessc/esci432/logexpgraph.htm

>    X := [ 4.06, 62.7, 280, 36, 935, 2476 ];
Y := [ 1.4, 2.8, 4, 2.6, 5.5, 7.5];

>   

Radium Emissions from a Watch Hand

X  - count per minute (of radioactive ions)

Y  - number of sheets of paper in gap between source and sensor

Source: http://www.blackcatsystems.com/GM/experiments/ex2.html

>    Y := [ 30335, 26878, 23868, 21680, 19814, 18004, 16344, 15093, 13908, 12920, 11835, 10934, 10220, 9551, 8839, 8299, 7733, 7318 ];
X := [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 ];

>   

Hint:

Note that it is not possible to take the natural logarithm of all of the x values. There are two main ways to overcome this problem:

 i) drop the first x- and y-values

>    Xnew := X[2..-1];
Ynew := Y[2..-1];

ii) change the y-values so that they are all positive - here, it make sense to add 1 to each y-value

>    Xnew := X+[1$nops(X)];

For the remainder of this problem, the second approach will be used because it allows us to keep all data points:

>    X := Xnew;
Y := Y;

>   

>   

U.S. Census Data

X  - year

Y  - average population (in millions) over 10-year period

Source: http://www.mste.uiuc.edu/malcz/ExpFit/data.html

>    X := [ 1815, 1825, 1835, 1845, 1855, 1865, 1875, 1885, 1895, 1905, 1915, 1925, 1935, 1945, 1955, 1965, 1975 ];
Y := [ 8.3, 11.0, 14.7, 19.7, 26.7, 35.2, 44.4, 55.9, 68.9, 83.2, 98.8, 114.2, 127.1, 140.1, 164.0, 190.9, 214.3 ];

>   

Hint:

You might want to shift the x  data to represent the number of years after 1800. For example:

>    X := X - [1800$nops(X)];

>   

>   

>   

Optional Data Sets

Experimental Diode Performance

X  - voltage (V)

Y  - current (milliamp)

Source: http://people.msoe.edu/~rather/me/100lab2.html

>    X := [0.52,0.597,0.619,0.632,0.643,0.65,0.655,0.661,0.672,0.689,0.711,0.722,0.754,0.78,0.801,0.819,0.848,0.87,0.893,0.92];
Y := [0.01,0.1,0.2,0.3,0.4,0.5,0.6,0.7,1,1.6,3.03,4.04,8.37,15,23.4,33,53.7,75.8,102.5,137.5];

>   

Stress-Strain Behavior of Steel

X  - stress (lbf/in^2)

Y  - strain (in)

Source: http://people.msoe.edu/~rather/me/100lab2.html

>    X := [5000,10000,15000,20000,25000,30000,35000];
Y := [0.000172,0.0003490,0.0005092,0.00069,0.000856,0.001018,0.00119];

>   

Voltage Drop Across a Capacitor

X  - time (seconds)

Y  - voltage (V)

Source: http://wwwfp.vuse.vanderbilt.edu:8888/es130/lectures/lecture6/exponential.html

>    X := [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 ];
Y := [ 10, 6.1, 3.7, 2.2, 1.4, 0.8, 0.5, 0.3, 0.2, 0.1, 0.07, 0.03 ];

>   

Hint:

Why is it not possible to take the logarithm of the x  values? What can you do to overcome this difficulty?

>   

>   

>   

>   

Lab Questions

1. Determine the best fitting power and logarithmic functions for the Regional Drainage Area data.

2. Determine the best fitting power and exponential functions for the Radium Emissions from a Watch Hand data.

3. Find one of the Optional Data Sets that is best fit with a power function; give the best fitting power function.

4. Find one of the Optional Data Sets that is best fit with an exponential function; give the best fitting power function.

5. Which of the four best fitting functions provides the best fit to the U.S. Census data?

Essay Question [2 points]: Explain how the behavior of the exponential, power, and logarithmic functions differ at the origin. How do these differences help select the most appropriate fit for the Regional Drainage Area and Radium Emissions data sets.

>   

>