| > |
lab8.mws --- Probability Density Functions and Cumulative Distribution Functions
| > | restart; with( plots ): |
| > |
Lab Overview
The topic of this lab is density functions. It starts from the definitions, considers some examples, then ends with a few exercises for you to complete. This topic is appropriate now because some of the computations involve improper integrals.
Maple will be used to prepare plots, to perform some integrations, and to solve some equations. If you follow the outline presented, you should be able to solve almost any question asked about this topic.
The questions for this lab are taken from the assigned homework problems. In addition to the regular lab, there are two bonus questions. These must be submitted separately, but are also due at midnight Thursday.
Deadline for submitting a lab solution is midnight, Thursday, March 6, 2003.
| > |
Definitions
Probability Density Function (pdf)
A
probability density function
is a function p defined on (
,
) with the following two properties
| > |
Mean
The
mean
,
, of a random variable with probability density function p is
.
The mean of a pdf corresponds to the ``center of mass'' for the probability density. [see p. 416 of the text]
| > |
Variance
The
variance
,
, of a random variable with probability density function p is
.
The variance is a measure of dispersion, or the spread, of a probability density. If the density is clustered close to
then
is small; a pdf that is more spread out will have a larger variance. [see p. 417 of the text]
| > |
Cumulative Distribution Function (cdf)
The cumulative distribution function , P, of a probability density function p, is defined to be
.
| > |
Properties of pdf's and cdf's
.
>= 0, so P is an increasing (actually nondecreasing) function.
= 1
= 0
.
| > |
Example 1
After measuring the duration of many telephone calls, the telephone company found their data was well-approximated by the pdf
| > | p1 := 0.4*exp(-0.4*x); |
where
is the duration of the call in minutes.
| > |
(a) What percentage of the calls last between 1 and 2 minutes?
The probability that a call lasts between 1 and 2 minutes is
| > | q1 := Int( p1, x=1..2 ); |
The value of this integral is
| > | value( q1 ); |
Therefore, 22.1% of all calls lasted between 1 and 2 minutes.
| > |
(b) What percentage of calls last 1 minute or less?
The integral representing the percentage of all calls lasting 1 minute or less is
| > | q2 := Int( %?, x = %? .. %? ); # fill in each %? with the appropriate quantity |
The value of this integral is
| > | value( q2 ); |
So the percentage of all calls lasting 1 minute or less is %?.
| > |
(c) What percentage of calls last 3 minutes or more?
The integral representing the percentage of all calls lasting 3 minutes or more is
| > | q3 := Int( %?, x = %? .. %? ); # fill in each %? with the appropriate quantity |
The value of this integral is
| > | value( q3 ); |
So the percentage of all calls lasting 3 minute or more is %?.
| > |
(d) Find the cumulative density function (cdf).
Note that it does not make sense to talk about calls lasting a negative number of minutes. The probability density function is actually
The way this is handled in practice is that the lower limit of -infinity in the defintion is replaced by the smallest value that the random variable can actually attain. In this case, the shortest telephone call is 0 minutes and the cumulative distribution function is
| > | P1 := Int( p1, x=0..t ); |
| > | value( P1 ); |
That is, the cdf is
for all
>= 0.
| > |
Example 2
In 1950 an experiment was done observing the time gaps between successive cars on the Arroyo Seco Freeway. The data show that the pdf of these time gaps was given approximately by
| > | p2 := a*exp(-0.122*x); |
where
is the time in seconds and
is a constant.
| > |
(a) Find
.
All pdf's must be nonnegative and must have ``total area'' equal to 1, i.e.,
. The nonnegativity condition is satisfied for all
>= 0. Because the shortest realistic time gap is
, the total area is
| > | q3 := Int( p2, x=0..infinity ); |
The value of this integral is
| > | q4 := value( q3 ); |
Now, the requirement that the total area be 1 means
| > | q5 := q4 = 1; |
The solution to this equation is
| > | q6 := solve( q5, {a} ); |
So, the value of
must be
. (Could you have predicted this from the beginning?)
Thus, the pdf is really
| > | p2a := eval( p2, q6 ); |
| > |
(b)Find P, the cdf.
The integral that defines the cdf is
| > | P2 := Int( p2a, x=0..t ); |
The value of this integral is
| > | q7 := value( P2 ); |
Thus, the cdf is
for all
>= 0.
| > |
(c) Find the median and mean time gap.
The mean time gap is given by the improper integral
| > | mu2 := Int( x*p2a, x=0..infinity ); |
The value of this integral is
| > | value( mu2 ); |
and the mean time gap is
seconds.
| > |
The median? I expected this to ask for the variance. Well, since they didn't ask for the variance, I will -- see Lab Question 3. Now, back tp the median. The median should be the middle time gap. Since the time gaps range from 0 to infinity, it makes no sense to use the midpoint of this range. The median time gap is the value of
with the property that
Prob( time gap <=
) = Prob( time gap >=
) = 1/2.
Thus, we must solve the equation
| > | q8 := Int( p2a, x=0..T ) = 1/2; |
Forcing Maple to evaluate the integral
| > | q9 := value( q8 ); |
and solving for
| > | solve( q9, {T} ); |
So the median time gap is
seconds.
| > |
(d) Sketch the graphs of the pdf and cdf.
| > | plot( p2a, x=0..100, title="Plot of the PDF p(x)=0.122*exp(-0.122*x)" ); |
| > | plot( q7, t=0..100, title="Plot of the CDF for p(x)=0.122*exp(-0.122*x)" ); |
| > |
Example 3
The pdf for the normal distribution is
.
| > | p3 := (mu,sigma) -> 1/(sigma*sqrt(2*Pi))*exp(-(x-mu)^2/(2*sigma^2)): |
| > |
(a)(i) Sketch graphs of the pdf for the normal distribution with fixed
(say
= 5) and varying
(say,
= 1, 2, 3) .
| > | plot( [ p3(5,1), p3(5,2), p3(5,3) ], x=-10..20, color=[red,blue,green], legend=["mu=5, sigma=1","mu=5,sigma=2","mu=5,sigma=3"] ); |
| > |
(a)(ii) Sketch graphs of the pdf for the normal distribution with varying
(say
= 4, 5, 6) and fixed
(say,
= 1).
See Lab Question #4.
| > |
(b) Explain how the graphs confirm that
is the mean of the distribution and that
is a measure of how closely the data is clustered around the mean.
Notice that the curves plotted in (a) all have a peak at
. but have larger maximum values and are more spread out as the value of
increases. For the normal distribution, this peak occurs at the mean.
You should see that the curves plotted in (c) all have the same shape -- same maximum value and same degree of spreading -- but that the peaks occur at different values of
. These pdfs all have the same variance but different means.
| > |
| > |
Example 4
Consider the normal distribution
.
(a) Show that p has a maximum at
. What is the maximum value?
The maximum should occur at a stationary point of the pdf. The derivative of the normal distribution is
| > | q10 := diff( p3(mu,sigma), x ); |
Setting this equal to zero and solving for
yields:
| > | q11 := solve( q10=0, {x} ); |
It remains to verify that
is the maximum. This can be done by noting that the derivative changes sign from positive to negative at
. It can also be done using the Second Derivative Test. The second derivative of the pdf is
| > | q12 := diff( p3(mu,sigma), x,x ); |
which simplifies to
| > | q13 := simplify( q12 ); |
When this is evaluated at the stationary point we find
| > | eval( q13, q11 ); |
Since
> 0,
< 0 and
is a maximum.
| > |
(b) Show that p has inflection points at
and
.
The pdf has possible inflection points where the
, that is
| > | q14 := q13=0; |
Note that the second derivative is the product of a constant (
), an exponential (
), and a quadratic (
). The constant is positive and the exponential is always positive. The two roots of the quadratic are
| > | solve( -sigma^2+x^2-2*x*mu+mu^2 = 0, {x} ); |
Because these are real-valued and different (because
> 0), the quadratic does change sign at these two points and they are inflection points.
| > |
(c) Describe in your own words what
and
tell you about the distribution.
See Example 3 (b)
| > |
Lab Questions
1. From Example 1, what percentage of calls last 1 minute or less?
2. Based on Example 1, what percentage of calls last between 20 minutes and 40 minutes? 20 minutes or more?
3. Find the mean and variance of the general exponential distribution,
, with parameter
> 0. (Your answers will involve the parameter
.)
Note
: If you need to tell Maple that
is positive, you can insert
assuming a>0
immediately before the semi-colon.
4. Sketch graphs of the pdf for the normal distribution with varying
(say
= 4, 5, 6) and fixed
(say,
= 1).
5 What is the median of the normal distribution with mean
and variance
?
Hints and Notes:
, the error function is defined to be
.
6. [Essay Question] Let
be the speed, in meters/second, of an oxygen molecule, and let
be the pdf of the speed distribution of oxygen molecules at room temperature. Maxwell showed that
,
where
= 1.4 x
is the Boltzman constant,
is the temperature in Kelvin (room temperature is
), and
= 5 x
is the mass of the oxygen molecule in kilograms.
(a) Find the value of
.
(b) Estimate the median and mean speed. Find the maximum of
.
(c) How do your answers in (b) for the mean and maximum of
change as
changes?
| > |
Bonus Questions
1. [4 points]
Suppose that
measures the time (in hours) it takes for a student to complete an exam. Assume that all students are done within two hours and the density function for
is given by
(a) What proportion of students take between 1.5 and 2 hours to finish the exam?
(b) What is the mean time for students to complete the exam?
(c) Compute the median of this distribution.
(d) What is the cumulative distribution function for this pdf?
| > |
2. [3 points]
If we think of an electron as a particle, the function
is the cumulative distribution function of the distance, r, of the electron in a hydrogen atom from the center of the atom. The distance is measured in Bohr radii. (1 Bohr radius = 5.29 x
m.)
For example,
= 0.32 means that the electron is within 1 Bohr radius from the center of the atom 32% of the time.
(a) Find a formula for the density function of this distribution. Sketch the density function and the cumulative distribution function.
(b) Find the median distance and the mean distance. Near what value of
is an electron most likely to be found?
(c) The Bohr radius is sometimes called the ``radius of the hydrogen atom''. Why?
| > |
| > |