# Estimation

247

Chapter 8: Estimation In hypothesis tests, the purpose was to make a decision about a parameter, in terms of it being greater than, less than, or not equal to a value. But what if you want to actually know what the parameter is. You need to do estimation. There are two types of estimation – point estimator and confidence interval. Section 8.1 Basics of Confidence Intervals A point estimator is just the statistic that you have calculated previously. As an example, when you wanted to estimate the population mean, µ , the point estimator is the sample mean, x . To estimate the population proportion, p, you use the sample proportion, p̂ . In general, if you want to estimate any population parameter, we will call it θ , you use the sample statistic, θ̂ . Point estimators are really easy to find, but they have some drawbacks. First, if you have a large sample size, then the estimate is better. But with a point estimator, you don’t know what the sample size is. Also, you don’t know how accurate the estimate is. Both of these problems are solved with a confidence interval. Confidence interval: This is where you have an interval surrounding your parameter, and the interval has a chance of being a true statement. In general, a confidence interval looks like: θ̂ ± E , where θ̂ is the point estimator and E is the margin of error term that is added and subtracted from the point estimator. Thus making an interval. Interpreting a confidence interval: The statistical interpretation is that the confidence interval has a probability (1−α , where α is the complement of the confidence level) of containing the population parameter. As an example, if you have a 95% confidence interval of 0.65 < p < 0.73, then you would say, “there is a 95% chance that the interval 0.65 to 0.73 contains the true population proportion.” This means that if you have 100 intervals, 95 of them will contain the true proportion, and 5% will not. The wrong interpretation is that there is a 95% chance that the true value of p will fall between 0.65 and 0.73. The reason that this interpretation is wrong is that the true value is fixed out there somewhere. You are trying to capture it with this interval. So the chance is that your interval captures it, and not that the true value falls in the interval. There is also a real world interpretation that depends on the situation. The common probabilities used for confidence intervals are 90%, 95%, and 99%. These are known as the confidence level. For the 90% level, the α would be 0.10. One last thing to know about confidence is how the sample size and confidence level affect how wide the interval is. The following discussion demonstrates what happens to the width of the interval as you get more confident.

Chapter 8: Estimation

248

Think about shooting an arrow into the target. Suppose you are really good at that and that you have a 90% chance of hitting the bull’s eye. Now the bull’s eye is very small. Since you hit the bull’s eye approximately 90% of the time, then you probably hit inside the next ring out 95% of the time. You have a better chance of doing this, but the circle is bigger. You probably have a 99% chance of hitting the target, but that is a much bigger circle to hit. You can see, as your confidence in hitting the target increases, the circle you hit gets bigger. The same is true for confidence intervals. This is demonstrated in figure #8.1.1. Figure #8.1.1: Affect of Confidence Level on Width

The higher level of confidence makes a wider interval. There’s a trade off between width and confidence level. You can be really confident about your answer but your answer will not be very precise. Or you can have a precise answer (small margin of error) but not be very confident about your answer. Now look at how the sample size affects the size of the interval. Suppose figure #8.1.2 represents confidence intervals calculated on a 95% interval. A larger sample size from a representative sample makes the width of the interval narrower. This makes sense. Large samples are closer to the true population so the point estimate is pretty close to the true value. Figure #8.1.2: Affect of Sample Size on Width

Now you know everything you need to know about confidence intervals except for the actual formula. The formula depends on which parameter you are trying to estimate. With different situations you will be given the confidence interval for that parameter.

Chapter 8: Estimation

249

Section 8.1: Homework 1.) Suppose you compute a confidence interval with a sample size of 25. What will

happen to the confidence interval if the sample size increases to 50?

2.) Suppose you compute a 95% confidence interval. What will happen to the confidence interval if you increase the confidence level to 99%?

3.) Suppose you compute a 95% confidence interval. What will happen to the confidence interval if you decrease the confidence level to 90%?

4.) Suppose you compute a confidence interval with a sample size of 100. What will

happen to the confidence interval if the sample size decreases to 80? 5.) A 95% confidence interval is 6353 km < µ < 6384 km , where µ is the mean

diameter of the Earth. State the statistical interpretation. 6.) A 95% confidence interval is 6353 km < µ < 6384 km , where µ is the mean

diameter of the Earth. State the real world interpretation. 7.) In 2013, Gallup conducted a poll and found a 95% confidence interval of

0.52 < p < 0.60 , where p is the proportion of Americans who believe it is the government’s responsibility for health care. Give the real world interpretation.

8.) In 2013, Gallup conducted a poll and found a 95% confidence interval of

0.52 < p < 0.60 , where p is the proportion of Americans who believe it is the government’s responsibility for health care. Give the statistical interpretation.

Chapter 8: Estimation

250

Section 8.2 One-Sample Interval for the Proportion Suppose you want to estimate the population proportion, p. As an example you may be curious what proportion of students at your school smoke. Or you could wonder what is the proportion of accidents caused by teenage drivers who do not have a drivers’ education class. Confidence Interval for One Population Proportion (1-Prop Interval)

1. State the random variable and the parameter in words. x = number of successes p = proportion of successes

2. State and check the assumptions for confidence interval a. A simple random sample of size n is taken. b. The condition for the binomial distribution are satisfied c. To determine the sampling distribution of p̂ , you need to show that np̂ ≥ 5

and nq̂ ≥ 5 , where q̂ = 1− p̂ . If this requirement is true, then the sampling distribution of p̂ is well approximated by a normal curve. (In reality this is not really true, since the correct assumption deals with p. However, in a confidence interval you do not know p, so you must use p̂ . This means you just need to show that x ≥ 5 and n − x ≥ 5 .)

3. Find the sample statistic and the confidence interval

Sample Proportion:

p̂ = x

n = # of successes

# of trials Confidence Interval:

p̂ − E < p < p̂ + E

Where p = population proportion p̂ = sample proportion

n = number of sample values E = margin of error zC = critical value where C = 1−α q̂ = 1− p̂

E = zC p̂q̂ n

4. Statistical Interpretation: In general this looks like, “there is a C% chance that

p̂ − E < p < p̂ + E contains the true proportion.”

5. Real World Interpretation: This is where you state what interval contains the true proportion.

Chapter 8: Estimation

251

The critical value is a value from the normal distribution. Since a confidence interval is found by adding and subtracting a margin of error amount from the sample proportion, and the interval has a probability of containing the true proportion, then you can think of this as the statement P p̂ − E < p < p̂ + E( ) = C . You can use the invNorm command on the calculator to find the critical value. The critical values will always be the same value, so it is easier to just look at table A.1 in the appendix. Example #8.2.1: Confidence Interval for the Population Proportion Using the Formula

A concern was raised in Australia that the percentage of deaths of Aboriginal prisoners was higher than the percent of deaths of non-Aboriginal prisoners, which is 0.27%. A sample of six years (1990-1995) of data was collected, and it was found that out of 14,495 Aboriginal prisoners, 51 died (“Indigenous deaths in,” 1996). Find a 95% confidence interval for the proportion of Aboriginal prisoners who died. Solution: 1. State the random variable and the parameter in words.

x = number of Aboriginal prisoners who die p = proportion of Aboriginal prisoners who die

2. State and check the assumptions for a confidence interval a. A simple random sample of 14,495 Aboriginal prisoners was taken. However,

the sample was not a random sample, since it was data from six years. It is the numbers for all prisoners in these six years, but the six years were not picked at random. Unless there was something special about the six years that were chosen, the sample is probably a representative sample. This assumption is probably met.

b. There are 14,495 prisoners in this case. The prisoners are all Aboriginals, so you are not mixing Aboriginal with non-Aboriginal prisoners. There are only two outcomes, either the prisoner dies or doesn’t. The chance that one prisoner dies over another may not be constant, but if you consider all prisoners the same, then it may be close to the same probability. Thus the assumptions for the binomial distribution are satisfied

c. In this case, x = 51 and n − x = 14495 − 51= 14444 and both are greater than or equal to 5. The sampling distribution for p̂ is a normal distribution.

3. Find the sample statistic and the confidence interval

Sample Proportion:

p̂ = x

n = 51

14495 ≈ 0.003518

Confidence Interval:

zC = 1.96 , since 95% confidence level

E = zC p̂q̂ n

= 1.96 0.003518 1− 0.003518( ) 14495

≈ 0.000964

Chapter 8: Estimation

252

p̂ − E < p < p̂ + E

0.003518 − 0.000964 < p < 0.003518 + 0.000964 0.002554 < p < 0.004482

4. Statistical Interpretation: There is a 95% chance that

0.002554 < p < 0.004482 contains the proportion of Aboriginal prisoners who died.

5. Real World Interpretation: The proportion of Aboriginal prisoners who died is

between 0.0026 and 0.0045. You can also do the calculations for the confidence interval with technology. The following example shows the process on the TI-83/84. Example #8.2.2: Confidence Interval for the Population Proportion Using the Calculator

A researcher studying the effects of income levels on breastfeeding of infants hypothesizes that countries where the income level is lower have a higher rate of infant breastfeeding than higher income countries. It is known that in Germany, considered a high-income country by the World Bank, 22% of all babies are breastfeed. In Tajikistan, considered a low-income country by the World Bank, researchers found that in a random sample of 500 new mothers that 125 were breastfeeding their infants. Find a 90% confidence interval of the proportion of mothers in low-income countries who breastfeed their infants? Solution:

1. State you random variable and the parameter in words. x = number of woman who breastfeed in a low-income country p = proportion of woman who breastfeed in a low-income country

2. State and check the assumptions for a confidence interval a. A simple random sample of 500 breastfeeding habits of woman in a low-

income country was taken as was stated in the problem. b. There were 500 women in the study. The women are considered identical,

though they probably have some differences. There are only two outcomes, either the woman breastfeeds or she doesn’t. The probability of a woman breastfeeding is probably not the same for each woman, but it is probably not very different for each woman. The assumptions for the binomial distribution are satisfied

c. x = 125 and n − x = 500 −125 = 375 and both are greater than or equal to 5, so the sampling distribution of p̂ is well approximated by a normal curve.

Chapter 8: Estimation

253

3. Find the sample statistic and the confidence interval

Go into the STAT menu. Move over to TESTS and choose 1-PropZInt. Figure #8.2.1: Setup for 1-Proportion Interval

Once you press Calculate, you will see the results as in figure #8.2.2. Figure #8.2.2: Results for 1-Proportion Interval

0.218 < p < 0.282

4. Statistical Interpretation: There is a 90% chance that 0.218 < p < 0.282 contains the proportion of women in low-income countries who breastfeed their infants.

5. Real World Interpretation: The proportion of women in low-income countries

who breastfeed their infants is between 0.218 and 0.282.

Chapter 8: Estimation

254

Section 8.2: Homework In each problem show all steps of the confidence interval. If some of the assumptions are not met, note that the results of the interval may not be correct and then continue the process of the confidence interval. 1.) Eyeglassomatic manufactures eyeglasses for different retailers. They test to see

how many defective lenses they make. Looking at the type of defects, they found in a three-month time period that out of 34,641 defective lenses, 5865 were due to scratches. Find a 99% confidence interval for the proportion of defects that are from scratches.

2.) In November of 1997, Australians were asked if they thought unemployment would increase. At that time 284 out of 631 said that they thought unemployment would increase (“Morgan gallup poll,” 2013). Estimate the proportion of Australians in November 1997 who believed unemployment would increase using a 95% confidence interval?

3.) According to the February 2008 Federal Trade Commission report on consumer

fraud and identity theft, Arkansas had 1,601 complaints of identity theft out of 3,482 consumer complaints (“Consumer fraud and,” 2008). Calculate a 90% confidence interval for the proportion of identity theft in Arkansas.

4.) According to the February 2008 Federal Trade Commission report on consumer

fraud and identity theft, Alaska had 321 complaints of identity theft out of 1,432 consumer complaints (“Consumer fraud and,” 2008). Calculate a 90% confidence interval for the proportion of identity theft in Alaska.

5.) In 2013, the Gallup poll asked 1,039 American adults if they believe there was a

conspiracy in the assassination of President Kennedy, and found that 634 believe there was a conspiracy (“Gallup news service,” 2013). Estimate the proportion of American’s who believe in this conspiracy using a 98% confidence interval.

6.) In 2008, there were 507 children in Arizona out of 32,601 who were diagnosed

with Autism Spectrum Disorder (ASD) (“Autism and developmental,” 2008). Find the proportion of ASD in Arizona with a confidence level of 99%.

Chapter 8: Estimation

255

Section 8.3 One-Sample Interval for the Mean Suppose you want to estimate the mean height of Americans, or you want to estimate the mean salary of college graduates. A confidence interval for the mean would be the way to estimate these means. Confidence Interval for One Population Mean (t-Interval)

1. State the random variable and the parameter in words. x = random variable µ = mean of random variable

2. State and check the assumptions for a hypothesis test a. A random sample of size n is taken. b. The population of the random variable is normally distributed, though the t-

test is fairly robust to the assumption if the sample size is large. This means that if this assumption isn’t met, but your sample size is quite large (over 30), then the results of the t-test are valid.

3. Find the sample statistic and confidence interval

x − E < µ < x + E where

E = tc s n

x is the point estimator for µ tc is the critical value where C = 1−α and degrees of freedom: df = n −1 s is the sample standard deviation n is the sample size

4. Statistical Interpretation: In general this looks like, “there is a C% chance that the

statement x − E < µ < x + E contains the true mean.”

5. Real World Interpretation: This is where you state what interval contains the true mean.

The critical value is a value from the Student’s t-distribution. Since a confidence interval is found by adding and subtracting a margin of error amount from the sample mean, and the interval has a probability of containing the true mean, then you can think of this as the statement P x − E < µ < x + E( ) = C . The critical values are found in table A.2 in the appendix How to check the assumptions of confidence interval: In order for the confidence interval to be valid, the assumptions of the test must be true. Whenever you run a confidence interval, you must make sure the assumptions are true. You need to check them. Here is how you do this:

1 For the assumption that the sample is a random sample, describe how you took the sample. Make sure your sampling technique is random.

Chapter 8: Estimation

256

2 For the assumption that population is normal, remember the process of assessing normality from chapter 6.

Example #8.3.1: Confidence Interval for the Population Mean Using the Formula

A random sample of 20 IQ scores of famous people was taken information from the website of IQ of Famous People (“IQ of famous,” 2013) and then using a random number generator to pick 20 of them. The data are in table #8.3.1 (this is the same data set that was used in example #6.4.2). Find a 98% confidence interval for the IQ of a famous person. Table #8.3.1: IQ Scores of Famous People

158 180 150 137 109 225 122 138 145 180 118 118 126 140 165 150 170 105 154 118

Solution: 1. State the random variable and the parameter in words.

x = IQ score of a famous person µ = mean IQ score of a famous person

2. State and check the assumptions for a confidence interval a. A random sample of 20 IQ scores was taken, since that was said in the

problem. b. The population of IQ score is normally distributed as was shown in example

#6.4.2.

3. Find the sample statistic and confidence interval Sample Statistic:

x = 145.4 s ≈ 29.27

Now you need the degrees of freedom, df = n −1= 20 −1= 19

and the C,

which is 98%. Now go to table A.2, go down the first column to 19 degrees of freedom. Then go over to the column headed with 98%. Thus tc = 2.539 . (See table 8.3.2.) Table #8.3.2: Excerpt From Table A.2

Degrees of Freedom (df)

80%

90%

95%

98%

99%

1 3.078 6.314 12.706 31.821 63.657 2 1.886 2.920 4.303 6.965 9.925 3 1.638 2.353 3.182 4.541 5.841 . . .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

. 19 1.328 1.729 2.093 2.539 2.861

Chapter 8: Estimation

257

E = tc s n = 2.539 29.27

20 ≈16.6

x − E < µ < x + E 145.4 −16.6 < µ <145.4 +16.6 128.8 < µ <162

4. Statistical Interpretation: There is a 98% chance that 128.8 < µ <162 contains the

mean IQ score of a famous person.

5. Real World Interpretation: The mean IQ score of a famous person is between 128.8 and 162.

Example #8.3.2: Confidence Interval for the Population Mean Using the Calculator

The data in table #8.3.3 are the life expectancies for men in European countries in 2011 (“WHO life expectancy,” 2013). Find the 99% confident interval for the mean life expectancy of men in Europe Table #8.3.3: Life Expectancies for Men in European Countries in 2011

73 79 67 78 69 66 78 74 71 74 79 75 77 71 78 78 68 78 78 71 81 79 80 80 62 65 69 68 79 79 79 73 79 79 72 77 67 70 63 82 72 72 77 79 80 80 67 73 73 60 65 79 66

Solution:

1. State the random variable and the parameter in words. x = life expectancy for a European man in 2011 µ = mean life expectancy for European men in 2011

2. State and check the assumptions for a confidence interval a. A random sample of 53 life expectancies of European men in 2011 was taken.

The data is actually all of the life expectancies for every country that is considered part of Europe by the World Health Organization. However, the information is still sample information since it is only for one year that the data was collected. It may not be a random sample, but that is probably not an issue in this case.

b. The distribution of life expectancies of European men in 2011 is normally distributed. To see if this assumption has been met, look at the histogram, number of outliers, and the normal probability plot. (If you wish, you can look at the normal probability plot first. If it doesn’t look linear, then you may want to look at the histogram and number of outliers at this point.)

Chapter 8: Estimation

258

Figure #8.3.1: Histogram for Life Expectancies of European Men in 2011

Not normally distributed

Number of outliers: IQR = 79 − 69 = 10 1.5* IQR = 15 Q1−1.5* IQR = 69 −15 = 54 Q3+1.5* IQR = 79 +15 = 94

Outliers are numbers below 54 and above 94. There are no outliers for this data set. Figure #8.3.2: Normal Probability Plot for Life Expectancies of European Men in 2011

Not linear

This population does not appear to be normally distributed. The t-test is robust for sample sizes larger than 30 so you can go ahead and calculate the interval.

3. Find the sample statistic and confidence interval Go into the STAT menu, and type the data into L1. Then go into STAT and over to TESTS. Choose TInterval.

Chapter 8: Estimation

259

Figure #8.3.3: Setup for TInterval

Figure #8.3.4: Results for TInterval

71.6 years < µ < 75.8 years

4. Statistical Interpretation: There is a 99% chance that 71.6 years < µ < 75.8 years contains the mean life expectancy of European men.

5. Real World Interpretation: The mean life expectancy of European men is between

71.6 and 75.8 years. Section 8.3: Homework In each problem show all steps of the confidence interval. If some of the assumptions are not met, note that the results of the interval may not be correct and then continue the process of the confidence interval. 1.) The Kyoto Protocol was signed in 1997, and required countries to start reducing

their carbon emissions. The protocol became enforceable in February 2005. Table 8.3.4 contains a random sample of CO2 emissions in 2010 (“CO2 emissions,” 2013). Compute a 99% confidence interval to estimate the mean CO2 emission in 2010. Table #8.3.4: CO2 Emissions (metric tons per capita) in 2010

1.36 1.42 5.93 5.36 0.06 9.11 7.32 7.93 6.72 0.78 1.80 0.20 2.27 0.28 5.86 3.46 1.46 0.14 2.62 0.79 7.48 0.86 7.84 2.87 2.45

Chapter 8: Estimation

260

2.) Many people feel that cereal is healthier alternative for children over glazed donuts. Table #8.3.5 contains the amount of sugar in a sample of cereal that is geared towards children (“Healthy breakfast story,” 2013). Estimate the mean amount of sugar in children cereal using a 95% confidence level. Table #8.3.5: Sugar Amounts (g) in Children’s Cereal

10 14 12 9 13 13 13 11 12 15 9 10 11 3 6 12 15 12 12

3.) In Florida, bass fish were collected in 53 different lakes to measure the amount of

mercury in the fish. The data for the average amount of mercury in each lake is in table #8.3.6 (“Multi-disciplinary niser activity,” 2013). Compute a 90% confidence interval for the mean amount of mercury in fish in Florida lakes. Table #8.3.6: Average Mercury Levels (mg/kg) in Fish

1.23 1.33 0.04 0.44 1.20 0.27 0.48 0.19 0.83 0.81 0.71 0.5 0.49 1.16 0.05 0.15 0.19 0.77 1.08 0.98 0.63 0.56 0.41 0.73 0.34 0.59 0.34 0.84 0.50 0.34 0.28 0.34 0.87 0.56 0.17 0.18 0.19 0.04 0.49 1.10 0.16 0.10 0.48 0.21 0.86 0.52 0.65 0.27 0.94 0.40 0.43 0.25 0.27

4.) In 1882, Albert Michelson collected measurements on the speed of light (“Student

t-distribution,” 2013). His measurements are given in table #8.3.7. Find the speed of light value that Michelson estimated from his data using a 95% confidence interval. Table #8.3.7: Speed of Light Measurements (km/sec)

299883 299816 299778 299796 299682 299711 299611 299599 300051 299781 299578 299796 299774 299820 299772 299696 299573 299748 299748 299797 299851 299809 299723

5.) Table #8.3.8 contains pulse rates after running for 1 minute, collected from

females who drink alcohol (“Pulse rates before,” 2013). Find a 95% confidence interval for the mean pulse rate after exercise of women who do drink alcohol. Table #8.3.8: Pulse Rates (beats per minute) of Woman Who Use Alcohol

176 150 150 115 129 160 120 125 89 132 120 120 68 87 88 72 77 84 92 80 60 67 59 64 88 74 68

Chapter 8: Estimation

261

6.) The economic dynamism, which is the index of productive growth in dollars for countries that are designated by the World Bank as middle-income are in table #8.3.9 (“SOCR data 2008,” 2013). Compute a 95% confidence interval for the mean economic dynamism of middle-income countries. Table #8.3.9: Economic Dynamism (\$) of Middle Income Countries

25.8057 37.4511 51.915 43.6952 47.8506 43.7178 58.0767 41.1648 38.0793 37.7251 39.6553 42.0265 48.6159 43.8555 49.1361 61.9281 41.9543 44.9346 46.0521 48.3652 43.6252 50.9866 59.1724 39.6282 33.6074 21.6643

7.) Table #8.3.10 contains the percentage of woman receiving prenatal care in 2009

for a sample of countries (“Pregnant woman receiving,” 2013). Estimate the average percentage of woman receiving prenatal care in 2009 using a 90% confidence interval. Table #8.3.10: Percentage of Woman Receiving Prenatal Care

70.08 72.73 74.52 75.79 76.28 76.28 76.65 80.34 80.60 81.90 86.30 87.70 87.76 88.40 90.70 91.50 91.80 92.10 92.20 92.41 92.47 93.00 93.20 93.40 93.63 93.68 93.80 94.30 94.51 95.00 95.80 95.80 96.23 96.24 97.30 97.90 97.95 98.20 99.00 99.00 99.10 99.10

100.00 100.00 100.00 100.00 100.00

8.) Maintaining your balance may get harder as you grow older. A study was conducted to see how steady the elderly is on their feet. They had the subjects stand on a force platform and have them react to a noise. The force platform then measured how much they swayed forward and backward, and the data is in table #8.3.11 (“Maintaining balance while,” 2013). Find a 99% confidence interval for the mean sway of elderly people. Table #8.3.11: Forward/backward Sway (mm) of Elderly Subjects

19 30 20 19 29 25 21 24 50

Chapter 8: Estimation

262