Assignment on Simple Linear Regression


Old Faithful, the famous geyser in Yellowstone National Park, is well known for erupting quite regularly, and it has attracted millions of visitors. The data file “Geyser” gives information about eruptions during October 1980*. The variables are Duration, the length of the current eruption in seconds, and Interval, the time in minutes until the next eruption. The park service uses data like these to obtain a prediction equation for the time to the next eruption, and such predictions are shown to tourists waiting patiently for Old Faithful's next eruption.

*The data were collected by volunteers and made public by R. Hutchinson. Apart from missing data for the period from midnight to 6 am, this is a complete record of eruptions for that month.

1. Using Descriptive Statistics in Excel's Data Analysis tool, find the 95% and 99% confidence intervals for the population means of Duration and Interval. Interpret these intervals. Compare the widths of the two Duration intervals and of the two Interval intervals, and comment on the differences in these widths.
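To check the Excel output by hand: each interval is the sample mean plus or minus t·s/√n, and the "Confidence Level(95.0%)" row of Excel's Descriptive Statistics output should equal that half-width t·s/√n. The sketch below uses only the Python standard library; the critical value t_crit is passed in (for example from Excel's =T.INV.2T(0.05, n-1) or =T.INV.2T(0.01, n-1)) rather than computed. The function name and the five-point sample are hypothetical illustrations, not the Geyser data.

```python
import math
import statistics


def mean_confidence_interval(data, t_crit):
    """Hypothetical helper: two-sided t interval for a population mean.

    t_crit is the two-sided critical value with len(data) - 1 degrees of
    freedom, e.g. Excel's =T.INV.2T(0.05, n-1) for 95%.
    Returns (mean, half_width, lower, upper).
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)               # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)   # margin of error
    return mean, half_width, mean - half_width, mean + half_width


# Hypothetical toy sample (NOT the Geyser data); df = 4
sample = [1, 2, 3, 4, 5]
_, hw95, lo95, hi95 = mean_confidence_interval(sample, t_crit=2.776)  # t(0.025, 4)
_, hw99, lo99, hi99 = mean_confidence_interval(sample, t_crit=4.604)  # t(0.005, 4)
print(f"95%: ({lo95:.3f}, {hi95:.3f})  width {2 * hw95:.3f}")
print(f"99%: ({lo99:.3f}, {hi99:.3f})  width {2 * hw99:.3f}")
```

Because the same s and n enter both intervals, the 99% interval is wider than the 95% interval by exactly the ratio of the critical values (about 4.604 / 2.776 ≈ 1.66 for this toy sample): higher confidence is bought with less precision.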

2. Using Excel, plot a scatter chart of the relation between Duration and Interval, with Duration on the horizontal axis and Interval on the vertical axis. Add a linear “trendline” to this chart and display its equation and the coefficient of determination. How would you interpret the slope of this “trendline”?
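A linear trendline is the ordinary least-squares line, so its slope, intercept and R² can be reproduced from a few sums. The following is a minimal standard-library sketch; the helper name and the three-point data set are hypothetical and stand in for Duration (x) and Interval (y).

```python
def least_squares_line(x, y):
    """Hypothetical helper: fit y = b0 + b1*x by ordinary least squares.

    Returns (b0, b1, r_squared); a linear Excel trendline fitted to the
    same data should report the same slope, intercept and R^2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares (SST)
    b1 = s_xy / s_xx                           # slope
    b0 = y_bar - b1 * x_bar                    # intercept
    r_squared = s_xy ** 2 / (s_xx * s_yy)      # equals SSR / SST in simple regression
    return b0, b1, r_squared


# Hypothetical toy data (NOT the Geyser data): x plays Duration, y plays Interval
b0, b1, r2 = least_squares_line([1, 2, 3], [1, 3, 2])
print(b0, b1, r2)  # 1.0 0.5 0.25
```

With the Geyser data, b1 would be read as the estimated change in the mean Interval (in minutes) for each additional second of eruption Duration.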

3. Using Regression in Excel's Data Analysis tool or Multiple Linear Regression under Predict in XLMiner, find the estimated regression equation for predicting Interval from Duration. Compare it with the “trendline” found in Task 2. In the output, identify the coefficient of determination and compare it with that found in Task 2. What is the interpretation of this coefficient? What is the estimated variance of the error term in the assumed model? What is the estimated standard deviation of this error term? Suppose a tourist has just arrived at the end of an eruption that lasted 3.5 minutes (210 seconds, since Duration is recorded in seconds). What is his/her predicted waiting time, in hours, until the next eruption?
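The estimated error variance is SSE/(n − 2), which in Excel's output should appear as the MS of the Residual row in the ANOVA table, and its square root is the "Standard Error" listed under Regression Statistics. The sketch below, using hypothetical helper names and toy data rather than the Geyser file, computes both and converts a point prediction from minutes to hours.

```python
import math


def fit_and_error_variance(x, y):
    """Hypothetical helper: OLS fit plus estimated error variance and std. dev.

    Returns (b0, b1, mse, s_e) where mse = SSE / (n - 2) estimates the
    error variance and s_e = sqrt(mse) is the standard error of the estimate.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    mse = sse / (n - 2)  # two coefficients estimated, so n - 2 degrees of freedom
    return b0, b1, mse, math.sqrt(mse)


def predicted_wait_hours(b0, b1, duration_seconds):
    """Point prediction of Interval (minutes) at a given Duration, in hours."""
    return (b0 + b1 * duration_seconds) / 60


x0 = 3.5 * 60  # 3.5 minutes expressed in seconds, the unit of Duration

# Hypothetical toy data (NOT the Geyser data)
b0, b1, mse, s_e = fit_and_error_variance([1, 2, 3], [1, 3, 2])
print(mse, s_e)
# With the coefficients from your own Geyser regression:
# predicted_wait_hours(b0, b1, x0)
```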

4. Using the Excel or XLMiner output found in Task 3, conduct a test of the significance of Duration at the 5% significance level. (State your hypotheses clearly and define the symbols used, give the value of the test statistic and the distribution of the test statistic, identify the p-value of the test, and state and interpret your conclusion.)
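In simple linear regression the significance test of the predictor is a t test of H0: β1 = 0 against H1: β1 ≠ 0, with statistic t = b1 / SE(b1) following a t distribution with n − 2 degrees of freedom under H0. A minimal standard-library sketch follows; the helper name and data are hypothetical, and the critical value is supplied from Excel (=T.INV.2T(0.05, n-2)) because the standard library has no t distribution. The p-value reported by Excel should match =T.DIST.2T(ABS(t), n-2).

```python
import math


def slope_t_test(x, y, t_crit):
    """Hypothetical helper: t test of H0: beta1 = 0 vs H1: beta1 != 0.

    t_crit is the two-sided critical value with n - 2 degrees of freedom.
    Returns (b1, se_b1, t_stat, reject_h0).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s_e / math.sqrt(s_xx)   # standard error of the slope
    t_stat = b1 / se_b1             # t(n - 2) distributed when H0 is true
    return b1, se_b1, t_stat, abs(t_stat) > t_crit


# Hypothetical toy data (NOT the Geyser data); n = 4, so df = 2
b1, se_b1, t_stat, reject = slope_t_test([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9], t_crit=4.303)
print(b1, se_b1, t_stat, reject)
```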

5. Using the Excel or XLMiner output found in Task 3, identify the 95% confidence interval for the population slope, that is, the coefficient of Duration. Interpret this interval. Conduct the test in Task 4 using this interval. Did you reach the same conclusion as in Task 4?
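The 95% interval for β1 is b1 ± t·SE(b1) with the same critical value as the two-sided 5% test, so H0: β1 = 0 is rejected exactly when 0 falls outside the interval; in Excel's Regression output this interval should appear in the "Lower 95%" and "Upper 95%" columns. A short sketch under the same assumptions as before (hypothetical helper, toy data, t_crit taken from =T.INV.2T(0.05, n-2)):

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Hypothetical helper: CI for beta1 as b1 +/- t_crit * SE(b1).

    Returns (lower, upper, reject_h0); H0: beta1 = 0 is rejected at the
    matching significance level exactly when 0 lies outside the interval.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)  # standard error of the slope
    lower = b1 - t_crit * se_b1
    upper = b1 + t_crit * se_b1
    return lower, upper, not (lower <= 0 <= upper)


# Hypothetical toy data (NOT the Geyser data); df = 2
print(slope_confidence_interval([1, 2, 3, 4], [2.1, 3.9, 6.1, 7.9], t_crit=4.303))
```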

6.

A. Consider a population of tourists who arrive at the end of an eruption that lasted 3.5 minutes. What is their estimated average waiting time to the next eruption? What is the 95% confidence interval for their average waiting time to the next eruption? (Show all details of your calculations!)

B. Suppose a tourist (John Smith) has just arrived at the end of an eruption that lasted 3.5 minutes. What is his predicted waiting time to the next eruption? What is the 95% prediction interval for his waiting time to the next eruption? (Show all details of your calculations!)

C. Explain the difference in the width of the intervals found in Tasks 6A and 6B.

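Note: For the estimated regression equation $\hat{y} = b_0 + b_1 x$ and $x_0 = 210$ seconds (an eruption lasting 3.5 minutes), the point prediction is simply $\hat{y}_0 = b_0 + b_1 x_0$, and you already found this prediction in Task 3.

A. The confidence interval for the mean of $Y$ at $x_0$ is

$$\hat{y}_0 \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\, s_x^2}},$$

where $t_{\alpha/2}$ is the critical value of the $t$ distribution with $n - 2$ degrees of freedom and $\alpha = 0.05$, $s_e$ = standard error of the estimate, and $s_x^2$ is the sample variance of X.

B. The prediction interval for $y_0$ is

$$\hat{y}_0 \pm t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\, s_x^2}}.$$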
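The two formulas in the Note share the point prediction, the critical value and s_e; they differ only by the extra 1 under the square root, which accounts for the scatter of an individual eruption around the mean line. A minimal standard-library sketch of both, with a hypothetical helper name and toy data in place of the Geyser file; for Task 6 you would pass x0 = 210 and t_crit from =T.INV.2T(0.05, n-2).

```python
import math
import statistics


def interval_at_x0(x, y, x0, t_crit):
    """Hypothetical helper: 6A confidence interval and 6B prediction interval at x0.

    t_crit is the two-sided critical value with n - 2 degrees of freedom.
    Returns (y_hat0, (ci_low, ci_high), (pi_low, pi_high)).
    """
    n = len(x)
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    s_x2 = statistics.variance(x)      # sample variance of X (divisor n - 1)
    s_xx = (n - 1) * s_x2              # sum of squared deviations of X
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))     # standard error of the estimate
    y_hat0 = b0 + b1 * x0              # same point prediction for 6A and 6B
    core = 1 / n + (x0 - x_bar) ** 2 / s_xx
    ci_half = t_crit * s_e * math.sqrt(core)      # mean waiting time (6A)
    pi_half = t_crit * s_e * math.sqrt(1 + core)  # one tourist's waiting time (6B)
    return (y_hat0,
            (y_hat0 - ci_half, y_hat0 + ci_half),
            (y_hat0 - pi_half, y_hat0 + pi_half))


# Hypothetical toy data (NOT the Geyser data); t_crit = 1.0 only to keep numbers simple
print(interval_at_x0([1, 2, 3], [1, 3, 2], x0=2, t_crit=1.0))
```

Whatever the data, the prediction interval is always the wider of the two, because the 1 under the square root does not shrink as n grows, whereas the 1/n term does.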

Use Microsoft Word to write a report with your name shown on the first page. The report should include all your Excel outputs (copy and paste them), so do not attach any separate Excel files.