# Writing from Sources, Writing from Sentences

If you haven’t already done so, use the “QuizIIDataset.csv” data file to create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Also, if you haven’t already done so, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB = sqrt((69.1691 * (latitude - 40.748441))^2 + (52.5179 * (longitude - (-73.985664)))^2), where the given lat/long coordinates are those of the Empire State Building. Next, run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and DistESB. You must use robust standard errors, which correct your standard errors for heteroskedasticity.
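The two variable definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Stata commands the quiz expects. It assumes lnPPSF means the log of the ratio sale_price / gross_square_feet, and the function names `dist_esb` and `ln_ppsf` and the sample values are hypothetical.

```python
import math

# Coordinates of the Empire State Building, as given in the formula above.
ESB_LAT = 40.748441
ESB_LON = -73.985664

def dist_esb(latitude, longitude):
    """Approximate distance to the Empire State Building in miles."""
    north_south = 69.1691 * (latitude - ESB_LAT)  # latitude scale factor from the formula
    east_west = 52.5179 * (longitude - ESB_LON)   # longitude scale factor from the formula
    return math.sqrt(north_south ** 2 + east_west ** 2)

def ln_ppsf(sale_price, gross_square_feet):
    """Log of price per square foot (assumed reading of lnPPSF)."""
    return math.log(sale_price / gross_square_feet)

# Hypothetical property, for illustration only (not a row from the data file).
print(dist_esb(40.758441, -73.985664))
print(ln_ppsf(500000, 1000))
```

A property located exactly at the reference coordinates gets a distance of zero, which is a quick sanity check on the sign handling in the longitude term.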

Based on the regression, which of the following is true?

1. The coefficient estimate for gross_square_feet is about 6.7 times greater than the standard error (in absolute value).
2. The coefficient estimate for gross_square_feet is small; therefore, we conclude it has no effect on housing prices.
3. We cannot reject the null hypothesis that gross_square_feet does not affect housing prices.
4. The coefficient estimate for gross_square_feet is about 6.3 times greater than the standard error (in absolute value).

On a Stata printout, the standard errors (Std. Error) column tells you which of the following:

1. The estimated standard deviations of the population coefficients.
2. The estimated standard deviation of the residuals.
3. The estimated probability that we can just reject the null hypothesis for each coefficient.
4. The estimated standard deviations for the estimated coefficients.

The following table reports a regression of the price of a house (in thousands of dollars) on the number of bedrooms, the size of the lot (in square feet), and the square footage of the house itself.

Dependent Variable: PRICE
Method: Least Squares
Included observations: 88

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -21.77031 | 29.47504 | -0.738601 | 0.4622 |
| BEDROOMS | 13.85252 | 9.010145 | 1.537436 | 0.1279 |
| LOTSIZE | 0.002068 | 0.000642 | 3.220096 | 0.0018 |
| SQRFT | 0.122778 | 0.013237 | 9.275093 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.672362 | Mean dependent var | 293.5460 |
| Adjusted R-squared | 0.660661 | S.D. dependent var | 102.7134 |
| S.E. of regression | 59.83348 | Akaike info criterion | 11.06540 |
| Sum squared resid | 300723.8 | Schwarz criterion | 11.17800 |

Based on the above regression, which of the following is true?

1. The level of significance at which we can just reject the null that the number of bedrooms has no effect on the price is 0.1279.
2. We can reject the null hypothesis that the number of bedrooms does not affect the price with 99% confidence.
3. The number of bedrooms definitely determines the price of the house.
4. We reject the null hypothesis of a zero coefficient on the bedroom variable at the 10% level of significance.
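As a reading aid for the table, the t-Statistic column is the coefficient divided by its standard error, and the Prob. column is compared against a chosen significance level. The short Python sketch below recomputes the ratios from the printed (rounded) values, so they may differ from the table in the third decimal place. The helper names `t_ratio` and `rejects_null` are illustrative, not part of any package.

```python
# Coefficients and standard errors copied from the regression table above.
table = {
    "C": (-21.77031, 29.47504),
    "BEDROOMS": (13.85252, 9.010145),
    "LOTSIZE": (0.002068, 0.000642),
    "SQRFT": (0.122778, 0.013237),
}

def t_ratio(coefficient, std_error):
    """t-statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / std_error

def rejects_null(p_value, alpha):
    """Reject the null at significance level alpha when the p-value is below alpha."""
    return p_value < alpha

t_stats = {name: t_ratio(coef, se) for name, (coef, se) in table.items()}
for name, t in t_stats.items():
    print(f"{name:10s} t = {t:8.4f}")

p_bedrooms = 0.1279  # Prob. column for BEDROOMS
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: reject zero coefficient on BEDROOMS? {rejects_null(p_bedrooms, alpha)}")
```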

For this question, you need to figure out how to merge the “QuizIIDatasetSandyAddon.csv” data set with the “QuizIIDataset.csv” data file (hint: use nycid as the key and merge one to one). After you merge the two data sets (if you haven’t done it already), create a new variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and surgeheight, where the surgeheight variable tells you how many feet the Sandy storm surge rose at each house in the sample. After you run the regression, answer the following question.
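To make the idea of a one-to-one merge on nycid concrete, here is a minimal Python sketch using plain dictionaries. It is not the Stata merge itself. The function `merge_one_to_one` and the sample rows are hypothetical, and unlike Stata's default behavior this sketch keeps only rows whose key appears in both data sets.

```python
def merge_one_to_one(left_rows, right_rows, key):
    """Join two lists of row dictionaries on a key that must be unique in each list."""
    left_keys = [row[key] for row in left_rows]
    right_keys = [row[key] for row in right_rows]
    # A one-to-one merge requires the key to identify exactly one row on each side.
    if len(set(left_keys)) != len(left_keys):
        raise ValueError(f"{key} is not unique in the left data set")
    if len(set(right_keys)) != len(right_keys):
        raise ValueError(f"{key} is not unique in the right data set")

    right_by_key = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:  # keep matched rows only (an assumption of this sketch)
            merged.append({**row, **match})
    return merged

# Made-up rows for illustration; the real files have many more columns.
sales = [
    {"nycid": 1, "sale_price": 500000, "gross_square_feet": 1000},
    {"nycid": 2, "sale_price": 750000, "gross_square_feet": 1500},
]
sandy = [
    {"nycid": 2, "surgeheight": 3.5},
    {"nycid": 1, "surgeheight": 0.0},
]

for row in merge_one_to_one(sales, sandy, "nycid"):
    print(row)
```

The uniqueness check mirrors why the hint says "one to one": if nycid repeated in either file, a row could be matched to more than one partner.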

Based on the regression, we can conclude which of the following:

1. A 10% rise in the storm surge led to about a 2.4% rise in housing prices, all else equal.
2. A one foot rise in the storm surge led to about a 2.4% drop in housing prices, all else equal.
3. The storm surge had no statistically significant effect on housing prices.
4. None of the above.

In Stata, import the “QuizIIDataset.csv” data file. Create a new variable called lnPPSF, which is equal to the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, and gross_square_feet. The year variable is the year the sale took place; the year_built variable is the year the house was constructed.

Based on the Stata printout, which of the following is true:

1. On average, prices increased about 13% each year.
2. On average, prices increased about \$13,000 each year.
3. We cannot reject the null hypothesis of no price increase over the period.
4. According to the regression, prices fell during the period.

The following table reports a regression of the price of a house (in thousands of dollars) on the number of bedrooms, the size of the lot (in square feet), and the square footage of the house itself.

Dependent Variable: PRICE
Method: Least Squares
Included observations: 88

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -21.77031 | 29.47504 | -0.738601 | 0.4622 |
| BEDROOMS | 13.85252 | 9.010145 | 1.537436 | 0.1279 |
| LOTSIZE | 0.002068 | 0.000642 | 3.220096 | 0.0018 |
| SQRFT | 0.122778 | 0.013237 | 9.275093 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.672362 | Mean dependent var | 293.5460 |
| Adjusted R-squared | 0.660661 | S.D. dependent var | 102.7134 |
| S.E. of regression | 59.83348 | Akaike info criterion | 11.06540 |
| Sum squared resid | 300723.8 | Schwarz criterion | 11.17800 |

How do we interpret the coefficient for BEDROOMS?

1. For each additional bedroom, the housing price goes up by \$13.85.
2. For each additional bedroom, the housing price goes up by 13.85%.
3. For each additional bedroom, the housing price goes up by \$13,852.
4. None of the above.
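Because the dependent variable is measured in thousands of dollars, a coefficient has to be rescaled before it is read as a dollar amount. The sketch below plugs the table's point estimates into the fitted equation for a hypothetical house (the bedroom count, lot size, and square footage are made up) and compares the prediction with and without one extra bedroom. It says nothing about whether that difference is statistically significant.

```python
def predicted_price_thousands(bedrooms, lotsize, sqrft):
    """Fitted PRICE (in thousands of dollars) using the coefficients from the table above."""
    return -21.77031 + 13.85252 * bedrooms + 0.002068 * lotsize + 0.122778 * sqrft

# Hypothetical house, for illustration only.
base = predicted_price_thousands(bedrooms=3, lotsize=6000, sqrft=2000)
one_more = predicted_price_thousands(bedrooms=4, lotsize=6000, sqrft=2000)

# Convert the difference from thousands of dollars to dollars.
difference_dollars = (one_more - base) * 1000
print(f"Predicted change from one extra bedroom: ${difference_dollars:,.2f}")
```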

If you haven’t already done so, use the “QuizIIDataset.csv” data file to create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Also, if you haven’t already done so, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB = sqrt((69.1691 * (latitude - 40.748441))^2 + (52.5179 * (longitude - (-73.985664)))^2), where the given lat/long coordinates are those of the Empire State Building. Then run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and DistESB, and answer the following question.

According to the Stata printout:

1. There is no effect of distance from the Empire State Building on housing prices.
2. Each additional mile away from the Empire State Building is associated with an increase in housing prices, on average, of about 4.4%.
3. Each additional mile away from the Empire State Building is associated with a decrease in housing prices, on average, of about 4.4%.
4. Each additional mile away from the Empire State Building is associated with a decrease in housing prices, on average, of about 0.044%.
5. Each additional mile away from the Empire State Building is associated with a decrease in housing prices, on average, of about \$0.044.
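When the dependent variable is a log (as lnPPSF is) and the regressor is in levels (as DistESB is), a coefficient b is usually read as roughly a 100·b percent change in the outcome per one-unit change in the regressor. The exact figure is 100·(exp(b) − 1). The sketch below compares the two readings for a hypothetical coefficient; the value -0.02 is made up and is not an estimate from the quiz data.

```python
import math

def approx_percent_change(coefficient):
    """Rule-of-thumb reading of a log-level coefficient: 100 * b percent per unit."""
    return 100.0 * coefficient

def exact_percent_change(coefficient):
    """Exact percent change in the outcome implied by a log-level coefficient."""
    return 100.0 * (math.exp(coefficient) - 1.0)

b = -0.02  # hypothetical coefficient on a distance variable measured in miles
print(f"approximate: {approx_percent_change(b):.3f}% per mile")
print(f"exact:       {exact_percent_change(b):.3f}% per mile")
```

For small coefficients the two readings nearly coincide, which is why the approximation is commonly used when interpreting log-level regressions.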

Using the “QuizIIDataset.csv” data file, if you haven’t already done so, create a variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Next, create a new variable that is the distance of each property to the Empire State Building in miles using the following formula: DistESB = sqrt((69.1691 * (latitude - 40.748441))^2 + (52.5179 * (longitude - (-73.985664)))^2), where the given lat/long coordinates are those of the Empire State Building. After you do that, answer the following question.

What is the correlation coefficient between lnPPSF and DistESB?

1. -0.0234751
2. 0.0778
3. 0.0010
4. -0.0778
5. None of the above.
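For reference, the correlation coefficient asked about here is the Pearson correlation, the same statistic Stata's `correlate` command reports. A minimal Python sketch of the calculation follows. The function name `pearson_r` and the sample numbers are hypothetical and are not taken from the quiz data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values standing in for DistESB and lnPPSF.
dist = [0.5, 1.2, 3.4, 5.0, 7.8]
lnppsf = [6.9, 6.7, 6.1, 6.3, 5.8]
print(pearson_r(dist, lnppsf))
```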

If you have not already done so, for this question you need to figure out how to merge the “QuizIIDatasetSandyAddon.csv” data set with the “QuizIIDataset.csv” data file (hint: use nycid as the key and merge one to one). After you merge the two data sets (if you have not done so already), create a new variable called lnPPSF, which is the log of sale_price divided by gross_square_feet. Then run a regression of lnPPSF on year_built, year, land_square_feet, gross_square_feet, and surgeheight, where the surgeheight variable tells you how many feet the Sandy storm surge rose at each house in the sample.

After you run a regression, test to see if there is heteroskedasticity with regard to the residuals and then choose the correct answer below.

1. We cannot reject the null of no heteroskedasticity.
2. We can reject the null hypothesis of no heteroskedasticity with greater than 99% confidence.
3. The evidence suggests that we need not concern ourselves with heteroskedasticity.
4. We can reject the null hypothesis of no heteroskedasticity with just 70% confidence.
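To illustrate the logic of such a test, the sketch below implements a simplified Breusch-Pagan-style check for a regression with a single regressor: fit the model, regress the squared residuals on the regressor, and compare n·R² from that auxiliary regression with a chi-square distribution with one degree of freedom. This is the n·R² (Koenker) variant on made-up data, so its numbers may differ from what Stata's built-in test reports. All names and data here are hypothetical.

```python
import math

def ols_simple(x, y):
    """Slope, intercept, and R-squared for a one-regressor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """LM statistic (n * R-squared of the auxiliary regression) and its p-value."""
    slope, intercept, _ = ols_simple(x, y)
    resid_sq = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    _, _, r2_aux = ols_simple(x, resid_sq)
    lm = max(0.0, len(x) * r2_aux)  # guard against tiny negative rounding error
    # Upper-tail probability of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Made-up data: the error spread grows with x in y_het and is constant in y_hom.
x = list(range(1, 21))
y_het = [2 + 0.5 * xi + ((-1) ** xi) * 0.1 * xi for xi in x]
y_hom = [2 + 0.5 * xi + ((-1) ** xi) * 1.0 for xi in x]

for label, y in (("growing spread", y_het), ("constant spread", y_hom)):
    lm, p = breusch_pagan_lm(x, y)
    print(f"{label}: LM = {lm:.3f}, p-value = {p:.4f}")
```

A small p-value is evidence against the null of no heteroskedasticity; "rejecting with greater than 99% confidence" corresponds to a p-value below 0.01.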