Linear Regression and Hypothesis Testing

This week we started learning about linear regression (linear models). We understand that the straight line y=ax+b is fitted to the data x,y (i.e. the coefficients a,b are chosen) in such a way that the sum of squares of the residuals is minimized (least squares regression).
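To make the least squares idea concrete, here is a minimal R sketch. The data are simulated, and the "true" slope, intercept, noise level and the helper rss() are assumptions made only for this illustration. It computes a and b from the standard closed-form expressions, compares them with what lm() returns, and checks that moving away from the fitted line increases the sum of squared residuals.

```r
# Simulated data: slope 2, intercept 1 and noise sd 3 are assumptions for this sketch.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 * x + 1 + rnorm(50, sd = 3)

# Closed-form least squares estimates for the line y = a*x + b.
a_hat <- cov(x, y) / var(x)
b_hat <- mean(y) - a_hat * mean(x)

# The same fit with lm(); coef() returns the intercept first, then the slope.
fit <- lm(y ~ x)
coef(fit)

# Sum of squared residuals for an arbitrary line (hypothetical helper).
rss <- function(a, b) sum((y - (a * x + b))^2)

rss(a_hat, b_hat)        # value at the least squares solution
rss(a_hat + 0.1, b_hat)  # a slightly different slope gives a larger sum
```

Trying other values of a and b in rss() is a quick way to convince yourself that the fitted line really is the minimizer.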

But no finding in statistics can be discussed without addressing its significance. In the homework you were asked to use the significance blindly, for now, and we have not yet discussed what it means in connection with linear models. Discussing that is exactly what I’d like to ask you to do. I do not want you to delve into all the math (we will have to leave a great deal of it out of our scope even later in the course), although you can if you wish. Besides, before using math to calculate things, we need to understand what we want to calculate.

Hence I want you to think about the problem and come to a qualitative understanding of what is at play here, and of what makes a finding “significant” versus “insignificant”.

If you run the summary() command on the result of a linear model fit returned by lm(), you will see a column Pr(>|t|). You could guess that it is the result of some t-test. How does it apply here? So far we have seen t-tests in the setting of testing sample locations. The ingredients were: 1) null hypothesis: in earlier cases we looked at the null stating that two samples came from distribution(s) with the same mean; 2) test statistic: for most examples so far we chose to look at the difference between the means of the two samples; 3) some calculation (either brute-force resampling or the analytical t-distribution) that told us how likely the observed difference between the means is to arise merely by random chance. That latter probability, for the test statistic to reach the observed or an even larger value under the null hypothesis, is the p-value, which directly describes the significance of the finding.
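As a reminder of where these pieces live in R, here is a small sketch on simulated data (all numbers and variable names below are assumptions for illustration only). The first part pulls the coefficient table out of summary() so you can see the Pr(>|t|) column by itself. The second part replays the three two-sample ingredients with brute-force resampling and compares the result with t.test().

```r
set.seed(2)

# Part 1: the coefficient table of a fitted linear model.
x <- runif(40, 0, 10)
y <- 0.5 * x + 3 + rnorm(40, sd = 2)   # assumed slope, intercept and noise
fit <- lm(y ~ x)
tab <- coef(summary(fit))   # matrix with one row per coefficient
colnames(tab)               # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
tab[, "Pr(>|t|)"]           # the column in question, one p-value per coefficient

# Part 2: the familiar two-sample setting.
g1 <- rnorm(30, mean = 0)
g2 <- rnorm(30, mean = 0.8)            # assumed shift between the groups

# Ingredient 2: test statistic = difference between the sample means.
obs <- mean(g2) - mean(g1)

# Ingredients 1 and 3: under the null the group labels are exchangeable,
# so reshuffle them many times and recompute the statistic each time.
pooled <- c(g1, g2)
perm <- replicate(5000, {
  idx <- sample(length(pooled), length(g1))
  mean(pooled[-idx]) - mean(pooled[idx])
})

# Two-sided p-value: fraction of reshuffles at least as extreme as observed.
p_perm <- mean(abs(perm) >= abs(obs))
p_perm
t.test(g2, g1)$p.value      # analytical counterpart for comparison
```

The resampling and analytical p-values will not match exactly, but they should be of similar size, which is the same comparison we made in the earlier examples.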

In connection with the linear model y=ax+b+e (e is the random term), the findings that we want to characterize statistically are the coefficients a,b (we just compute them straightforwardly from the data). Describe qualitatively
