R programming – Linear Regression

1. The GPAs (grade point averages) of 12 graduating MBA students and their GMAT scores, taken
before entering the MBA program, are given below. Use the GMAT scores as a predictor of GPA, and
conduct a regression of GPA on GMAT scores.

x=GMAT y=GPA

560 3.20
540 3.44
520 3.70
580 3.10
520 3.00
620 4.00
660 3.38
630 3.83
550 2.67
550 2.75
600 2.33
537 3.75

(a) Obtain and interpret the coefficient of determination $R^2$.

(b) Calculate the fitted value for the second person.

(c) Test whether GMAT is an important predictor variable (use significance level 0.05).
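
As a starting point for Question 1, the sketch below fits the regression to the twelve observations listed above and pulls out the three quantities the parts ask for. The variable names (gmat, gpa, fit and so on) are illustrative choices, not part of the assignment, and the interpretation of the numbers is left to the reader.

```r
# Data copied from the table in Question 1
gmat <- c(560, 540, 520, 580, 520, 620, 660, 630, 550, 550, 600, 537)
gpa  <- c(3.20, 3.44, 3.70, 3.10, 3.00, 4.00, 3.38, 3.83, 2.67, 2.75, 2.33, 3.75)

fit <- lm(gpa ~ gmat)            # regression of GPA on GMAT
fit_summary <- summary(fit)

# Coefficient of determination
r_squared <- fit_summary$r.squared

# Fitted value for the second student (GMAT = 540)
fitted_second <- unname(fitted(fit)[2])

# Two-sided t-test of H0: slope = 0
p_value_gmat <- fit_summary$coefficients["gmat", "Pr(>|t|)"]

print(r_squared)
print(fitted_second)
print(p_value_gmat < 0.05)       # TRUE would mean rejecting H0 at the 0.05 level
```

The same p-value appears in the gmat row of the coefficient table printed by summary(fit).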

2. Suppose we have a data set with five predictors: $X_1$ = GPA, $X_2$ = IQ, $X_3$ = Gender (1 for Female
and 0 for Male), $X_4$ = Interaction between GPA and IQ, and $X_5$ = Interaction between GPA and
Gender. The response is starting salary after graduation (in thousands of dollars). Suppose we use least
squares to fit the model, and get $\hat{\beta}_0 = 50$, $\hat{\beta}_1 = 20$, $\hat{\beta}_2 = 0.07$, $\hat{\beta}_3 = 35$, $\hat{\beta}_4 = 0.01$, and $\hat{\beta}_5 = -10$.

(a) Which of the following four statements is correct, and why?
i. For a fixed value of IQ and GPA, males earn more on average than females.
ii. For a fixed value of IQ and GPA, females earn more on average than males.
iii. For a fixed value of IQ and GPA, males earn more on average than females provided that the GPA
is high enough.
iv. For a fixed value of IQ and GPA, females earn more on average than males provided that the GPA
is high enough.
(b) Predict the salary of a female with IQ of 110 and a GPA of 4.0.

(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little
evidence of an interaction effect. Justify your answer.
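
The arithmetic in Question 2 can be checked with a short sketch. It encodes the fitted model using the coefficient estimates stated in the question; predict_salary and gap are hypothetical helper names introduced only for illustration. With IQ and GPA held fixed, the female-minus-male difference reduces to 35 - 10 * GPA, which the second helper evaluates at a few GPA values.

```r
# Coefficient estimates as given in the exercise
b <- c(b0 = 50, b1 = 20, b2 = 0.07, b3 = 35, b4 = 0.01, b5 = -10)

# Hypothetical helper: predicted starting salary in thousands of dollars
predict_salary <- function(gpa, iq, female) {
  unname(b["b0"] + b["b1"] * gpa + b["b2"] * iq + b["b3"] * female +
           b["b4"] * gpa * iq + b["b5"] * gpa * female)
}

# Female with IQ 110 and GPA 4.0
female_salary <- predict_salary(gpa = 4.0, iq = 110, female = 1)
print(female_salary)

# Hypothetical helper: female-minus-male gap at a fixed IQ and GPA
gap <- function(gpa, iq = 110) {
  predict_salary(gpa, iq, 1) - predict_salary(gpa, iq, 0)
}
print(gap(c(3.0, 3.5, 4.0)))
```

Note that the size of a coefficient depends on the units of its predictor, so the code alone does not answer the question about the interaction term; that calls for the term's standard error and p-value, which the exercise does not provide.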

3. This question involves the use of simple linear regression on the Auto data set (please download it via the
course BlackBoard page).

(a) Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as
the predictor. Use the summary() function to print the results. Comment on the output. For example:
i. Is there a relationship between the predictor and the response?
ii. How strong is the relationship between the predictor and the response?
iii. Is the relationship between the predictor and the response positive or negative?
iv. What is the predicted mpg associated with a horsepower of 98? What are the associated
95% confidence and prediction intervals?
(b) Plot the response and the predictor. Use the abline() function to display the least squares
regression line.
(c) Use the plot() function to produce diagnostic plots of the least squares regression fit.
Comment on any problems you see with the fit.
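
The workflow for Question 3 might look like the sketch below. Because the Auto file has to be downloaded separately, this example uses R's built-in mtcars data (columns mpg and hp) purely as a stand-in; that substitution is an assumption for illustration, and the numbers it produces are not the answers for the Auto data. Once Auto is loaded, replacing mtcars and hp with the Auto data frame and its horsepower column should give the requested output.

```r
# Stand-in data: built-in mtcars (mpg, hp), NOT the Auto data set
fit <- lm(mpg ~ hp, data = mtcars)
print(summary(fit))        # coefficients, p-values, R^2, residual standard error

# Predicted mpg at horsepower 98 with 95% intervals
new_obs  <- data.frame(hp = 98)
conf_int <- predict(fit, new_obs, interval = "confidence", level = 0.95)
pred_int <- predict(fit, new_obs, interval = "prediction", level = 0.95)
print(conf_int)
print(pred_int)

# Scatterplot with the least squares line
plot(mtcars$hp, mtcars$mpg, xlab = "horsepower", ylab = "mpg")
abline(fit, col = "red")

# Four standard diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(fit)
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the noise in an individual observation.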

4. In this exercise you will create some simulated data and fit simple linear regression models to it. Make
sure to use the command set.seed(1) prior to starting part (a) to ensure consistent results. (Hint:
rnorm(n, mean = a, sd = b) generates n random variables with mean a and standard deviation b; e.g.,
rnorm(100, mean = 10, sd = 5) returns a vector with 100 values, each of which follows a normal
distribution with mean 10 and standard deviation 5.)

(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0, 1)
distribution. This represents a feature, X.

(b) Using the rnorm() function, create a vector, eps, containing 100 observations drawn from a N(0, 0.25)
distribution, i.e. a normal distribution with mean zero and variance 0.25.

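(c) Using x and the error vector from (b), generate a vector y according to the model

$$Y = -1 + 0.5X + \varepsilon \qquad (1)$$

What is the length of the vector y? What are the values of $\beta_0$ and $\beta_1$ in this linear model?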
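
A minimal sketch of the data generation in parts (a) to (c) follows. One detail worth flagging: rnorm() takes a standard deviation rather than a variance, so a variance of 0.25 corresponds to sd = 0.5. The vector name eps for the error term is an illustrative choice.

```r
set.seed(1)                                    # reproducible draws
x   <- rnorm(100)                              # feature X ~ N(0, 1)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))   # variance 0.25, so sd = 0.5
y   <- -1 + 0.5 * x + eps                      # model (1)
print(length(y))
```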

(d) Create a scatterplot displaying the relationship between x and y. Comment on what you observe.

(e) Fit a least squares linear model to predict y using x. Comment on the model obtained.
How do $\hat{\beta}_0$ and $\hat{\beta}_1$ compare to $\beta_0$ and $\beta_1$?

(f) Now fit a polynomial regression model that predicts y using x and $x^2$. Is there evidence that the
quadratic term improves the model fit? Explain your answer.
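
One way to approach the linear and quadratic fits is sketched below. The data are regenerated so the block runs on its own, and the object names fit_lin and fit_quad are illustrative. Comparing the nested models with anova() is one reasonable check on the quadratic term; the p-value on the squared term in summary(fit_quad) is another.

```r
set.seed(1)
x <- rnorm(100)
y <- -1 + 0.5 * x + rnorm(100, sd = 0.5)   # model (1), error variance 0.25

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))             # I() keeps ^ as arithmetic inside a formula

print(coef(fit_lin))                       # compare with the true values -1 and 0.5
print(summary(fit_quad)$coefficients)      # includes the t-test on the squared term
print(anova(fit_lin, fit_quad))            # F-test for adding the quadratic term
```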

(g) Repeat (a)-(f) after modifying the data generation process in such a way that there is less noise in the
data. The model (1) should remain the same. You can do this by decreasing the variance of the
normal distribution used to generate the error term $\varepsilon$ in (b). Describe your results.
(h) Repeat (a)-(f) after modifying the data generation process in such a way that there is more noise in
the data. The model (1) should remain the same. You can do this by increasing the variance of the
normal distribution used to generate the error term $\varepsilon$ in (b). Describe your results.

(i) What are the confidence intervals for $\beta_0$ and $\beta_1$ based on the original data set, the noisier data set, and the
less noisy data set? Comment on your results.
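
The three noise settings can be compared with a small helper, sketched here under stated assumptions: sim_confint is a hypothetical function name, and the error standard deviations 0.1 and 1.0 are arbitrary illustrative choices for the "less noisy" and "noisier" cases (the exercise leaves them open). Only 0.5 comes from the original setup.

```r
# Hypothetical helper: simulate model (1) with a given error sd and
# return 95% confidence intervals for the intercept and slope
sim_confint <- function(error_sd) {
  set.seed(1)
  x <- rnorm(100)
  y <- -1 + 0.5 * x + rnorm(100, sd = error_sd)
  confint(lm(y ~ x), level = 0.95)
}

ci_less <- sim_confint(0.1)   # less noise (illustrative value)
ci_orig <- sim_confint(0.5)   # original data, variance 0.25
ci_more <- sim_confint(1.0)   # more noise (illustrative value)

print(ci_less)
print(ci_orig)
print(ci_more)
```

Resetting the seed inside the helper means all three data sets share the same x values and the same underlying random draws, so differences in interval width are due to the noise level alone.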