For this assignment, I will be answering the following questions:
Question 1:
From our textbook, Introductory Statistics with R, p. 159, Exercises 9.1 and 9.2.
9.1 I revised this question, so please follow my description only. Conduct an ANOVA (analysis of variance) and a regression analysis on the data from the cystfibr (> data("cystfibr")) dataset. You can choose any variables you like. In your report, you need to state the coefficient results (intercept included) for the variables you choose, both under ANOVA and under the multivariate analysis. I am specifically looking for your interpretation of the R results.
Extra clue:
The model code:
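The code itself did not survive in this post, so here is a sketch of what the model presumably looked like, reconstructed from the coefficient values discussed below (it assumes the ISwR package, which ships the cystfibr data):

```r
# Sketch of the fitted model (assumes the ISwR package is installed)
library(ISwR)
data(cystfibr)

# Multiple regression of pemax (maximum expiratory pressure)
# on age, weight, bmp, and fev1
model <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

summary(model)   # coefficient table interpreted below
```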
Interpretation of the results:
Reviewing the output of the model, we can see that, in terms of significance, the intercept is highly significant, while the other variables (weight, bmp, and fev1) are all significant. Reading further into the results, we can take away the following ideas:
With age, we can see that pemax (maximum expiratory pressure) will decrease by approximately 3.4181 units for every one-unit increase in age. However, the R results do not deem this variable significant, so we must move on.
As for weight, pemax will increase by 2.6882 units for every one-unit increase in weight. R views this variable as statistically significant, so it may be quite helpful in future models.
For bmp, which is body mass (expressed as a percentage of the norm), pemax will decrease by 2.0657 units for every one-unit increase in bmp. This variable is also statistically significant and should be helpful in future analyses.
Lastly, with fev1, which refers to forced expiratory volume in one second, pemax will increase by 1.0882 units for every one-unit increase in fev1.
Taking into account the p-values for each of the variables, it seems that only fev1 comes rather close to the 0.05 significance level, at around 0.04695. The other variables have p-values lower than fev1's, with the exception of age, whose p-value is not at all significant.
Moving on to the ANOVA table and its output, there are some surprising results. Look first at the sums of squares, where a higher sum of squares corresponds to greater variability attributed to that term. The variable age has the highest value, at 10098.5; the rest of the variables have considerably lower sums of squares. Reviewing each p-value and its significance code, we immediately see that age is highly significant, in the three-star (***) range. Additionally, bmp and fev1 are only marginally significant, while weight receives no stars at all. Hence, age is a highly significant predictor, with bmp and fev1 not far behind, which suggests that they contribute to the variability in the pemax variable.
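The ANOVA call itself was lost from the post; a plausible reconstruction follows. One point worth keeping in mind when reading its output: R's anova() reports sequential (Type I) sums of squares, which helps explain why the ANOVA table can disagree with the t-tests above.

```r
# Sketch of the ANOVA step (assumes the ISwR package is installed)
library(ISwR)
data(cystfibr)

model <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

# anova() uses sequential (Type I) sums of squares: each term is
# assessed after only the terms listed before it. Because age enters
# first, it absorbs variance it shares with weight, bmp, and fev1 --
# which is why age can look highly significant here even though its
# t-test in summary(model) is not.
anova(model)
```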
Question 2:
9.2 The secher data (> data("secher")) are best analyzed after log-transforming birth weight as well as the abdominal and biparietal diameters. Fit a prediction equation for birth weight.
How much is gained by using both diameters in a prediction equation?
The sum of the two regression coefficients is almost identical and equal to 3.
Can this be given a nice interpretation in our analysis?
Please provide a step-by-step walkthrough of your analysis and the code you used to find the result.
Extra clue:
In R:
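The original code is missing from the post; the following is a sketch consistent with the numbers quoted below (the log(ad) slope of 2.2365 and R-squared of 0.7959 point to a single-predictor model, with a second model adding the biparietal diameter). It again assumes the ISwR package:

```r
# Sketch of the birth-weight models (assumes the ISwR package)
library(ISwR)
data(secher)

# Single-predictor model: log birth weight on log abdominal diameter
m1 <- lm(log(bwt) ~ log(ad), data = secher)
summary(m1)   # slope and R-squared discussed below

# Model using both diameters; per the exercise, the two slopes
# sum to almost exactly 3
m2 <- lm(log(bwt) ~ log(ad) + log(bp), data = secher)
summary(m2)
```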
Interpretation of the results:
Analyzing the output of this model, we can see that log(ad) indeed contributes to the prediction of log(bwt), where bwt is birth weight and ad is the abdominal diameter. Looking at the coefficients section, for each one-unit increase in log(ad), the estimated change in log(bwt) is 2.2365.

Now, given that the sum of the two regression coefficients in the two-diameter model is almost identical and equal to 3, there is a nice interpretation: weight scales roughly with volume, and volume goes as the cube of a linear dimension, so if birth weight is roughly proportional to the cube of the baby's linear size, we would expect the coefficients of the two log-diameters to sum to about 3, exactly as observed. Moving on to the R-squared value, it is 0.7959, which means that 79.59% of the variability in log(bwt) can be explained by the model. As for the F-statistic, given its high value and the low p-value, one can infer that the model is indeed statistically significant.
~ Katie