## 2020-2021

Gainesville

7-9-2020

#### Publisher

Teaching Statistics: An International Journal for Teachers

#### Book or Journal Information

Teaching Statistics: An International Journal for Teachers, https://onlinelibrary.wiley.com/doi/abs/10.1111/test.12233

#### Abstract

College‐level statistics courses emphasize the use of the coefficient of determination, R‐squared, in evaluating a linear regression model: higher R‐squared is better. This often gives students an impression that higher R‐squared implies better predictability since textbooks tend to use sample data to support the theory and students rarely have an opportunity to work on real data. In this paper, health care stocks are used as predictors and the result demonstrates that high R‐squared does not necessarily mean high predictability and that multiple linear regression can be used in the study of data behavior. In particular, by learning the pattern of the near and far out‐of‐sample‐prediction errors for different time periods throughout a dataset, the near out‐of‐sample prediction errors can be used to control the prediction errors and identify a subset of predictors that can well reflect the trend of S&P 500.
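The contrast the abstract draws between in-sample R‐squared and out‐of‐sample prediction error can be sketched in a few lines. The following is a minimal illustration only, not the paper's procedure or data: it assumes synthetic random-walk series in place of health care stock prices and the S&P 500, uses arbitrarily chosen window sizes, and the helper names (`fit_ols`, `near_far_errors`, and so on) are hypothetical. It fits a multiple linear regression on an initial window, then reports the in-sample R‐squared alongside the root-mean-square error on a "near" window immediately after the fit and a "far" window after that.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, with an intercept term first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply fitted coefficients (intercept first) to new predictor rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_hat):
    """Root-mean-square prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def near_far_errors(X, y, n_train, n_near, n_far):
    """Fit on the first n_train rows, then score the next two windows."""
    beta = fit_ols(X[:n_train], y[:n_train])
    near = slice(n_train, n_train + n_near)
    far = slice(n_train + n_near, n_train + n_near + n_far)
    return {
        "r2_in_sample": r_squared(y[:n_train], predict(beta, X[:n_train])),
        "rmse_near": rmse(y[near], predict(beta, X[near])),
        "rmse_far": rmse(y[far], predict(beta, X[far])),
    }


# Assumption: independent random walks stand in for price series.
rng = np.random.default_rng(0)
n_obs, n_predictors = 300, 5
X = np.cumsum(rng.normal(size=(n_obs, n_predictors)), axis=0)
y = np.cumsum(rng.normal(size=n_obs))

result = near_far_errors(X, y, n_train=200, n_near=20, n_far=80)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Repeating the call with different values of `n_train` gives one set of near and far errors per time period, which is the kind of pattern the abstract describes examining across a dataset.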


Multiple Linear Regression: Identify Potential Health Care Stocks for Investments Using Out-of-Sample Predictions
