Feel free to try the exercises below at your leisure. Solutions will be posted later in the week!
Using the data set linked here, we will fit some basic linear regression
models that predict gdp08 from dem_score14, pop_urban, and oecd
(i.e. regressing gdp08 on dem_score14, pop_urban, and oecd).
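For reference, one way to write the model described above is the following. The coefficient symbols \(\beta_0, \dots, \beta_3\), the error term \(\varepsilon_i\), and the observation index \(i\) are notation assumed here for illustration; they do not appear in the exercise itself.

\[
\mathrm{gdp08}_i = \beta_0 + \beta_1\,\mathrm{dem\_score14}_i + \beta_2\,\mathrm{pop\_urban}_i + \beta_3\,\mathrm{oecd}_i + \varepsilon_i
\]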
1. Load and prepare the data. Create the following transformed
variables: (a) scaled versions of dem_score14 and pop_urban;
(b) a dummy variable for OECD membership.
2. Randomly split the data set with 50% of the rows in Group A and 50% in Group B. (Try to make this step reproducible. For a hint, see here.)
3. Fit a basic linear model to each group created in the previous
step using lm(). Compare the \(R^2\) and RMSE values of the two
models.
4. Re-estimate a linear model using caret::train with
method = "lm" (keep the other arguments at their default
values) and compare it to the models estimated in step 3. How do the
outputs of caret::train and lm()
differ?
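If you want to check the mechanics before working with the real data, the steps above can be sketched on simulated data. This is only a rough illustration, not the posted solution: the data frame dat, its simulated values, the "yes"/"no" coding of oecd, the seeds, the helper rmse(), and the new column names (dem_score14_z, pop_urban_z, oecd_dummy) are all assumptions made for this example. The caret step runs only if the caret package is installed.

```r
# Hypothetical stand-in data; the real data set is NOT reproduced here.
set.seed(42)
n <- 100
dat <- data.frame(
  dem_score14 = runif(n, 1, 10),
  pop_urban   = runif(n, 10, 100),
  oecd        = sample(c("yes", "no"), n, replace = TRUE)  # assumed coding
)
dat$gdp08 <- 5 + 2 * dat$dem_score14 + 0.1 * dat$pop_urban +
  10 * (dat$oecd == "yes") + rnorm(n, sd = 5)

# Step 1: scaled predictors and a 0/1 dummy for OECD membership
dat$dem_score14_z <- as.numeric(scale(dat$dem_score14))
dat$pop_urban_z   <- as.numeric(scale(dat$pop_urban))
dat$oecd_dummy    <- as.integer(dat$oecd == "yes")

# Step 2: 50/50 split; setting the seed makes the split repeatable
set.seed(123)
idx_a   <- sample(seq_len(n), size = floor(n / 2))
group_a <- dat[idx_a, ]
group_b <- dat[-idx_a, ]

# Step 3: one lm() per group, then in-sample R^2 and RMSE
f     <- gdp08 ~ dem_score14_z + pop_urban_z + oecd_dummy
fit_a <- lm(f, data = group_a)
fit_b <- lm(f, data = group_b)

rmse <- function(fit) sqrt(mean(residuals(fit)^2))  # in-sample RMSE
comparison <- data.frame(
  group = c("A", "B"),
  r2    = c(summary(fit_a)$r.squared, summary(fit_b)$r.squared),
  rmse  = c(rmse(fit_a), rmse(fit_b))
)
print(comparison)

# Step 4: same formula through caret::train with default settings
if (requireNamespace("caret", quietly = TRUE)) {
  fit_caret <- caret::train(f, data = group_a, method = "lm")
  print(fit_caret$results)  # performance summary reported by caret
}
```

When comparing the two outputs, keep in mind that the rmse() helper above is computed on the same rows used for fitting, whereas caret::train with default settings reports performance estimated through resampling.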