Feel free to try the exercises below at your leisure. Solutions will be posted later in the week!

Prepping Data and Running Basic Models

Using the data set linked here, we will run some basic linear regression models that predict gdp08 from dem_score14, pop_urban, and oecd (i.e., regressing gdp08 on dem_score14, pop_urban, and oecd).

  1. Load and prepare the data. Create the following transformed variables: 1) scaled versions of dem_score14 and pop_urban; 2) a dummy variable for OECD membership.

  2. Randomly split the data set so that 50% of the rows are in Group A and 50% are in Group B. (Try to make this step replicable. For a hint, see here.)

  3. Run a basic linear model for each group created in the previous step using lm(). Compare the \(R^2\) and RMSE values of the two models.

  4. Re-estimate a linear model using caret::train with method = "lm" (keep the other arguments at their default values) and compare it to the models estimated in step #3. How do the outputs of caret::train and lm() differ?
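
If you would like a starting point, here is a minimal sketch of the workflow on simulated data. It is not the posted solution. The values in dat are made up, the assumption that oecd is coded "yes"/"no" is ours, and the names dem_score14_z, pop_urban_z, oecd_dummy, and rmse are hypothetical helpers. Swap in the real data set and check how oecd is actually coded before reusing any of it.

```r
# Simulated stand-in for the exercise data (values are made up for illustration).
set.seed(123)
n <- 100
dat <- data.frame(
  dem_score14 = runif(n, 1, 10),
  pop_urban   = runif(n, 10, 100),
  oecd        = sample(c("yes", "no"), n, replace = TRUE)  # assumed coding
)
dat$gdp08 <- 2 + 1.5 * dat$dem_score14 + 0.1 * dat$pop_urban +
  5 * (dat$oecd == "yes") + rnorm(n, sd = 3)

# Step 1: standardize the continuous predictors and build a 0/1 dummy.
dat$dem_score14_z <- as.numeric(scale(dat$dem_score14))
dat$pop_urban_z   <- as.numeric(scale(dat$pop_urban))
dat$oecd_dummy    <- as.integer(dat$oecd == "yes")

# Step 2: replicable 50/50 split; setting the seed fixes the random draw.
set.seed(2024)
idx_a   <- sample(nrow(dat), size = nrow(dat) / 2)
group_a <- dat[idx_a, ]
group_b <- dat[-idx_a, ]

# Step 3: fit the same model in each group, then compare R^2 and in-sample RMSE.
form  <- gdp08 ~ dem_score14_z + pop_urban_z + oecd_dummy
fit_a <- lm(form, data = group_a)
fit_b <- lm(form, data = group_b)

rmse <- function(fit) sqrt(mean(residuals(fit)^2))
comparison <- data.frame(
  group = c("A", "B"),
  r2    = c(summary(fit_a)$r.squared, summary(fit_b)$r.squared),
  rmse  = c(rmse(fit_a), rmse(fit_b))
)
print(comparison)

# Step 4: the same model through caret, only if the package is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  fit_caret <- caret::train(form, data = group_a, method = "lm")
  print(fit_caret$results)  # performance estimated by caret's default resampling
}
```

Note that the RMSE computed in step 3 is in-sample, so it is worth thinking about whether it is directly comparable to what caret::train reports.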