Feel free to try the exercises below at your leisure. Note: as usual, the answers shown are just one way of solving the prompts!

Data Scraping

  1. Using rvest::html_table, scrape the table of City Council members in Washington, D.C. from Wikipedia.
library(magrittr)  # for the %>% pipe and the . placeholder

wiki_url <- 'https://en.wikipedia.org/wiki/Council_of_the_District_of_Columbia'
council_outputs <- rvest::read_html(wiki_url) %>%
  rvest::html_table() %>%
  .[[3]]  # the members table; this index may shift if the page layout changes
council_outputs %>% head
  2. Using SelectorGadget or a similar inspector tool, scrape two pages of climate-change news article titles and links from Politico.
url <- 'https://www.politico.com/news/climate-change'
item <- 'h3'

# read each page once and reuse the parsed document
page_1 <- rvest::read_html(url)

titles_1 <- page_1 %>%
  rvest::html_elements(item) %>%
  rvest::html_text2()

hyperlink_1 <- page_1 %>%
  rvest::html_elements(item) %>%
  rvest::html_elements('a') %>%
  rvest::html_attr("href")

# page 2(!)
url <- 'https://www.politico.com/news/climate-change/2'

page_2 <- rvest::read_html(url)

titles_2 <- page_2 %>%
  rvest::html_elements(item) %>%
  rvest::html_text2()

hyperlink_2 <- page_2 %>%
  rvest::html_elements(item) %>%
  rvest::html_elements('a') %>%
  rvest::html_attr("href")

# Note: this assumes every h3 contains exactly one link; if any h3 lacks an
# <a> tag, the title and link vectors will differ in length and the
# data.frame() call below will error.
data.frame(title = c(titles_1, titles_2),
           links = c(hyperlink_1, hyperlink_2)) %>%
  head

Working with APIs

  1. Register for an API key with the U.S. Census Bureau. Once it is received, download any data point of interest from the American Community Survey or Decennial Census. (Documentation here)
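One possible solution, calling the Census API directly with jsonlite. The environment variable name (CENSUS_API_KEY) and the chosen data point (total population, variable B01001_001E, from the 2022 5-year ACS) are illustrative assumptions; substitute any variable of interest.

```r
# Pull total population (B01001_001E) for every state from the 2022
# 5-year American Community Survey. The API key is read from an
# environment variable; set it to the key the Census Bureau emails you.
library(jsonlite)

api_key <- Sys.getenv("CENSUS_API_KEY")
acs_url <- paste0(
  "https://api.census.gov/data/2022/acs/acs5",
  "?get=NAME,B01001_001E&for=state:*",
  "&key=", api_key
)

raw <- fromJSON(acs_url)  # a character matrix; the first row is the header
acs_df <- as.data.frame(raw[-1, ], stringsAsFactors = FALSE)
names(acs_df) <- raw[1, ]
acs_df$B01001_001E <- as.numeric(acs_df$B01001_001E)
acs_df %>% head
```

The API returns JSON arrays rather than named records, hence the manual step of promoting the first row to column names.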

  2. Try to replicate #1 using the tidycensus package, which is an API wrapper.