Feel free to try the exercises below at your leisure. Solutions will be posted later in the week!
Create a regular expression that matches words starting with a
vowel. Test your pattern on this vector:
test <- c('apple', 'banana', 'kiwi', 'eggplant')
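One possible starting point, offered as a sketch rather than the posted solution: anchor the pattern at the start of the string and follow it with a character class of vowels. It uses only base R; the names `starts_with_vowel` and `vowel_words` are illustrative, and the choice to ignore case is an assumption the exercise does not state.

```r
# Vector from the exercise
test <- c('apple', 'banana', 'kiwi', 'eggplant')

# '^' anchors the match at the start of the string;
# '[aeiou]' matches exactly one vowel in that position.
# ignore.case = TRUE (an assumption) also catches capitalized words.
starts_with_vowel <- grepl('^[aeiou]', test, ignore.case = TRUE)

# Keep only the words whose first letter is a vowel
vowel_words <- test[starts_with_vowel]
print(vowel_words)
```

The same pattern should also work with `stringr::str_detect()` if you prefer the tidyverse tools.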
Scrape the body text of the Wikipedia
page here. Using the nrc sentiment lexicon, summarize
the proportion of non-stop words that fall into each sentiment category.
Compare your findings with a second Wikipedia
page here.
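The summarizing step can be sketched in base R on toy data. Everything below is hypothetical: `words` stands in for the scraped tokens after stop words are removed, and `toy_lexicon` is a four-row stand-in for the real nrc lexicon. The sketch also assumes the denominator is the total number of non-stop words, which is only one reading of the exercise.

```r
# Hypothetical tokens standing in for the scraped, stop-word-free page text
words <- c('war', 'peace', 'treaty', 'war', 'victory', 'river', 'peace', 'army')

# Hypothetical toy lexicon; a word may belong to more than one category
toy_lexicon <- data.frame(
  word = c('war', 'war', 'peace', 'victory'),
  sentiment = c('fear', 'negative', 'positive', 'positive'),
  stringsAsFactors = FALSE
)

# Inner join: one row per (token, category) pair; tokens missing from
# the lexicon drop out of the result
tokens <- data.frame(word = words, stringsAsFactors = FALSE)
matched <- merge(tokens, toy_lexicon, by = 'word')

# Share of all non-stop tokens tagged with each category
category_share <- table(matched$sentiment) / length(words)
print(round(category_share, 3))
```

Because a word can carry several categories, the shares need not sum to one. Running the same steps on the second page gives a table you can compare directly.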
Using the Twitter data from the lab assignment (with stop
words and URL fragments removed), produce a word cloud of the
word stems used in the tweets (use
SnowballC::wordStem()).
Estimate a topic model for Jane Austen’s Emma (which can
be accessed in the janeaustenr package).
Fit the model with 5 topics, treating each chapter as a document. What
are the top 10 words for each topic?
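Treating chapters as documents means first assigning every line of the book to a chapter. A minimal base R sketch of that step follows, on a hypothetical toy vector. It assumes chapter headings look like `CHAPTER I` on their own line, so check the actual text before reusing the pattern.

```r
# Hypothetical lines mimicking a text vector with one line per element
book_lines <- c('EMMA', '', 'CHAPTER I', 'first line', 'second line',
                'CHAPTER II', 'third line')

# Assumed heading format: 'CHAPTER' followed by a Roman numeral
is_heading <- grepl('^CHAPTER [IVXLC]+$', book_lines)

# Running count of headings: each line gets the index of its chapter
# (0 for front matter that precedes the first heading)
chapter <- cumsum(is_heading)

# Drop front matter and the headings, then collapse each chapter
# into a single document string
keep <- chapter > 0 & !is_heading
chapter_docs <- tapply(book_lines[keep], chapter[keep], paste, collapse = ' ')
print(chapter_docs)
```

From there, the chapter documents can be tokenized, counted, and cast to a document-term matrix for whichever topic-modeling package you choose.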