Today, we will work with data collected during Singapore’s 2020 general election (GE2020). The objective of this activity is to draw on the R coding skills you acquired over the last few weeks, and to further practice using linear models to predict data.
This activity includes more questions than you are expected to be able to answer during class. The rest of the questions can be used for extra practice at home. A .html document with solutions to all questions will be available at the end of the day so you will be able to check your answers.
Now, let’s start. Download fb_followers.csv from Canvas. After setting the correct working directory, import this dataset into R using the command read.csv().
fbf <- read.csv("fb_followers.csv")
Inspect the file and you will find that it contains five variables:
candidate: The names of the 93 candidates nominated by the People’s Action Party (PAP) in GE2020
fb_followers: A numeric variable that records the number of followers of the candidate’s official Facebook page at midnight the night before polling day (00:00 July 10, 2020)
articles: A numeric variable that records the number of articles published by local newspapers between nomination day and polling day that mentioned the candidate by name
incumbent: A logical variable that indicates whether the candidate was an incumbent MP (i.e. they had served as an MP in the previous Parliament) or a first-time candidate
officeholder: A logical variable that indicates whether the candidate was an officeholder (e.g. Cabinet Minister, Minister of State) in the previous Parliament
Using the fb_followers data to examine the relationship between the news profiles of PAP candidates during the campaign and the number of Facebook followers they accumulate
Is there a relationship between the news profiles of candidates during electoral campaigns (i.e. how often they appear in news media) and the number of followers they collect on social media platforms such as Facebook? If all press is good press and success in politics is at least partly contingent on name recognition (you cannot vote for who you don’t know), then the relationship between the news profile of candidates and their online followings seems worthy of investigation.
This exercise involves examining the relationship between local newspaper coverage of candidates running for the People’s Action Party (PAP) during Singapore’s recent GE2020 and the number of Facebook followers that the candidate had accumulated on their official Facebook page as of midnight the night before polling day on July 10, 2020. In particular, let’s see if we can use the number of newspaper articles that a candidate’s name appeared in (articles) to predict their Facebook followings (fb_followers).
Before we attempt to run any linear regressions, let’s first visualise the data and check whether our variables meet the conditions for regression, in particular whether the relationship looks straight enough and whether there are any outliers.
Step 1: Scatter plot of fb_followers by articles
Do you see any outliers? Does the relationship seem straight enough?
plot(fb_followers ~ articles, data=fbf,
main = "Scatter plot of FB followers and news articles",
xlab = "News articles", ylab = "FB followers",
col="lightblue", pch=19)
Step 2: Histograms of fb_followers and articles
par(mfrow=c(1,2))
hist(fbf$fb_followers, col="lightblue", freq=FALSE,
main = "Histogram of FB followers",
xlab = "FB followers")
hist(fbf$articles, col="pink", freq=FALSE,
main = "Histogram of news articles",
xlab = "News articles")
Step 3: Boxplots of fb_followers and articles
par(mfrow=c(1,2))
# freq is an argument of hist(), not boxplot(), so it is omitted here
boxplot(fbf$fb_followers, col="lightblue",
main = "Boxplot of FB followers",
xlab = "FB followers")
boxplot(fbf$articles, col="pink",
main = "Boxplot of news articles",
xlab = "News articles")
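To list the points that a boxplot would draw as outliers, rather than reading them off the plot, base R's boxplot.stats() may help. The sketch below uses a small made-up vector (not the real follower counts), and the name followers_demo is purely illustrative.

```r
# Made-up follower counts for illustration only (not from fb_followers.csv)
followers_demo <- c(10, 12, 15, 11, 14, 13, 500)

# boxplot.stats() returns the values lying beyond the whiskers in $out
flagged <- boxplot.stats(followers_demo)$out
flagged

# Positions of the flagged values in the original vector
which(followers_demo %in% flagged)
```

With the real data, passing fbf$fb_followers or fbf$articles in place of followers_demo would apply the same check.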
The results from Steps 1-3 would seem to indicate that we may have issues with outliers. Before going any further, we should probably transform both of our variables and reassess the relationship of the transformed variables.
fbf$log_fb_followers <- log(fbf$fb_followers, 10)
fbf$log_articles <- log(fbf$articles, 10)
plot(log_fb_followers ~ log_articles, data=fbf,
main = "Scatter plot of FB followers and news articles (logged)",
xlab = "(log) News articles", ylab = "(log) FB followers",
col = "lightblue", pch=19)
par(mfrow=c(1,2))
hist(fbf$log_fb_followers, col="lightblue", freq=FALSE,
main = "Histogram of FB followers (logged)",
xlab = "(log) FB followers")
hist(fbf$log_articles, col="pink", freq=FALSE,
main = "Histogram of news articles (logged)",
xlab = "(log) News articles")
par(mfrow=c(1,2))
# freq is an argument of hist(), not boxplot(), so it is omitted here
boxplot(fbf$log_fb_followers, col="lightblue",
main = "Boxplot of FB followers (logged)",
xlab = "(log) FB followers")
boxplot(fbf$log_articles, col="pink",
main = "Boxplot of news articles (logged)",
xlab = "(log) News articles")
Okay, so taking the base-10 log of both fb_followers and articles seems to produce a scatter plot that is straight enough. While the scatter plot, histograms, and boxplots still seem to indicate the presence of some outliers, these do not seem nearly as problematic as before. The resulting distributions also look roughly normal.
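One caveat worth checking before taking logs: in R the logarithm of zero is -Inf, and lm() stops with an error when a variable contains infinite values. The source does not say whether any candidate has zero articles, so the sketch below uses a made-up vector (not the real data) to show how one might check.

```r
# Made-up article counts for illustration only (not from fb_followers.csv)
articles_demo <- c(0, 3, 12, 150)

# The base-10 log of zero is -Inf
log_demo <- log(articles_demo, 10)
log_demo

# Count the non-positive values before transforming
n_nonpositive <- sum(articles_demo <= 0)
n_nonpositive

# One option is to keep only the positive counts before taking logs
positive_only <- articles_demo[articles_demo > 0]
log_positive <- log(positive_only, 10)
all(is.finite(log_positive))
```

Whether dropping such rows is appropriate depends on the research question, so treat this as a diagnostic rather than a recommendation.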
Let’s now proceed to estimating a linear model.
Can we predict fb_followers using articles?
So let’s see whether the news profile a candidate receives during the campaign period predicts the number of Facebook followers that they have accumulated by the end of the campaign period, by running a linear model on the log-transformed variables.
lm.fb_followers <- lm(log_fb_followers ~ log_articles, data = fbf)
lm.fb_followers
##
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = fbf)
##
## Coefficients:
## (Intercept) log_articles
## 3.2237 0.7241
What is the intercept? What is the slope? How would you interpret it in everyday language?
# The slope is 0.72 and the intercept is 3.22.
# The intercept is the predicted base-10 log of Facebook followers when log_articles
# is 0, i.e. for a candidate mentioned in exactly one article.
# If the base-10 logarithm of the number of news articles increases by 1, the
# base-10 logarithm of the number of Facebook followers increases by 0.72 on average.
# Or, using the definition of the base-10 logarithm:
# If the number of news articles increases by a factor of 10, the number of Facebook
# followers increases by a factor of about 5.3.
# The latter uses
10^(0.7241) # = 5.2979
## [1] 5.297854
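The same coefficients can also be used to predict a follower count on the original scale. As a rough sketch, the code below plugs a hypothetical candidate with 100 articles (an assumed value, chosen so that the base-10 log is 2) into the fitted line and back-transforms the result. The variable names are illustrative.

```r
# Rounded coefficients copied from the lm() output above
intercept <- 3.2237
slope <- 0.7241

# Hypothetical candidate mentioned in 100 articles (assumed value)
articles_new <- 100

# Predicted base-10 log of followers, then back-transformed to followers
log_pred <- intercept + slope * log(articles_new, 10)
followers_pred <- 10^log_pred
followers_pred
```

This comes to roughly 47,000 followers. With the fitted model object available, predict(lm.fb_followers, newdata = data.frame(log_articles = 2)) should give the same log-scale value up to rounding of the coefficients.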
plot(log_fb_followers ~ log_articles, data = fbf, col="lightblue", pch=19,
main = "News profile and Facebook followers",
xlab = "(log) News articles",
ylab = "(log) Facebook followers",
xlim = range(fbf$log_articles, na.rm=TRUE),
ylim = range(fbf$log_fb_followers, na.rm=TRUE))
abline(lm.fb_followers, col="blue")
Is the relationship between news profiles and FB followers the same for first-time candidates and incumbents? Do incumbents seem to have an advantage in the race for followers and name recognition?
At every general election, the PAP retires approximately one-quarter to one-third of its incumbent Members of Parliament (MPs) as part of its self-renewal process and replaces them with first-time candidates. These first-time candidates represent the future of the party but (in most cases) are not yet known to the electorate and presumably do not yet enjoy widespread name recognition. By contrast, incumbent PAP MPs presumably enjoy greater name recognition, at least within their own constituencies, given a record of constituency service built up prior to the election.
One way that we might find out whether incumbents have an advantage in their number of followers, or if news coverage during the campaign is associated with greater increases in the number of followers for first-time candidates (as opposed to incumbents) is by subsetting the data, and comparing the intercepts and slopes.
Subset the data by incumbent
firsttimers <- fbf[!fbf$incumbent, ]
incumbents <- fbf[fbf$incumbent, ]
Let’s estimate some linear models, inspect the output, and compare the slopes and intercepts:
lm.firsttimers <- lm(log_fb_followers ~ log_articles, data = firsttimers)
lm.firsttimers
##
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = firsttimers)
##
## Coefficients:
## (Intercept) log_articles
## 3.18075 0.03106
lm.incumbents <- lm(log_fb_followers ~ log_articles, data = incumbents)
lm.incumbents
##
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = incumbents)
##
## Coefficients:
## (Intercept) log_articles
## 3.5862 0.6343
plot(log_fb_followers ~ log_articles, data=firsttimers, col="lightblue", pch=19,
main = "News profiles and Facebook followers (logged)",
xlab = "(log) News articles",
ylab = "(log) Facebook followers",
xlim = range(fbf$log_articles, na.rm=TRUE),
ylim = range(fbf$log_fb_followers, na.rm=TRUE))
points(log_fb_followers ~ log_articles, data=incumbents, col="pink", pch=19)
abline(lm.firsttimers, col="blue")
abline(lm.incumbents, col="red")
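Subsetting is the approach used in this activity. An alternative, shown here only as a sketch, is a single model with an interaction term, which yields the same group-specific intercepts and slopes as the two separate fits. The data below are simulated for illustration (not the GE2020 data), and the names demo, fit_first, fit_inc and fit_both are hypothetical.

```r
# Simulated data for illustration only (not the GE2020 data)
set.seed(1)
demo <- data.frame(
  log_articles = runif(40, 0, 3),
  incumbent = rep(c(FALSE, TRUE), each = 20)
)
demo$log_fb_followers <- 3 + 0.5 * demo$incumbent +
  0.6 * demo$incumbent * demo$log_articles + rnorm(40, sd = 0.2)

# Separate fits on each subset, as in the activity
fit_first <- lm(log_fb_followers ~ log_articles, data = demo[!demo$incumbent, ])
fit_inc <- lm(log_fb_followers ~ log_articles, data = demo[demo$incumbent, ])

# One model with an interaction between log_articles and incumbent
fit_both <- lm(log_fb_followers ~ log_articles * incumbent, data = demo)
b <- coef(fit_both)

# Incumbent intercept and slope recovered from the interaction model
inc_from_both <- c(b["(Intercept)"] + b["incumbentTRUE"],
                   b["log_articles"] + b["log_articles:incumbentTRUE"])

# Side-by-side comparison of the two approaches
rbind(separate = coef(fit_inc), interaction = unname(inc_from_both))
```

The two rows should match, which is why comparing subset models and reading an interaction model are two views of the same fitted lines.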
We can further divide the incumbent candidates into two types: backbenchers and officeholders.
Backbenchers are incumbent MPs who, prior to the election, held no office within government and were just regular MPs. Officeholders are those who, prior to the dissolution of Parliament, held positions within government. Even though the campaign is held in between Parliaments, these incumbents tend to appear a lot at campaign events and on television, and to give longer speeches than other incumbents.
Subset incumbents by officeholder status
backbenchers <- incumbents[!incumbents$officeholder, ] # Remember that officeholder is a logical vector
officeholders <- incumbents[incumbents$officeholder, ]
lm.backbenchers <- lm(log_fb_followers ~ log_articles, data = backbenchers)
lm.backbenchers
##
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = backbenchers)
##
## Coefficients:
## (Intercept) log_articles
## 3.9024 -0.1196
lm.officeholders <- lm(log_fb_followers ~ log_articles, data = officeholders)
lm.officeholders
##
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = officeholders)
##
## Coefficients:
## (Intercept) log_articles
## 3.8811 0.5956
plot(log_fb_followers ~ log_articles, data=firsttimers, col="lightblue", pch=19,
main = "News profiles and Facebook followers (logged)",
xlab = "(log) News articles",
ylab = "(log) Facebook followers",
xlim = range(fbf$log_articles, na.rm=TRUE),
ylim = range(fbf$log_fb_followers, na.rm=TRUE))
points(log_fb_followers ~ log_articles, data=backbenchers, col="pink", pch=19)
points(log_fb_followers ~ log_articles, data=officeholders, col="lightgreen", pch=19)
abline(lm.firsttimers, col="blue")
abline(lm.backbenchers, col="red")
abline(lm.officeholders, col="green")
It would seem that there is ultimately little evidence of an association between candidate news profiles and Facebook followers for first-time candidates and incumbent backbenchers. In contrast, there appears to be some evidence of an association between news profiles and Facebook followers for incumbent officeholders.
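One hedged way to put a number on the strength of association within each group is the correlation between the two logged variables, computed per group. The sketch below uses simulated data with no built-in relationship (not the GE2020 data), and the column name group is an assumption made for illustration.

```r
# Simulated data for illustration only (not the GE2020 data)
set.seed(2)
demo <- data.frame(
  group = rep(c("firsttimer", "backbencher", "officeholder"), each = 15),
  log_articles = runif(45, 0, 3)
)
demo$log_fb_followers <- 3.5 + rnorm(45, sd = 0.3)

# Split the rows by group, then compute one correlation per group
by_group <- split(demo, demo$group)
cors <- sapply(by_group, function(d) cor(d$log_articles, d$log_fb_followers))
round(cors, 2)
```

With the real subsets, something like cor(officeholders$log_articles, officeholders$log_fb_followers) would give the corresponding value for officeholders.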
Well, let’s first think back to the lesson in the previous class regarding how correlation does not imply causation. As you likely remember, there are a number of things other than “A causes B” that can explain a correlation.
For example, perhaps both the news profiles and the number of Facebook followers of incumbent officeholders are driven by a third, “lurking” variable. Indeed, the act of subsetting incumbent PAP MPs would seem to suggest that the perceived power or status of an MP within the party or government (such that they are appointed to government office) may drive not only Facebook followers but also candidate news profiles during the campaign period.
To see if the above explanation has face validity, you might check which PAP candidate had the largest news profile during the campaign as well as the most Facebook followers. Do you think that this candidate got those Facebook followers because of the attention they received from the press during the campaign period? Or did they get both the followers and the news coverage because they were, well, important before the campaign even started?
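The check suggested above could be sketched as follows. The rows here are made up (the names and numbers are hypothetical, not taken from fb_followers.csv), but the column names match those in the dataset.

```r
# Made-up rows for illustration only; names and numbers are hypothetical
demo <- data.frame(
  candidate = c("Candidate A", "Candidate B", "Candidate C"),
  fb_followers = c(1200, 85000, 4300),
  articles = c(5, 140, 22)
)

# Row with the most news articles
top_articles <- demo[which.max(demo$articles), ]
top_articles

# Row with the most Facebook followers
top_followers <- demo[which.max(demo$fb_followers), ]
top_followers

# Are they the same candidate?
top_articles$candidate == top_followers$candidate
```

With the real data, replacing demo with fbf would run the same check on the 93 candidates.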