Introduction

Today, we will work with data collected during Singapore’s 2020 general election (GE2020). The objective of this activity is to draw on the R coding skills you acquired over the last few weeks, and to further practice using linear models to predict data. Therefore,

  • try to work on this activity independently,
  • if you get stuck, feel free to ask for help or work with others.

This activity includes more questions than you are expected to be able to answer during class. The rest of the questions can be used for extra practice at home. A .html document with solutions to all questions will be available at the end of the day so you will be able to check your answers.

Now, let’s start. Download fb_followers.csv from Canvas. After setting the correct working directory, import this dataset into R using the command read.csv().

fbf <- read.csv("fb_followers.csv")

Inspect the file and you will find that it contains five variables:

  • candidate: The names of the 93 candidates nominated by the People’s Action Party (PAP) in GE2020
  • fb_followers: A numeric variable that records the number of followers of each candidate’s official Facebook page at midnight the night before polling day (00:00 July 10, 2020)
  • articles: A numeric variable that records the number of articles published by local newspapers between nomination day and polling day that mentioned the candidate by name
  • incumbent: A logical variable that indicates whether the candidate was an incumbent MP (i.e. they had served as an MP in the previous Parliament) or a first-time candidate
  • officeholder: A logical variable that indicates whether the candidate was an officeholder (e.g. Cabinet Minister, Minister of State) in the previous Parliament
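
One way to inspect the data is sketched below. Because fb_followers.csv is not reproduced here, the sketch builds a small stand-in data frame, fbf_demo, whose names and values are made up and only mimic the five columns described above. With the real file you would call the same functions on fbf.

```r
# Stand-in for fbf: made-up rows with the same five columns as fb_followers.csv
# (assumption: these values are NOT taken from the real dataset)
fbf_demo <- data.frame(
  candidate    = c("Candidate A", "Candidate B", "Candidate C", "Candidate D"),
  fb_followers = c(1200, 45000, 3100, 980000),
  articles     = c(3, 40, 7, 310),
  incumbent    = c(FALSE, TRUE, FALSE, TRUE),
  officeholder = c(FALSE, FALSE, FALSE, TRUE)
)

str(fbf_demo)      # variable names and types
head(fbf_demo)     # first few rows
summary(fbf_demo)  # ranges and counts; missing values are reported if present
dim(fbf_demo)      # number of rows and columns
```

On the real data, dim(fbf) should report 93 rows and 5 columns if the import worked as described.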

Challenge: Use fb_followers data to examine the relationship between the news profiles of PAP candidates during the campaign and the number of Facebook followers they accumulate

Is there a relationship between the news profiles of candidates during electoral campaigns (i.e. how often they appear in news media) and the number of followers they collect on social media platforms such as Facebook? If all press is good press and success in politics is at least partly contingent on name recognition (you cannot vote for someone you don’t know), then the relationship between the news profile of candidates and their online followings seems worthy of investigation.

This exercise involves examining the relationship between local newspaper coverage of candidates running for the People’s Action Party (PAP) during Singapore’s recent GE2020 and the number of Facebook followers that the candidate had accumulated on their official Facebook page as of midnight the night before polling day on July 10, 2020. In particular, let’s see if we can use the number of newspaper articles that a candidate’s name appeared in (articles) to predict their Facebook followings (fb_followers).

Q1: Does the relationship meet the conditions for linear regression?

Before we attempt to run any linear regressions, let’s first visualise the data and check to see whether our variables meet the conditions for regression. To recall, these are:

  • The variables are quantitative (we already know this)
  • No outliers
  • The relationship is straight enough

Step 1: Make a scatter plot of fb_followers by articles

Do you see any outliers? Does the relationship seem straight enough?

plot(fb_followers ~ articles, data=fbf, 
     main = "Scatter plot of FB followers and news articles",
     xlab = "News articles", ylab = "FB followers",
     col="lightblue", pch=19)

Step 2: Make histograms of fb_followers and articles

par(mfrow=c(1,2))
hist(fbf$fb_followers, col="lightblue", freq=FALSE,
     main = "Histogram of FB followers",
     xlab = "FB followers")
hist(fbf$articles, col="pink", freq=FALSE,
     main = "Histogram of news articles",
     xlab = "News articles")

Step 3: Make boxplots of fb_followers and articles

par(mfrow=c(1,2))
boxplot(fbf$fb_followers, col="lightblue",
     main = "Boxplot of FB followers",
     xlab = "FB followers")
boxplot(fbf$articles, col="pink",
     main = "Boxplot of news articles",
     xlab = "News articles")

The results from Steps 1-3 would seem to indicate that we may have issues with outliers. Before going any further, we should probably transform both of our variables and reassess the relationship of the transformed variables.
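
If you would like a number to go with the visual impression, one option is to count the points that a boxplot draws beyond its whiskers. The sketch below uses simulated, right-skewed counts (followers_demo is made up, not the real fb_followers values) and compares the count before and after a base-10 log. Skewed count data of this kind typically has fewer flagged points after logging.

```r
# Simulated right-skewed counts (assumption: NOT the real fb_followers values)
set.seed(1)
followers_demo <- round(rlnorm(93, meanlog = 8, sdlog = 1.5))

# boxplot.stats() returns, in $out, the points lying beyond the whiskers,
# i.e. more than 1.5 times the interquartile range away from the box
out_raw <- boxplot.stats(followers_demo)$out
out_log <- boxplot.stats(log10(followers_demo))$out

length(out_raw)  # number of flagged points on the raw scale
length(out_log)  # number of flagged points on the log10 scale
```

With the real data, the same two calls could be applied to fbf$fb_followers and fbf$articles.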

Step 4: Transform variables using the base-10 log

fbf$log_fb_followers <- log(fbf$fb_followers, 10)
fbf$log_articles     <- log(fbf$articles, 10)
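
A brief aside on the transformation, using made-up counts rather than the real data: log(x, 10) and log10(x) compute the same base-10 logarithm, and a count of zero becomes -Inf, which lm() will not accept. It can therefore be worth checking for non-positive values before transforming.

```r
# Made-up article counts, including a zero to show the edge case
articles_demo   <- c(0, 1, 10, 250)
positive_counts <- articles_demo[articles_demo > 0]

# log(x, 10) and log10(x) give the same base-10 logarithm
all.equal(log(positive_counts, 10), log10(positive_counts))

# A count of zero maps to -Inf on the log scale
log10(0)

# Check for non-positive values before transforming
any(articles_demo <= 0)
```

On the real data, the equivalent check would be any(fbf$articles <= 0) and any(fbf$fb_followers <= 0).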

Step 5: Reassess the relationship of logged variables using scatter plot

plot(log_fb_followers ~ log_articles, data=fbf,
     main = "Scatter plot of FB followers and news articles (logged)",
     xlab = "(log) News articles", ylab = "(log) FB followers",
     col = "lightblue", pch=19)

Step 6: Make histograms of logged variables

par(mfrow=c(1,2))
hist(fbf$log_fb_followers, col="lightblue", freq=FALSE,
     main = "Histogram of FB followers (logged)",
     xlab = "(log) FB followers")
hist(fbf$log_articles, col="pink", freq=FALSE,
     main = "Histogram of news articles (logged)",
     xlab = "(log) News articles")

Step 7: Make boxplots of logged variables

par(mfrow=c(1,2))
boxplot(fbf$log_fb_followers, col="lightblue",
     main = "Boxplot of FB followers (logged)",
     xlab = "(log) FB followers")
boxplot(fbf$log_articles, col="pink",
     main = "Boxplot of news articles (logged)",
     xlab = "(log) News articles")

Okay, so taking the base-10 log of both fb_followers and articles seems to produce a scatter plot that is straight enough. While the scatter plot, histograms, and boxplots still seem to indicate the presence of some outliers, these do not seem nearly as problematic as before. The resulting distributions also look roughly normal.

Let’s now proceed to estimating a linear model.

Q2: Can you predict fb_followers using articles?

Let’s see whether the news profile a candidate receives during the campaign period predicts the number of Facebook followers they have accumulated by the end of the campaign period, by running a linear model on the log-transformed variables.

Step 8: Estimate the Linear Model

lm.fb_followers <- lm(log_fb_followers ~ log_articles, data = fbf)
lm.fb_followers
## 
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = fbf)
## 
## Coefficients:
##  (Intercept)  log_articles  
##       3.2237        0.7241

Step 9: Interpret the Coefficients

What is the intercept? What is the slope? How would you interpret it in everyday language?

# The slope is 0.72 and the intercept is 3.2.
# The intercept is the predicted base-10 logarithm of Facebook followers when
# log_articles is 0, i.e. for a candidate mentioned in exactly 1 article.
# If the base-10 logarithm of the number of news articles increases by 1, the
# predicted base-10 logarithm of the number of Facebook followers increases by 0.72.

# Or, using the definition of the base-10 logarithm:
# If the number of news articles increases by a factor of 10, the predicted number
# of Facebook followers increases by a factor of 5.3.

# The latter uses
10^(0.7241) # = 5.2979
## [1] 5.297854
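
To see where the factor of 5.3 comes from, here is a small sketch that back-transforms predictions using the coefficients printed in Step 8. Since the model is log10(followers) = intercept + slope * log10(articles), it follows that followers = 10^intercept * articles^slope. The helper predict_followers() is a hypothetical name introduced only for this illustration.

```r
# Coefficients copied from the Step 8 output
intercept <- 3.2237
slope     <- 0.7241

# Back-transform a prediction from the log10 scale to a follower count
# (hypothetical helper, not part of the original activity)
predict_followers <- function(articles) {
  10^(intercept + slope * log10(articles))
}

predict_followers(1)    # 1 article: log10(1) = 0, so this equals 10^intercept
predict_followers(10)   # predicted followers for 10 articles
predict_followers(100)  # predicted followers for 100 articles

# Multiplying articles by 10 multiplies the prediction by 10^slope,
# whatever the starting point
predict_followers(100) / predict_followers(10)
predict_followers(50)  / predict_followers(5)
10^slope
```

The last three lines should all print the same value, which is the factor quoted in the interpretation above.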

Step 10: Create scatter plot with line of best fit

plot(log_fb_followers ~ log_articles, data = fbf, col="lightblue", pch=19,
     main = "News profile and Facebook followers",
     xlab = "(log) News articles",
     ylab = "(log) Facebook followers",
     xlim = range(fbf$log_articles, na.rm=TRUE), 
     ylim = range(fbf$log_fb_followers, na.rm=TRUE))
abline(lm.fb_followers, col="blue")

Q3: Is the relationship the same for first-time candidates and incumbents?

Is the relationship between news profiles and FB followers the same for first-time candidates and incumbents? Do incumbents seem to have an advantage in the race for followers and name recognition?

At every general election, the PAP retires approximately one-quarter to one-third of its incumbent Members of Parliament (MPs) as part of the party’s self-renewal process and replaces them with first-time candidates. These first-time candidates represent the future of the party but (in most cases) are not yet known to the electorate and presumably do not yet enjoy widespread name recognition. By contrast, incumbent PAP MPs presumably enjoy greater name recognition, at least within their own constituencies, given a record of constituency service built up prior to the election.

One way that we might find out whether incumbents have an advantage in their number of followers, or if news coverage during the campaign is associated with greater increases in the number of followers for first-time candidates (as opposed to incumbents) is by subsetting the data, and comparing the intercepts and slopes.

Step 11: Subset the data by incumbent

firsttimers <- fbf[!fbf$incumbent, ]
incumbents  <- fbf[fbf$incumbent, ]
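
After subsetting with a logical variable, it can help to confirm that the two pieces add up to the whole. The sketch below does this on a made-up stand-in (fbf_demo and the _demo subsets are hypothetical names and values, not the real data).

```r
# Made-up stand-in for fbf (assumption: not the real data)
fbf_demo <- data.frame(
  candidate = paste("Candidate", LETTERS[1:6]),
  incumbent = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

firsttimers_demo <- fbf_demo[!fbf_demo$incumbent, ]
incumbents_demo  <- fbf_demo[fbf_demo$incumbent, ]

# Counts per group; useNA = "ifany" also shows missing values if there are any
table(fbf_demo$incumbent, useNA = "ifany")

# The two subsets should account for every row when incumbent has no NAs
nrow(firsttimers_demo) + nrow(incumbents_demo) == nrow(fbf_demo)
```

The same check on the real objects would compare nrow(firsttimers) + nrow(incumbents) with nrow(fbf).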

Step 12: Estimate linear models on the subsets

Let’s estimate some linear models, inspect the output, and compare the slopes and intercepts:

lm.firsttimers <- lm(log_fb_followers ~ log_articles, data = firsttimers)
lm.firsttimers
## 
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = firsttimers)
## 
## Coefficients:
##  (Intercept)  log_articles  
##      3.18075       0.03106

lm.incumbents  <- lm(log_fb_followers ~ log_articles, data = incumbents)
lm.incumbents
## 
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = incumbents)
## 
## Coefficients:
##  (Intercept)  log_articles  
##       3.5862        0.6343
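
Comparing slopes and intercepts is easier when they sit side by side. A minimal sketch, using simulated data in place of the real subsets: demo, fit_first and fit_inc are hypothetical names, and the simulated pattern (flat for first-timers, rising for incumbents) is an assumption chosen only to resemble the output above.

```r
# Simulated stand-in for the logged data (assumption: NOT the real values)
set.seed(42)
n <- 40
demo <- data.frame(
  log_articles = runif(2 * n, 0, 2.5),
  incumbent    = rep(c(FALSE, TRUE), each = n)
)
# Outcome: no slope for first-timers, slope of 0.6 for incumbents, plus noise
demo$log_fb_followers <- 3.2 +
  ifelse(demo$incumbent, 0.6, 0) * demo$log_articles +
  rnorm(2 * n, sd = 0.3)

fit_first <- lm(log_fb_followers ~ log_articles, data = demo[!demo$incumbent, ])
fit_inc   <- lm(log_fb_followers ~ log_articles, data = demo[demo$incumbent, ])

# coef() extracts the intercept and slope; rbind() stacks them into one table
coef_table <- rbind(firsttimers = coef(fit_first), incumbents = coef(fit_inc))
round(coef_table, 3)
```

With the models from Step 12, the equivalent call would be rbind(firsttimers = coef(lm.firsttimers), incumbents = coef(lm.incumbents)).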

Step 13: Plot scatter plots and lines of best fit for both subsets in one plot

plot(log_fb_followers ~ log_articles, data=firsttimers, col="lightblue", pch=19,
     main = "News profiles and Facebook followers (logged)",
     xlab = "(log) News articles",
     ylab = "(log) Facebook followers",
     xlim = range(fbf$log_articles, na.rm=TRUE), 
     ylim = range(fbf$log_fb_followers, na.rm=TRUE))
points(log_fb_followers ~ log_articles, data=incumbents, col="pink", pch=19)
abline(lm.firsttimers, col="blue")
abline(lm.incumbents,  col="red")

Q4: Is the relationship different for incumbent backbenchers vs. officeholders?

We can further divide the incumbent candidates into two types: backbenchers and officeholders.

Backbenchers are incumbent MPs who, prior to the election, held no office within government and were just regular MPs. Officeholders are those who, prior to the dissolution of Parliament, held positions within government. Even though the campaign is held in between Parliaments, these incumbents tend to appear often at campaign events and on television, and to give longer speeches than other incumbents.

Step 14: Subset incumbents by officeholder status

backbenchers  <- incumbents[!incumbents$officeholder, ] # Remember that officeholder is a logical vector
officeholders <- incumbents[incumbents$officeholder, ]

Step 15: Estimate linear models on firsttimers, incumbent backbenchers, and incumbent officeholders

lm.backbenchers  <- lm(log_fb_followers ~ log_articles, data = backbenchers)
lm.backbenchers
## 
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = backbenchers)
## 
## Coefficients:
##  (Intercept)  log_articles  
##       3.9024       -0.1196

lm.officeholders <- lm(log_fb_followers ~ log_articles, data = officeholders)
lm.officeholders
## 
## Call:
## lm(formula = log_fb_followers ~ log_articles, data = officeholders)
## 
## Coefficients:
##  (Intercept)  log_articles  
##       3.8811        0.5956

Step 16: Plot scatterplots with lines of best fit for all three subsets in a single plot

plot(log_fb_followers ~ log_articles, data=firsttimers, col="lightblue", pch=19,
     main = "News profiles and Facebook followers (logged)",
     xlab = "(log) News articles",
     ylab = "(log) Facebook followers",
     xlim = range(fbf$log_articles, na.rm=TRUE), 
     ylim = range(fbf$log_fb_followers, na.rm=TRUE))
points(log_fb_followers ~ log_articles, data=backbenchers, col="pink", pch=19)
points(log_fb_followers ~ log_articles, data=officeholders, col="lightgreen", pch=19)
abline(lm.firsttimers, col="blue")
abline(lm.backbenchers, col="red")
abline(lm.officeholders, col="green")
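
With three groups in one plot, a legend helps the reader tell the colours apart. The sketch below shows one way to add it with legend(), using simulated data (demo and the group labels are made up for illustration) so that it runs on its own.

```r
# Simulated stand-in for the three subsets (assumption: NOT the real data)
set.seed(7)
group_names <- c("First-timers", "Backbenchers", "Officeholders")
demo <- data.frame(
  log_articles     = runif(60, 0, 2.5),
  log_fb_followers = rnorm(60, mean = 4, sd = 0.5),
  group            = rep(group_names, each = 20)
)
point_cols <- c("lightblue", "pink", "lightgreen")
line_cols  <- c("blue", "red", "green")

# Points coloured by group
plot(log_fb_followers ~ log_articles, data = demo,
     col = point_cols[match(demo$group, group_names)], pch = 19,
     xlab = "(log) News articles", ylab = "(log) Facebook followers")

# One line of best fit per group
for (i in seq_along(group_names)) {
  group_fit <- lm(log_fb_followers ~ log_articles,
                  data = demo[demo$group == group_names[i], ])
  abline(group_fit, col = line_cols[i])
}

# Legend keyed to the line colours; bty = "n" drops the surrounding box
legend("topleft", legend = group_names, col = line_cols,
       lty = 1, pch = 19, bty = "n")
```

For the plot in Step 16, the final legend() call alone could be added after the last abline().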

Conclusion

What did you find?

It would seem that there is ultimately little evidence of an association between candidate news profiles and Facebook followers for first-time candidates and incumbent backbenchers. In contrast, there appears to be some evidence of an association between news profiles and Facebook followers for incumbent officeholders.

What might explain these findings?

Well, let’s first think back to the lesson in the previous class regarding how correlation does not imply causation. As you likely remember, there are a number of things other than “A causes B” that can explain a correlation.

For example, perhaps the news profiles and the number of Facebook followers of incumbent officeholders are both driven by a third, “lurking” variable. Indeed, the results of subsetting incumbent PAP MPs by officeholder status would seem to suggest that the perceived power or status of the MP within the party or government (such that they are appointed to government office) may drive not only Facebook followers but also candidate news profiles during the campaign period.

An additional check

To see if the above explanation has face validity, you might check which PAP candidate received the most news coverage during the campaign and also had the most Facebook followers. Do you think that this candidate got those Facebook followers because of the attention they received from the press during the campaign period? Or did they get both the followers and the news coverage because they were, well, important before the campaign even started?
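
A minimal sketch of this check, run on a made-up stand-in (fbf_demo and its values are hypothetical, not the real data): which.max() returns the position of the largest value, which can then be used to look up the candidate’s name.

```r
# Made-up stand-in for fbf (assumption: NOT the real data)
fbf_demo <- data.frame(
  candidate    = c("Candidate A", "Candidate B", "Candidate C"),
  fb_followers = c(1200, 980000, 45000),
  articles     = c(3, 310, 40)
)

# Candidate with the most news articles
fbf_demo$candidate[which.max(fbf_demo$articles)]

# Candidate with the most Facebook followers
fbf_demo$candidate[which.max(fbf_demo$fb_followers)]

# All rows sorted from most to fewest followers
fbf_demo[order(-fbf_demo$fb_followers), ]
```

On the real data, replace fbf_demo with fbf and see whether the two lookups return the same name.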