Introduction

Today, we will work with data collected during Singapore’s 2020 general election (GE2020). The objective of this activity is to draw on the R coding skills you acquired over the last few weeks, and to further practice using linear models to predict data. Therefore,

  • try to work on this activity independently,
  • if you get stuck, feel free to ask for help or work with others.

This activity includes more questions than you are expected to be able to answer during class. The rest of the questions can be used for extra practice at home. A .html document with solutions to all questions will be available at the end of the day so you will be able to check your answers.

Now, let’s start. Download fb_followers.csv from Canvas. After setting the correct working directory, import this dataset into R using the command read.csv().

fbf <- read.csv("fb_followers.csv")

Inspect the file and you will find that it contains five variables:

  • candidate: The names of the 93 candidates nominated by the People’s Action Party (PAP) in GE2020
  • fb_followers: A numeric variable that records the number of followers of each candidate’s official Facebook page at midnight the night before polling day (00:00 July 10, 2020)
  • articles: A numeric variable that records the number of articles published by local newspapers between nomination day and polling day that mentioned the candidate by name
  • incumbent: A logical variable that indicates whether the candidate was an incumbent MP (i.e. they had served as an MP in the previous Parliament) or a first-time candidate
  • officeholder: A logical variable that indicates whether the candidate was an officeholder (e.g. Cabinet Minister, Minister of State) in the previous Parliament
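As a rough sketch of what inspecting the data might look like, the snippet below builds a tiny made-up data frame with the same five column names and runs a few standard inspection functions on it. The names and numbers in toy are invented for illustration; with the real file you would call the same functions on fbf.

```r
# Toy stand-in for fbf: invented values, NOT the real GE2020 data
toy <- data.frame(
  candidate    = c("Candidate A", "Candidate B", "Candidate C"),
  fb_followers = c(1200, 35000, 800),
  articles     = c(3, 40, 1),
  incumbent    = c(TRUE, TRUE, FALSE),
  officeholder = c(FALSE, TRUE, FALSE)
)

str(toy)      # variable names and types
head(toy)     # first few rows
summary(toy)  # quick summaries of each variable
dim(toy)      # number of rows and columns
```

If the real file has one row per candidate, dim(fbf) should report 93 rows and 5 columns.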

Challenge: Use the fb_followers data to examine the relationship between the news profiles of PAP candidates during the campaign and the number of Facebook followers they accumulate

Is there a relationship between the news profiles of candidates during electoral campaigns (i.e. how often they appear in news media) and the number of followers they collect on social media platforms such as Facebook? If all press is good press and success in politics is at least partly contingent on name recognition (you cannot vote for someone you don’t know), then the relationship between the news profile of candidates and their online followings seems worthy of investigation.

This exercise involves examining the relationship between local newspaper coverage of candidates running for the People’s Action Party (PAP) during Singapore’s recent GE2020 and the number of Facebook followers that the candidate had accumulated on their official Facebook page as of midnight the night before polling day on July 10, 2020. In particular, let’s see if we can use the number of newspaper articles that a candidate’s name appeared in (articles) to predict their Facebook followings (fb_followers).

Q1: Does the relationship meet the conditions for linear regression?

Before we attempt to run any linear regressions, let’s first visualise the data and check to see whether our variables meet the conditions for regression. To recall, these are:

  • The variables are quantitative (we already know this)
  • No outliers
  • The relationship is straight enough

Step 1: Make a scatter plot of fb_followers by articles

Do you see any outliers? Does the relationship seem straight enough?

# your code here

Step 2: Make histograms of fb_followers and articles

# your code here

Step 3: Make boxplots of fb_followers and articles

# your code here
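If you are unsure where to begin with Steps 1-3, here is one possible sketch using base R graphics. The two vectors are invented toy values with one deliberately extreme point; with the real data you would use fbf$articles and fbf$fb_followers instead.

```r
# Toy values invented for illustration, with one deliberately extreme point
articles     <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 200)
fb_followers <- c(500, 800, 900, 1200, 1500, 2000, 2500, 3000, 4000, 900000)

# Step 1: scatter plot
plot(articles, fb_followers, xlab = "articles", ylab = "fb_followers")

# Step 2: histograms
hist(articles)
hist(fb_followers)

# Step 3: boxplots
boxplot(articles, main = "articles")
boxplot(fb_followers, main = "fb_followers")

# Values that the boxplot rule (more than 1.5 x IQR beyond the hinges) flags
article_outliers <- boxplot.stats(articles)$out
article_outliers
```

boxplot.stats() returns the same points that boxplot() draws as individual dots, which can be handy when you want to list the outliers rather than just see them.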

The results from Steps 1-3 would seem to indicate that we may have issues with outliers. Before going any further, we should probably transform both of our variables and reassess the relationship of the transformed variables.

Step 4: Transform variables using the decadal (base-10) log

# your code here
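A minimal sketch of the transformation, assuming you store the results as new columns. The column names log_articles and log_followers are just a suggestion, and the values are invented toy numbers.

```r
# Toy values invented for illustration
toy <- data.frame(articles     = c(1, 10, 100),
                  fb_followers = c(1000, 10000, 1000000))

# log10() is the base-10 (decadal) logarithm
toy$log_articles  <- log10(toy$articles)
toy$log_followers <- log10(toy$fb_followers)
toy

# Caution: log10(0) is -Inf, so check for zeros before transforming
any(toy$articles == 0)
```

Note that log() on its own is the natural log in R, so log10() is the function to use here.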

Step 5: Reassess the relationship of logged variables using scatter plot

# your code here

Step 6: Make histograms of logged variables

# your code here

Step 7: Make boxplots of logged variables

# your code here

Okay, so taking the decadal log of both fb_followers and articles seems to produce a scatter plot that is straight enough. While the scatter plot as well as the histograms and boxplots still seem to indicate the presence of some outliers, these do not seem to be nearly as problematic as before. The resulting distributions also look somewhat normal.

Let’s now proceed to estimating a linear model.

Q2: Can you predict fb_followers using articles?

Let’s see whether the news profile a candidate receives during the campaign period predicts the number of Facebook followers that they have accumulated by the end of the campaign period, by running a linear model on the log-transformed variables.

Step 8: Estimate the linear model

# your code here
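For reference, a linear model in R is estimated with lm(). The sketch below uses invented toy data and the hypothetical column names log_articles and log_followers; your own column names may differ.

```r
# Toy data invented for illustration: followers built as
# 3 + 0.5 * log_articles plus a little made-up noise
toy <- data.frame(log_articles = c(0, 0.5, 1, 1.5, 2))
toy$log_followers <- 3 + 0.5 * toy$log_articles + c(0.1, -0.1, 0, 0.1, -0.1)

# Outcome on the left of ~, predictor on the right
model <- lm(log_followers ~ log_articles, data = toy)

coef(model)      # intercept and slope
summary(model)   # standard errors, R-squared, and so on
```

Because the toy data were built around an intercept of 3 and a slope of 0.5, the estimates should land close to those values.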

Step 9: Interpret the coefficients

What is the intercept? What is the slope? How would you interpret them in everyday language?

# your answer here as a comment
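A hint on reading coefficients when both variables are logged: if log10(followers) = a + b * log10(articles), then followers = 10^a * articles^b. The sketch below uses hypothetical coefficients (a = 3, b = 0.5) chosen purely for illustration; they are not estimates from the data.

```r
# Hypothetical coefficients for illustration, NOT estimates from the data
a <- 3.0   # intercept
b <- 0.5   # slope

# Back-transform a prediction from the log10 scale to raw followers
predict_followers <- function(articles) 10^(a + b * log10(articles))

predict_followers(1)                            # 10^a: prediction at one article
predict_followers(100) / predict_followers(10)  # ten times the articles: factor of 10^b
predict_followers(20) / predict_followers(10)   # twice the articles: factor of 2^b
```

In other words, the intercept describes a candidate mentioned in exactly one article, and the slope describes a multiplicative rather than an additive change.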

Step 10: Create scatter plot with line of best fit

# your code here
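One possible way to add a line of best fit in base R is to pass the fitted model to abline(). The data and column names below are invented for illustration.

```r
# Toy data invented for illustration
toy <- data.frame(log_articles  = c(0, 0.5, 1, 1.5, 2),
                  log_followers = c(3.1, 3.15, 3.5, 3.85, 3.9))
model <- lm(log_followers ~ log_articles, data = toy)

plot(toy$log_articles, toy$log_followers,
     xlab = "log10(articles)", ylab = "log10(fb_followers)")
abline(model, col = "red", lwd = 2)   # draws the fitted regression line
```

abline() reads the intercept and slope directly from the model object, so there is no need to type the coefficients by hand.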

Q3: Is the relationship the same for first-time candidates and incumbents?

Is the relationship between fb_followers and articles the same for first-time candidates and incumbents? Do incumbents seem to have an advantage in the race for followers and name recognition?

Every general election, the PAP retires approximately one-quarter to one-third of incumbent PAP members of Parliament (MPs) as part of the PAP’s self-renewal process and replaces them with first-time candidates. These first-time candidates represent the future of the party. But in most cases, they are not yet known to the broader electorate and presumably do not yet enjoy widespread name recognition.

By contrast, incumbent PAP MPs presumably enjoy greater name recognition at least within their own constituencies given a record of constituency service built up in the years prior to the election. For example, think about whose (usually smiling) face you will almost always see on People’s Association (PA) billboards around the neighborhood.

One way to find out whether incumbents have an advantage in their number of followers, or whether news coverage during the campaign is associated with greater increases in the number of followers for first-time candidates than for incumbents, is to subset the data and compare the intercepts and slopes.

Step 11: Subset the data by incumbent

# your code here
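Since incumbent is a logical variable, it can be used directly to pick rows. A small sketch on invented toy data; the object names incumbents and firsttimers are only suggestions.

```r
# Toy data invented for illustration
toy <- data.frame(
  candidate = c("A", "B", "C", "D"),
  incumbent = c(TRUE, FALSE, TRUE, FALSE)
)

incumbents  <- toy[toy$incumbent, ]     # rows where incumbent is TRUE
firsttimers <- subset(toy, !incumbent)  # same idea using subset()

nrow(incumbents)
nrow(firsttimers)
```

A quick sanity check is that the two subsets together have as many rows as the original data frame.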

Step 12: Estimate linear models on the subsets

Let’s estimate some linear models, inspect the output, and compare the slopes and intercepts:

# your code here

Step 13: Plot scatter plots and lines of best fit for both subsets in one plot

# your code here
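Here is one way the two-group comparison and the combined plot could be put together, sketched on invented toy data with hypothetical column names. The approach is to plot all points once, colour them by group, and then add one abline() per fitted model.

```r
# Toy data invented for illustration
toy <- data.frame(
  log_articles  = c(0, 0.5, 1, 1.5, 0.5, 1, 1.5, 2),
  log_followers = c(2.9, 3.0, 3.0, 3.1, 3.4, 3.7, 3.9, 4.3),
  incumbent     = rep(c(FALSE, TRUE), each = 4)
)

# One model per subset
fit_new <- lm(log_followers ~ log_articles, data = toy[!toy$incumbent, ])
fit_inc <- lm(log_followers ~ log_articles, data = toy[toy$incumbent, ])
rbind(first_time = coef(fit_new), incumbent = coef(fit_inc))  # side by side

# All points in one plot, coloured by group, with one line per model
plot(toy$log_articles, toy$log_followers,
     col = ifelse(toy$incumbent, "blue", "orange"), pch = 19,
     xlab = "log10(articles)", ylab = "log10(fb_followers)")
abline(fit_new, col = "orange")
abline(fit_inc, col = "blue")
legend("topleft", legend = c("first-time", "incumbent"),
       col = c("orange", "blue"), pch = 19)
```

Stacking the coefficients with rbind() makes it easier to compare intercepts and slopes than reading two separate summaries.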

Q4: Is the relationship different for incumbent backbenchers vs. officeholders?

We can further divide the incumbent candidates into two types: backbenchers and officeholders.

Backbenchers are incumbent MPs who, prior to the election, held no office within government and were just regular MPs. Officeholders are those who, prior to the dissolution of Parliament, held positions within government. Even though the campaign is held in between Parliaments, officeholders tend to appear more often at campaign events and on television, and to give longer speeches, than other incumbents.

Step 14: Subset incumbents by officeholder status

# your code here

Step 15: Estimate linear models on first-timers, incumbent backbenchers, and incumbent officeholders

# your code here

Step 16: Plot scatter plots with lines of best fit for all three subsets in a single plot

# your code here
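With three groups, repeating the same code three times gets tedious, so one option is to label each candidate with a group and loop over the labels. This is only a sketch on invented toy data; the group column, the labels, and the colours are all assumptions made for illustration.

```r
# Toy data invented for illustration
toy <- data.frame(
  log_articles  = c(0, 0.5, 1,  0.5, 1, 1.5,  1, 1.5, 2),
  log_followers = c(3.0, 2.9, 3.1,  3.3, 3.2, 3.4,  3.8, 4.2, 4.6),
  incumbent     = rep(c(FALSE, TRUE, TRUE), each = 3),
  officeholder  = rep(c(FALSE, FALSE, TRUE), each = 3)
)

# One label per candidate type
toy$group <- ifelse(!toy$incumbent, "first-timer",
             ifelse(toy$officeholder, "officeholder", "backbencher"))

groups  <- c("first-timer", "backbencher", "officeholder")
colours <- c("orange", "blue", "darkgreen")

plot(toy$log_articles, toy$log_followers, pch = 19,
     col = colours[match(toy$group, groups)],
     xlab = "log10(articles)", ylab = "log10(fb_followers)")

# Fit one model per group and draw its line in the matching colour
fits <- list()
for (i in seq_along(groups)) {
  fits[[groups[i]]] <- lm(log_followers ~ log_articles,
                          data = toy[toy$group == groups[i], ])
  abline(fits[[groups[i]]], col = colours[i])
}
legend("topleft", legend = groups, col = colours, pch = 19)

sapply(fits, coef)   # intercepts and slopes, one column per group
```

Writing three separate subsets and three lm() calls works just as well; the loop simply keeps the colours and models in step with each other.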

Conclusion

What did you find?

It would seem that there is ultimately little evidence of an association between candidate news profiles and Facebook followers for first-time candidates and incumbent backbenchers. In contrast, there appears to be some evidence of an association between news profiles and Facebook followers for incumbent officeholders.

What might explain these findings?

Well, let’s first think back to the lesson in the previous class regarding how correlation does not imply causation. As you likely remember, there are a number of things other than “A causes B” that can explain a correlation.

For example, perhaps the news coverage and the number of Facebook followers of incumbent officeholders are both driven by a third, “lurking” variable. Indeed, the results of subsetting incumbent PAP candidates would seem to suggest that the perceived power or status of the MP within the party or government (such that they are appointed to government office) may drive not only Facebook followers but also candidate news coverage during the campaign period.

An additional check

To see if the above explanation has face validity, you might check which PAP candidate both received the most news coverage during the campaign and had the most Facebook followers. Do you think that this candidate got those Facebook followers because of the attention they received from the press during the campaign period? Or did they get both the followers and the news coverage because they were, well, already important before the campaign even started?
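A quick way to run this check is with which.max(), which returns the position of the largest value. The sketch below uses invented toy data; on the real data you would apply the same calls to fbf.

```r
# Toy data invented for illustration, NOT the real GE2020 data
toy <- data.frame(
  candidate    = c("Candidate A", "Candidate B", "Candidate C"),
  fb_followers = c(1200, 35000, 800),
  articles     = c(3, 40, 1)
)

most_covered  <- toy$candidate[which.max(toy$articles)]
most_followed <- toy$candidate[which.max(toy$fb_followers)]
most_covered
most_followed

# Or sort by coverage to see the full ranking
toy[order(-toy$articles), ]
```

Note that which.max() returns only the first position if there is a tie, so sorting the whole data frame can be the safer check.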