Today, we will work with data collected during Singapore’s 2020 general election (GE2020). The objective of this activity is to draw on the R coding skills you have acquired over the last few weeks and to further practice using linear models to predict data.
This activity includes more questions than you are expected to answer during class. The remaining questions can be used for extra practice at home. A .html document with solutions to all questions will be available at the end of the day so that you can check your answers.
Now, let’s start. Download fb_followers.csv from Canvas. After setting the correct working directory, import this dataset into R using the command read.csv().
fbf <- read.csv("fb_followers.csv")
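Once a file has been imported, a few base R functions give a quick overview of what it contains. The sketch below is only an illustration: it writes a tiny CSV with made-up values to a temporary file (the names toy_path and toy are hypothetical) so that it runs without the real fb_followers.csv. In the activity itself you would call the same functions on fbf.

```r
# Write a tiny, made-up CSV so the example runs without the real file.
# These rows are invented for illustration; they are NOT GE2020 data.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "candidate,fb_followers,articles,incumbent,officeholder",
  "Candidate A,1200,5,FALSE,FALSE",
  "Candidate B,45000,60,TRUE,TRUE",
  "Candidate C,8000,12,TRUE,FALSE"
), toy_path)

toy <- read.csv(toy_path)

str(toy)      # variable names and types
head(toy)     # first few rows
summary(toy)  # quick numeric summaries
```

Note that read.csv() reads columns containing only TRUE and FALSE as logical vectors, which is what you would expect for incumbent and officeholder.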
Inspect the file and you will find that it contains five variables:
candidate: The names of the 93 candidates nominated by the People’s Action Party (PAP) in GE2020
fb_followers: A numeric variable that records the number of followers of the candidate’s official Facebook page at midnight the night before polling day (00:00 July 10, 2020)
articles: A numeric variable that records the number of articles published by local newspapers between nomination day and polling day that mentioned the candidate by name
incumbent: A logical variable that indicates whether the candidate was an incumbent MP (i.e. they had served as an MP in the previous Parliament) or a first-time candidate
officeholder: A logical variable that indicates whether the candidate was an officeholder (e.g. Cabinet Minister, Minister of State) in the previous Parliament
Using the fb_followers data to examine the relationship between the news profiles of PAP candidates during the campaign and the number of Facebook followers they accumulate
Is there a relationship between the news profiles of candidates during electoral campaigns (i.e. how often they appear in news media) and the number of followers they collect on social media platforms such as Facebook? If all press is good press and success in politics is at least partly contingent on name recognition (you cannot vote for someone you don’t know), then the relationship between candidates’ news profiles and their online followings seems worthy of investigation.
This exercise involves examining the relationship between local newspaper coverage of candidates running for the People’s Action Party (PAP) during Singapore’s recent GE2020 and the number of Facebook followers that the candidate had accumulated on their official Facebook page as of midnight the night before polling day on July 10, 2020. In particular, let’s see if we can use the number of newspaper articles that a candidate’s name appeared in (articles) to predict their Facebook followings (fb_followers).
Before we attempt to run any linear regressions, let’s first visualise the data and check whether our variables meet the conditions for regression. To recall, these include a relationship that is straight enough and the absence of problematic outliers.
Step 1: Create a scatter plot of fb_followers by articles
Do you see any outliers? Does the relationship seem straight enough?
# your code here
Step 2: Create histograms of fb_followers and articles
# your code here
Step 3: Create boxplots of fb_followers and articles
# your code here
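If you are unsure where to begin with these plots, the following is a minimal sketch of the three base R plotting functions involved. It uses a small data frame of made-up values (the name toy and every number in it are hypothetical), so it will not reproduce what you see in the real data. In the activity you would apply the same calls to the columns of fbf.

```r
# Toy data with made-up values; these are NOT the real GE2020 numbers
toy <- data.frame(
  articles     = c(2, 3, 5, 8, 10, 15, 20, 40, 150),
  fb_followers = c(500, 900, 1200, 3000, 2500, 6000, 8000, 30000, 1200000)
)

# Scatter plot: predictor on the x-axis, response on the y-axis
plot(toy$articles, toy$fb_followers,
     xlab = "Articles", ylab = "Facebook followers")

# Histogram and boxplot of one variable (repeat for the other variable)
h <- hist(toy$fb_followers, main = "Facebook followers", xlab = "Followers")
boxplot(toy$fb_followers, horizontal = TRUE, main = "Facebook followers")
```

A single very large value, like the last row of the toy data, tends to squash all other points into one corner of the scatter plot. That is the kind of pattern to look out for when you judge whether a relationship is straight enough.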
The results from Steps 1-3 would seem to indicate that we may have issues with outliers. Before going any further, we should probably transform both of our variables and reassess the relationship of the transformed variables.
# your code here
# your code here
# your code here
# your code here
Okay, so taking the base-10 log (log10) of both fb_followers and articles seems to produce a scatter plot that is straight enough. While the scatter plot as well as the histograms and boxplots still seem to indicate the presence of some outliers, they do not seem to be nearly as problematic as before. The resulting distributions also look somewhat normal.
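To see why a log transformation can tame outliers, here is a small sketch with made-up, strongly right-skewed values (the vector followers is hypothetical and is not the real data). It counts how many points a boxplot would flag as outliers before and after taking log10, using boxplot.stats() from base R.

```r
# Made-up values that double at each step (NOT the real data)
followers <- 100 * 2^(0:12)

# Points that a boxplot would draw as outliers (beyond 1.5 * IQR from the hinges)
out_raw <- boxplot.stats(followers)$out
out_log <- boxplot.stats(log10(followers))$out

length(out_raw)  # number of flagged points on the raw scale
length(out_log)  # number of flagged points on the log10 scale

# Caution: log10(0) is -Inf, so any zero counts need attention before transforming
log10(0)
```

On the raw scale the largest values sit far from the rest, while on the log10 scale the same values are evenly spaced. Whether the real variables contain zeros is something to check rather than assume.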
Let’s now proceed to estimating a linear model.
Can we predict fb_followers using articles?
So let’s run a linear model on the log-transformed variables to see whether the news profile a candidate receives during the campaign period predicts the number of Facebook followers they have accumulated by the end of the campaign period.
# your code here
What is the intercept? What is the slope? How would you interpret it in everyday language?
# your answer here as a comment
# your code here
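As a reminder of how a model fitted on log-transformed variables can be read, here is a minimal sketch on made-up numbers (the data frame toy and its values are hypothetical, chosen so that the arithmetic is easy to follow). It is one way of setting up such a model, not the solution to the questions above.

```r
# Made-up values (NOT the real data): articles grow tenfold at each step
toy <- data.frame(
  articles     = c(1, 10, 100, 1000),
  fb_followers = 10^c(2.1, 2.4, 3.1, 3.4)
)

# Fit a linear model on the log10 scale for both variables
fit <- lm(log10(fb_followers) ~ log10(articles), data = toy)

intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Back-transform to the original scale:
# predicted followers when articles = 1 (because log10(1) = 0)
10^intercept
# multiplicative change in predicted followers for a tenfold increase in articles
10^slope
```

With these toy numbers the slope is 0.46, so a tenfold increase in articles goes with roughly a 2.9-fold (10^0.46) increase in predicted followers. In a log-log model the slope describes proportional change, not a fixed number of extra followers per article.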
Is the relationship between fb_followers and articles the same for first-time candidates and incumbents? Do incumbents seem to have an advantage in the race for followers and name recognition?
At every general election, the PAP retires approximately one-fourth to one-third of its incumbent members of Parliament (MPs) as part of the party’s self-renewal process and replaces them with first-time candidates. These first-time candidates represent the future of the party. But in most cases, they are not yet known to the broader electorate and presumably do not yet enjoy widespread name recognition.
By contrast, incumbent PAP MPs presumably enjoy greater name recognition, at least within their own constituencies, given a record of constituency service built up in the years prior to the election. For example, think about whose (usually smiling) face you will almost always see on People’s Association (PA) billboards around the neighborhood.
One way to find out whether incumbents have an advantage in their number of followers, or whether news coverage during the campaign is associated with greater increases in followers for first-time candidates than for incumbents, is to subset the data and compare the intercepts and slopes.
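The general pattern of subsetting and comparing can be sketched as follows. The data frame toy and all of its values are made up for illustration, and the object names are hypothetical. The same steps would apply to fbf with its incumbent column.

```r
# Made-up values (NOT the real data): three first-timers, three incumbents
toy <- data.frame(
  articles     = c(1, 10, 100, 1, 10, 100),
  fb_followers = 10^c(2.0, 2.1, 2.3, 3.0, 3.6, 4.1),
  incumbent    = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Split the data using the logical variable
first_timers <- subset(toy, !incumbent)
incumbents   <- subset(toy, incumbent)

# Fit the same log-log model separately in each group
fit_first <- lm(log10(fb_followers) ~ log10(articles), data = first_timers)
fit_inc   <- lm(log10(fb_followers) ~ log10(articles), data = incumbents)

# Put the intercepts and slopes side by side for comparison
coefs <- rbind(first_time = coef(fit_first), incumbent = coef(fit_inc))
coefs
```

A higher intercept in one group suggests more followers at the same level of coverage, while a steeper slope suggests that coverage is more strongly associated with followers in that group.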
Subset the data by incumbent
# your code here
Let’s estimate some linear models, inspect the output, and compare the slopes and intercepts:
# your code here
# your code here
We can further divide the incumbent candidates into two types: backbenchers and officeholders.
Backbenchers are incumbent MPs who, prior to the election, held no office within government and were just regular MPs. Officeholders are those who, prior to the dissolution of Parliament, held positions within government. Even though the campaign is held in between Parliaments, these incumbents tend to appear often at campaign events and on television, and to give longer speeches than other incumbents.
Subset the incumbents by officeholder status
# your code here
# your code here
# your code here
It would seem that there is ultimately little evidence of an association between candidate news profiles and Facebook followers for first-time candidates and incumbent backbenchers. In contrast, there appears to be some evidence of an association between news profiles and Facebook followers for incumbent officeholders.
Well, let’s first think back to the lesson in the previous class regarding how correlation does not imply causation. As you likely remember, there are a number of things other than “A causes B” that can explain a correlation.
For example, perhaps the news coverage and the number of Facebook followers of incumbent officeholders are both driven by a third, “lurking” variable. Indeed, the act of subsetting incumbent PAP MPs would seem to suggest that the perceived power or status of the MP within the party or government (such that they are appointed to government office) may drive not only Facebook followers but also candidate news coverage during the campaign period.
To see if the above explanation has face validity, you might check which PAP candidate received the most news coverage during the campaign and also had the most Facebook followers. Do you think that this candidate got those Facebook followers because of the attention they received from the press during the campaign period? Or did they get both the followers and the news coverage because they were, well, already important before the campaign even started?
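One way to run this check is with which.max(), which returns the position of the largest value in a vector. The sketch below uses made-up rows (the data frame toy and the candidate labels are hypothetical). With the real data you would apply the same idea to fbf.

```r
# Made-up rows for illustration (NOT the real data)
toy <- data.frame(
  candidate    = c("Candidate A", "Candidate B", "Candidate C"),
  fb_followers = c(1200, 45000, 8000),
  articles     = c(5, 60, 12)
)

# Name of the candidate with the most articles, and with the most followers
top_news <- toy$candidate[which.max(toy$articles)]
top_fb   <- toy$candidate[which.max(toy$fb_followers)]

top_news
top_fb

# Sorting the whole table by coverage shows who else is near the top
toy[order(toy$articles, decreasing = TRUE), ]
```

If the same name tops both lists, ask yourself which is more plausible: that campaign coverage produced the followers, or that both reflect prominence that predates the campaign.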