08/19/2021

Today

  • Visualization: good and bad graphs
  • Break
  • R exercise on gender discrimination
  • Discussion + suggested solutions


Team Activity

Each team has 5 minutes to discuss the pros and cons of its assigned graph (available on Canvas):

  • Team 1: work_life.png
  • Team 2: military_spending.png
  • Team 3: titanic_casualties.png

Remark

Poor numerical reasoning and misleading presentations are common problems

Always consider:

  • Is the evidence misrepresented?
  • Was there an agenda being pursued?

Activity 2: Discrimination in Admissions

  • What does it mean, “is there discrimination?”

    Suppose overall acceptance rate is 30%

  • 50 men apply

  • 30 women apply

Suppose overall acceptance rate is 30%

  • 50 men apply; 15 are accepted

  • 30 women apply; 9 are accepted

    Is this evidence of discrimination?

Suppose overall acceptance rate is 30%

  • 50 men apply; 23 are accepted

  • 30 women apply; 1 is accepted

    Is this evidence of discrimination?

Suppose overall acceptance rate is 30%

  • 50 men apply; 18 are accepted

  • 30 women apply; 6 are accepted

    Is this evidence of discrimination?

    Where is the threshold for us to say there is discrimination happening?
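
To compare the three scenarios side by side, it helps to turn the counts into rates. The following is a minimal sketch in R using the hypothetical counts from the scenarios above; the object and column names (`scenarios`, `rate_men`, `gap`, and so on) are illustrative and not part of the exercise.

```r
# Hypothetical counts from the three scenarios above (50 men and 30 women apply).
scenarios <- data.frame(
  men_accepted   = c(15, 23, 18),
  women_accepted = c(9, 1, 6)
)
scenarios$rate_men   <- scenarios$men_accepted / 50
scenarios$rate_women <- scenarios$women_accepted / 30
# Gap between the two rates; 0 means both groups are accepted at the same rate.
scenarios$gap <- scenarios$rate_men - scenarios$rate_women
# Every scenario keeps the overall acceptance rate at 30% (24 of 80 applicants).
scenarios$overall <- (scenarios$men_accepted + scenarios$women_accepted) / 80
scenarios
```

The gap is 0 in the first scenario, about 0.43 in the second, and 0.16 in the third. The open question is how large the gap must be before chance stops being a plausible explanation.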

R exercise

Question #1: What was the overall acceptance rate for male vs. female applicants?

Question #2: Is there a significant difference or not?

Question #3: What are the male and female admission rates for each department?

Question #4: Is there evidence for sex-based discrimination? Is there evidence against discrimination?

A quick overview of the dataset

x <- read.csv("AdmissionsData.csv")
dim(x)
## [1] 4425    3
head(x)
##   Department Sex Admitted
## 1    Biology   M      Yes
## 2    Biology   M      Yes
## 3    Biology   M      Yes
## 4    Biology   M      Yes
## 5    Biology   M      Yes
## 6    Biology   M      Yes

Applicants by sex and department

table(x$Department)
## 
##     Biology     English     History Mathematics  Philosophy  Psychology 
##         613         584         792         933         585         918
table(x$Department, x$Sex)
##              
##                 F   M
##   Biology     341 272
##   English     393 191
##   History     375 417
##   Mathematics 108 825
##   Philosophy   25 560
##   Psychology  593 325

R exercise 1

  • What was the overall acceptance rate for men vs. women?
  • You have 5 minutes to try it out

R exercise 1 - solution (subsetting)

Female <- x[x$Sex == "F", ] 
Male <- x[x$Sex == "M", ]
nrow(Female[Female$Admitted == "Yes", ]) / nrow(Female)
## [1] 0.3035422
nrow(Male[Male$Admitted == "Yes", ]) / nrow(Male)
## [1] 0.4602317

R exercise 1 - solution (without subsetting)

sum(x$Sex == "M" & x$Admitted == "Yes") / sum(x$Sex == "M") 
## [1] 0.4602317
sum(x$Sex == "F" & x$Admitted == "Yes") / sum(x$Sex == "F")
## [1] 0.3035422

R exercise 1 - solution (using table)

table(x$Sex, x$Admitted)
##    
##       No  Yes
##   F 1278  557
##   M 1398 1192
1192 / (1398 + 1192)
## [1] 0.4602317
557 / (557 + 1278)
## [1] 0.3035422
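
The same rates can also be read off as row proportions. The sketch below rebuilds the 2×2 table from the counts printed above so that it runs without `AdmissionsData.csv`; the names `counts` and `rates` are illustrative. With the data loaded, `prop.table(table(x$Sex, x$Admitted), margin = 1)` should give the same proportions.

```r
# Counts copied from the table(x$Sex, x$Admitted) output above.
counts <- matrix(c(1278, 1398, 557, 1192), nrow = 2,
                 dimnames = list(Sex = c("F", "M"), Admitted = c("No", "Yes")))
# margin = 1 divides each row by its row total, giving rates within each sex.
rates <- prop.table(counts, margin = 1)
rates[, "Yes"]
```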

R exercise 2

  • Is there a significant difference or not? What does it mean to be “significant”?
  • Think for a minute

R exercise 2 - solution

  • Is there a significant difference or not? What does it mean to be “significant”?
  • The difference seems large, but how reliably does this evidence show that there is a systematic difference?
  • Significance is outside the scope of this lesson; we only touch on it lightly here.
  • The idea is this: admissions involve a lot of chance, so a single observed difference is hard to judge. However, if chance alone would be very unlikely to produce the pattern we observe, then a systematic difference “most likely” exists.
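
One way to make this idea concrete is a shuffling (permutation) simulation. This is a sketch under our own assumptions, not a method the lesson uses: it rebuilds one row per applicant from the overall counts in exercise 1, then repeatedly shuffles the admission decisions so they are unrelated to sex and checks how often chance alone produces a gap as large as the observed one. The names `rate_gap`, `observed`, `shuffled`, and `p_value`, the seed, and the choice of 2000 shuffles are all illustrative.

```r
set.seed(1)  # arbitrary seed so the shuffles are reproducible

# Rebuild one row per applicant from the overall counts in exercise 1.
sex      <- rep(c("F", "M"), times = c(1835, 2590))
admitted <- c(rep(c(TRUE, FALSE), times = c(557, 1278)),   # women
              rep(c(TRUE, FALSE), times = c(1192, 1398)))  # men

# Male admission rate minus female admission rate.
rate_gap <- function(adm, sex) mean(adm[sex == "M"]) - mean(adm[sex == "F"])

observed <- rate_gap(admitted, sex)

# Shuffle the admission decisions so they are unrelated to sex, many times over.
shuffled <- replicate(2000, rate_gap(sample(admitted), sex))

# Share of shuffles whose gap is at least as large as the observed one.
p_value <- mean(abs(shuffled) >= abs(observed))
observed
p_value
```

If shuffling almost never produces a gap as large as the observed one (about 0.157), chance alone is a poor explanation for the overall difference. That says nothing yet about why the difference arises, which is what the next exercises explore.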

R exercise 3

  • Are there particular departments that seem especially problematic in terms of differential admission rates?

  • What are the male and female admission rates for each department?

  • Demo: admission rates by sex in Biology.

R exercise 3 - solution

Bio <- x[x$Department == "Biology", ] 
rate.M <- sum(Bio$Sex == "M" & Bio$Admitted == "Yes") / sum(Bio$Sex == "M")
rate.F <- sum(Bio$Sex == "F" & Bio$Admitted == "Yes") / sum(Bio$Sex == "F")
rate.M
## [1] 0.05882353
rate.F
## [1] 0.07038123

R exercise 3 - Team activity

Acceptance rate by sex:

  • Team 1: English

  • Team 2: History

  • Team 3: Psychology

  • You have 5 minutes

R exercise 3 - solution (only table) 1

table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",]$Department)
##    
##     Biology English History Mathematics Philosophy Psychology
##   F      24      94     131          89         17        202
##   M      16      53     139         511        353        120
table(x$Sex, x$Department)
##    
##     Biology English History Mathematics Philosophy Psychology
##   F     341     393     375         108         25        593
##   M     272     191     417         825        560        325

R exercise 3 - solution (only table) 2

table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",]$Department)/
         table(x$Sex, x$Department)
##    
##        Biology    English    History Mathematics Philosophy Psychology
##   F 0.07038123 0.23918575 0.34933333  0.82407407 0.68000000 0.34064081
##   M 0.05882353 0.27748691 0.33333333  0.61939394 0.63035714 0.36923077
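
To see at a glance which departments favor which group, the rates can be turned into a per-department gap. The sketch below rebuilds the two count tables from the output above so that it runs on its own; the names `admitted`, `applicants`, and `gap_F_minus_M` are illustrative.

```r
depts <- c("Biology", "English", "History", "Mathematics", "Philosophy", "Psychology")
# Counts copied from the two tables above (rows: F, M).
admitted   <- rbind(F = c(24, 94, 131, 89, 17, 202),
                    M = c(16, 53, 139, 511, 353, 120))
applicants <- rbind(F = c(341, 393, 375, 108, 25, 593),
                    M = c(272, 191, 417, 825, 560, 325))
colnames(admitted) <- colnames(applicants) <- depts

rates <- admitted / applicants
# Positive values: women are admitted at a higher rate than men in that department.
gap_F_minus_M <- rates["F", ] - rates["M", ]
round(sort(gap_F_minus_M), 3)
```

With these counts, women have the higher admission rate in four of the six departments (Biology, History, Mathematics, Philosophy), and men in English and Psychology.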

R exercise 4

  • Is there evidence for sex-based discrimination? Is there evidence against discrimination?
  • Visualizing the data should help.
  • Which plot do you think is appropriate?

R exercise 4 - solution

# Admission rate by sex and department: admitted counts divided by applicant counts
admitted <- x[x$Admitted == "Yes", ]
rates <- table(admitted$Sex, admitted$Department) / table(x$Sex, x$Department)
barplot(rates, beside=TRUE)

Adding legend

barplot(rates, beside=TRUE, legend=TRUE)
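
In the spirit of the good-and-bad-graphs discussion, the plot is easier to read with a title, an axis label, and a fixed 0 to 1 scale. The sketch below rebuilds the rate matrix from the counts shown in exercise 3 so that it runs without the CSV; the title, label text, and legend position are our own choices rather than part of the original solution.

```r
depts <- c("Biology", "English", "History", "Mathematics", "Philosophy", "Psychology")
# Admitted counts divided by applicant counts, copied from the exercise 3 tables.
rates <- rbind(F = c(24, 94, 131, 89, 17, 202) / c(341, 393, 375, 108, 25, 593),
               M = c(16, 53, 139, 511, 353, 120) / c(272, 191, 417, 825, 560, 325))
colnames(rates) <- depts

# beside = TRUE draws the F and M bars next to each other within each department.
mids <- barplot(rates, beside = TRUE, legend.text = TRUE,
                ylim = c(0, 1), ylab = "Admission rate",
                main = "Admission rate by department and sex",
                args.legend = list(x = "topleft", title = "Sex"),
                cex.names = 0.8)
```

`barplot()` returns the bar midpoints (here stored in `mids`), which can be used to add annotations such as rate labels above the bars.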

R exercise 4 - puzzle

  • Within each department, the admission rates for men and women seem similar.
  • Why does this finding contradict our previous observation?
  • Think for two minutes

R exercise 4 - Perhaps

  • Departments have different admission rates.
  • Departments have different numbers of applicants by gender.
  • Examining the overall admission rate by gender may overlook these underlying differences.

Biology is more competitive than Mathematics

barplot(table(x$Admitted, x$Department), legend=TRUE)

More women apply to competitive departments

barplot(table(x$Sex, x$Department), legend=TRUE)
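
The two plots can be backed up with numbers. The sketch below uses the department-level counts from exercise 3 to compute each department's overall admission rate and the share of its applicants who are women; the data frame and column names (`summary_by_dept`, `admit_rate`, `share_female`) are illustrative.

```r
depts <- c("Biology", "English", "History", "Mathematics", "Philosophy", "Psychology")
# Counts copied from the exercise 3 tables.
adm_F <- c(24, 94, 131, 89, 17, 202);   app_F <- c(341, 393, 375, 108, 25, 593)
adm_M <- c(16, 53, 139, 511, 353, 120); app_M <- c(272, 191, 417, 825, 560, 325)

summary_by_dept <- data.frame(
  department   = depts,
  admit_rate   = (adm_F + adm_M) / (app_F + app_M),  # lower = more competitive
  share_female = app_F / (app_F + app_M)             # share of applicants who are women
)
# Most competitive departments first.
summary_by_dept[order(summary_by_dept$admit_rate), ]
```

Mathematics and Philosophy admit well over half of their applicants and receive few female applicants, whereas Biology admits under 7% and has a majority of female applicants.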

Simpson’s Paradox

  • Some departments are harder to get into.
  • More female applicants apply to the competitive departments.
  • As a result, a smaller share of female applicants is admitted overall.
  • So it seems that men are more likely to be admitted, even though the rates within each department are similar.
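
The mechanism can be checked arithmetically: each group's overall rate is a weighted average of its department-level rates, weighted by where that group applies. The sketch below uses the counts from exercise 3. The what-if at the end (women keep their own department-level rates but apply with the men's department mix) is our own illustrative assumption, not part of the lesson, and all object names are hypothetical.

```r
# Counts by department (Biology, English, History, Mathematics, Philosophy, Psychology).
adm_F <- c(24, 94, 131, 89, 17, 202);   app_F <- c(341, 393, 375, 108, 25, 593)
adm_M <- c(16, 53, 139, 511, 353, 120); app_M <- c(272, 191, 417, 825, 560, 325)

rate_F <- adm_F / app_F          # department-level admission rates
rate_M <- adm_M / app_M
mix_F  <- app_F / sum(app_F)     # share of each group's applications per department
mix_M  <- app_M / sum(app_M)

# Overall rate = weighted average of department rates.
overall_F <- sum(rate_F * mix_F)   # equals 557 / 1835
overall_M <- sum(rate_M * mix_M)   # equals 1192 / 2590

# What-if (our assumption): women's department rates combined with the men's mix.
what_if_F <- sum(rate_F * mix_M)

round(c(overall_F = overall_F, overall_M = overall_M, what_if_F = what_if_F), 3)
```

Under this what-if, the women's overall rate comes out higher than the men's, which suggests that the aggregate gap is driven by where people apply rather than by the department-level rates.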

Remarks

  • Overall statistics give us a picture
    • but we need to look more closely
  • Women apply more to competitive departments, so the overall rates make it look as though there is discrimination.
  • We do not find evidence of discrimination once we look department by department.
    • Simpson’s Paradox

Remarks

  • This does not mean there is no gender discrimination.
  • There is ample evidence of gender-based discrimination in many countries, fields, and industries; this dataset just does not show it.
  • No evidence of discrimination != no discrimination
  • What factors might have resulted in women applying in such different ways to different departments?
  • Next class: displaying and summarizing data