- Visualization: good and bad graphs
- Break
- R exercise on gender discrimination
- Discussion + suggested solutions
08/19/2021
Each team has 5 mins to discuss pros and cons of assigned graph (can be found on the canvas):
Poor numerical reasoning and misleading presentations are common problems
Always consider:
What does it mean, “is there discrimination?”
Suppose overall acceptance rate is 30%
50 men apply
30 women apply
Suppose overall acceptance rate is 30%
50 men apply 15 are accepted
30 women apply 9 are accepted
Is this evidence of discrimination?
Suppose overall acceptance rate is 30%
50 men apply 23 are accepted
30 women apply 1 are accepted
Is this evidence of discrimination?
Suppose overall acceptance rate is 30%
50 men apply 18 are accepted
30 women apply 6 are accepted
Is this evidence of discrimination?
Where is the threshold for us to say there is discrimination happening?
Question #1: What was the overall acceptance rate for male vs. female applicants?
Question #2: Is there a significant difference or not?
Question #3: What are the male and female admission rates for each department?
Question #4: Is there evidence for sex-based discrimination? Is there evidence against discrimination?
x <- read.csv("AdmissionsData.csv") dim(x)
## [1] 4425 3
head(x)
## Department Sex Admitted ## 1 Biology M Yes ## 2 Biology M Yes ## 3 Biology M Yes ## 4 Biology M Yes ## 5 Biology M Yes ## 6 Biology M Yes
table(x$Department)
## ## Biology English History Mathematics Philosophy Psychology ## 613 584 792 933 585 918
table(x$Department, x$Sex)
## ## F M ## Biology 341 272 ## English 393 191 ## History 375 417 ## Mathematics 108 825 ## Philosophy 25 560 ## Psychology 593 325
Female <- x[x$Sex == "F", ] Male <- x[x$Sex == "M", ] nrow(Female[Female$Admitted == "Yes", ]) / nrow(Female)
## [1] 0.3035422
nrow(Male[Male$Admitted == "Yes", ]) / nrow(Male)
## [1] 0.4602317
sum(x$Sex == "M" & x$Admitted == "Yes") / sum(x$Sex == "M")
## [1] 0.4602317
sum(x$Sex == "F" & x$Admitted == "Yes") / sum(x$Sex == "F")
## [1] 0.3035422
table(x$Sex, x$Admitted)
## ## No Yes ## F 1278 557 ## M 1398 1192
1192 / (1398 + 1192)
## [1] 0.4602317
557 / (557 + 1278)
## [1] 0.3035422
Is/are there particular department(s) that seems especially problematic in terms of differential admission rates?
What are the male and female admission rates for each department?
Demo the admission rates by sex in Biology.
Bio <- x[x$Department == "Biology", ] rate.M <- sum(Bio$Sex == "M" & Bio$Admitted == "Yes") / sum(Bio$Sex == "M") rate.F <- sum(Bio$Sex == "F" & Bio$Admitted == "Yes") / sum(Bio$Sex == "F") rate.M
## [1] 0.05882353
rate.F
## [1] 0.07038123
Acceptance rate by sex:
Team 1: English
Team 2: History
Team 3: Psychology
You have 5 min
table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",]$Department)
## ## Biology English History Mathematics Philosophy Psychology ## F 24 94 131 89 17 202 ## M 16 53 139 511 353 120
table(x$Sex, x$Department)
## ## Biology English History Mathematics Philosophy Psychology ## F 341 393 375 108 25 593 ## M 272 191 417 825 560 325
table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",]$Department)/ table(x$Sex, x$Department)
## ## Biology English History Mathematics Philosophy Psychology ## F 0.07038123 0.23918575 0.34933333 0.82407407 0.68000000 0.34064081 ## M 0.05882353 0.27748691 0.33333333 0.61939394 0.63035714 0.36923077
barplot(table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",] $Department)/table(x$Sex, x$Department), beside=TRUE)
barplot(table(x[x$Admitted == "Yes",]$Sex, x[x$Admitted == "Yes",] $Department)/table(x$Sex, x$Department), beside=TRUE, legend=TRUE)
barplot(table(x$Admitted, x$Department), legend=TRUE)
barplot(table(x$Sex, x$Department), legend=TRUE)