Read in the UESI2019_with_indicators.csv dataset as the data frame uesi and perform requisite checks on data structure
2021-08-26
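A minimal sketch of the read-in and structure checks, assuming the CSV sits in the current working directory (the path and the choice of checks are assumptions, not part of the original handout):

```r
# Assumption: the CSV file is in the current working directory.
csv_path <- "UESI2019_with_indicators.csv"

if (file.exists(csv_path)) {
  uesi <- read.csv(csv_path, stringsAsFactors = FALSE)

  print(dim(uesi))         # number of rows (cities) and columns
  str(uesi)                # column names, types, and first few values
  print(head(uesi))        # first six rows
  print(class(uesi$city))  # confirm a column has the type you expect
} else {
  message("File not found; check getwd() and the path: ", csv_path)
}
```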
Here’s more detail on some indicators of the UESI and their units. You can explore the data to learn about the cities (population, area, etc.), and environmental performance: air quality, urban heat island, tree cover, & public transport.
What do the values indicate?
Higher values are bad for indicators such as air pollution (PM25_mean, NO2_mean) and urban heat island (UHI_mean). For others, such as TREECAP_mean and TRANSCOV_mean, higher values are good.
What’s the difference between the _mean and .UESI values?
The .UESI data are all scaled from 0 to 100, and 100 is always the good end. Why might this be useful?
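One reason a common 0 to 100 scale can help is that indicators with different units and directions become directly comparable. Here is a minimal sketch of min-max rescaling with made-up numbers. The helper name rescale_0_100 and the toy values are hypothetical, and this is not necessarily how the UESI scores are actually computed.

```r
# Hypothetical helper: min-max rescale a numeric vector to 0-100.
# higher_is_better = FALSE flips the scale so that 100 is always the good end.
rescale_0_100 <- function(x, higher_is_better = TRUE) {
  rng <- range(x, na.rm = TRUE)
  scaled <- (x - rng[1]) / (rng[2] - rng[1]) * 100
  if (higher_is_better) scaled else 100 - scaled
}

# Made-up raw values for three cities (not taken from the UESI data)
pm25    <- c(5, 20, 35)   # lower is better
treecap <- c(10, 40, 70)  # higher is better

rescale_0_100(pm25, higher_is_better = FALSE)  # 100 50 0
rescale_0_100(treecap)                         # 0 50 100
```

After rescaling, the first city scores 100 on air quality and 0 on tree cover, so the two indicators can be read on the same footing.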
Pick one indicator and create a table to explore how many missing values there are for that indicator.
Recall that str (structure) is one way to get a list of the column names.
table(is.na(uesi$PUBTRANS_mean))
## 
## FALSE  TRUE 
##   162     2
uesi$city[is.na(uesi$PUBTRANS_mean)]
## [1] "evansville" "reykjavik"
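To survey missing values for every indicator at once rather than one column at a time, one option is colSums() on is.na(). This is a sketch on a made-up data frame (toy and its values are hypothetical stand-ins for uesi):

```r
# Toy data frame standing in for uesi (values are made up)
toy <- data.frame(
  city          = c("a", "b", "c", "d"),
  PUBTRANS_mean = c(0.4, NA, 0.7, NA),
  TREECAP_mean  = c(12, 30, NA, 8),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts  # city: 0, PUBTRANS_mean: 2, TREECAP_mean: 1
```

The same call on the real data would be colSums(is.na(uesi)).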
Answer these questions:
1. What is the mean population density (popdens) of cities that score 100 on Tree Cover per capita?
mean(uesi$popdens[uesi$TREECAP.UESI == 100], na.rm=TRUE)
## [1] 2051.575
2. Median?
median(uesi$popdens[uesi$TREECAP.UESI == 100], na.rm=TRUE)
## [1] 1363.773
3. IQR
IQR(uesi$popdens, na.rm=TRUE)
## [1] 4796.313
EXTRA: you can also use quantile()
q1 <- quantile(uesi$popdens, 0.25, na.rm=TRUE)
q3 <- quantile(uesi$popdens, 0.75, na.rm=TRUE)
q3-q1
##      75% 
## 4796.313
4. What is the standard deviation of all cities’ population density?
sd(uesi$popdens, na.rm=TRUE)
## [1] 4830.647
Review of preparatory work.
1. What is the total population of cities NOT in Asia?
sum(uesi$population_total[uesi$continent != "Asia"])
## [1] 255690791
2. How many cities have more than 100 neighborhoods?
length(uesi$city[uesi$nbhd_num > 100])
## [1] 42
3. Which cities have scores above 85 on both PUBTRANS.UESI and TREECAP.UESI?
# Why use which( ) in this code? Try omitting it and spot the difference!
uesi$city[which(uesi$PUBTRANS.UESI > 85 & uesi$TREECAP.UESI > 85)]
##  [1] "alexandria"      "alger"           "amsterdam"       "asuncion"       
##  [5] "atlanta"         "baltimore"       "berlin"          "boston"         
##  [9] "bratislava"      "bridgeport"      "brisbane"        "brussels"       
## [13] "bucharest"       "budapest"        "chelyabinsk"     "chicago"        
## [17] "cleveland"       "copenhagen"      "denver"          "detroit"        
## [21] "dublin"          "edinburgh"       "fargo"           "hamburg"        
## [25] "houston"         "kampala"         "kiev"            "lome"           
## [29] "london"          "louisville"      "lyons"           "managua"        
## [33] "maputo"          "melbourne"       "milan"           "milwaukee"      
## [37] "minneapolis"     "monrovia"        "montreal"        "moscow"         
## [41] "munich"          "nashville"       "newyork"         "nizhny"         
## [45] "novosibirsk"     "omaha"           "oslo"            "paterson"       
## [49] "philadelphia"    "portland"        "quito"           "riodejaneiro"   
## [53] "saintpetersburg" "saltlakecity"    "sanjose"         "seattle"        
## [57] "seoul"           "singapore"       "stlouis"         "stockholm"      
## [61] "sydney"          "toronto"         "tulsa"           "vancouver"      
## [65] "vienna"          "warsaw"          "wellington"      "wichita"        
## [69] "yangon"          "zagreb"          "zurich"
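The effect of which( ) is easiest to see on a tiny example with a missing value. This sketch uses made-up vectors (city and score here are hypothetical, not columns of uesi):

```r
# Toy vectors (made up); the second score is missing
city  <- c("a", "b", "c")
score <- c(90, NA, 50)

score > 85               # TRUE NA FALSE: comparing NA gives NA, not FALSE

city[score > 85]         # "a" NA : the NA in the index produces an NA element
city[which(score > 85)]  # "a"    : which() returns only the TRUE positions
```

Without which( ), every city whose score is missing shows up as an NA entry in the result, which also inflates any count taken with length().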
How many cities get perfect scores of 100 on all three of the variables PM25.UESI, UHI.UESI, and TREECAP.UESI?
table(uesi$PM25.UESI==100 & uesi$UHI.UESI==100 & uesi$TREECAP.UESI==100)
## 
## FALSE  TRUE 
##   162     2
# looks like only 2 cities meet this condition!
sum(uesi$PM25.UESI==100 & uesi$UHI.UESI==100 & uesi$TREECAP.UESI==100)
## [1] 2
Which two cities are they? For this subsetting you may want to wrap your logical expression in the function which( ). See R Tutorial 19 for a reminder of why.
uesi$city[which(uesi$PM25.UESI==100 & uesi$UHI.UESI==100 & uesi$TREECAP.UESI==100)]
## [1] "anchorage" "oslo"
1. How many cities are better than Singapore with respect to tree cover per capita (TREECAP)?
sum(uesi$TREECAP.UESI > uesi$TREECAP.UESI[uesi$city == "singapore"])
## [1] 88
2. How many cities are better than Singapore with respect to BOTH tree cover (TREECAP) and PM2.5 (PM25)?
sum((uesi$TREECAP.UESI > uesi$TREECAP.UESI[uesi$city == "singapore"]) & (uesi$PM25.UESI > uesi$PM25.UESI[uesi$city == "singapore"]) )
## [1] 82
3. How many cities are better than Singapore with respect to EITHER tree cover per capita (TREECAP) OR PM2.5 (PM25)?
sum((uesi$TREECAP.UESI > uesi$TREECAP.UESI[uesi$city == "singapore"]) | (uesi$PM25.UESI > uesi$PM25.UESI[uesi$city == "singapore"]) )
## [1] 145
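The three Singapore comparisons repeat the same lookup several times. One way to make the logic easier to read is to store the baseline row and the two logical vectors first. This is a sketch on a made-up data frame (toy, sg, better_tree, and better_pm25 are hypothetical names, and the scores are invented):

```r
# Toy data frame standing in for uesi (scores are made up)
toy <- data.frame(
  city         = c("singapore", "a", "b", "c"),
  TREECAP.UESI = c(50, 60, 40, 70),
  PM25.UESI    = c(15, 80, 90, 10),
  stringsAsFactors = FALSE
)

# Baseline row for the reference city
sg <- toy[toy$city == "singapore", ]

# One logical vector per indicator: TRUE where a city beats the baseline
better_tree <- toy$TREECAP.UESI > sg$TREECAP.UESI  # FALSE TRUE FALSE TRUE
better_pm25 <- toy$PM25.UESI > sg$PM25.UESI        # FALSE TRUE TRUE FALSE

sum(better_tree)                               # 2
sum(better_tree & better_pm25, na.rm = TRUE)   # 1: better on both
sum(better_tree | better_pm25, na.rm = TRUE)   # 3: better on at least one
```

The OR count can never be smaller than the AND count, because every city that is better on both is also better on at least one.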
In Challenge 6.3, it looks like a lot of cities fell into this category. Singapore’s PM2.5 rating is not so great; uesi$PM25.UESI[uesi$city == "singapore"] returns 15.1. Draw a histogram of PM2.5 for all cities.
hist(uesi$PM25.UESI, xlab="Performance on PM25", col="lightblue")
How do cities across continents compare on the PM2.5 UESI indicator? Save your output as a PDF using ‘Export > Save as PDF’
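# In a grouped boxplot the x axis shows the groups and the y axis shows the values
boxplot(uesi$PM25.UESI ~ uesi$continent, col="lightblue", xlab="Continent", ylab="PM2.5 UESI score", main="City performance on Air Quality PM2.5 by Continent")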
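A numeric companion to a grouped boxplot is the per-group median, which is the thick line drawn inside each box. This is a sketch using tapply() on a made-up data frame (toy and its values are hypothetical):

```r
# Toy data frame standing in for uesi (values are made up)
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  PM25.UESI = c(10, 30, 70, 90, NA),
  stringsAsFactors = FALSE
)

# Apply median() to PM25.UESI within each continent, ignoring missing scores
medians <- tapply(toy$PM25.UESI, toy$continent, median, na.rm = TRUE)
medians  # Asia: 20, Europe: 80
```

On the real data the equivalent call would be tapply(uesi$PM25.UESI, uesi$continent, median, na.rm = TRUE).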
Asia is not doing well on this comparison. But we might expect larger cities to have worse PM2.5. Check the distribution of city total populations by continent.
# The y axis here is population, not a score
boxplot(uesi$population_total ~ uesi$continent, main="City populations by continent", col="pink", xlab="Continent", ylab="Total population")
Is that enough to explain the discrepancy? We will have to wait for a later class to examine this question.