[Getting and Cleaning Data] Quiz 3

More details can be found in the html file here.

Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho from here using download.file(), and load the data into R. The code book, describing the variable names, is here.

Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable agricultureLogical. Apply the which() function like this to identify the rows of the data frame where the logical vector is TRUE.

which(agricultureLogical)

What are the first 3 values that result?

  • 59, 460, 474

  • 125, 238, 262

  • 403, 756, 798

  • 25, 36, 45

# download data
if(!file.exists("./data")) dir.create("./data")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl, destfile = "./data/ACS.csv")
# load data into R
acs <- read.csv("./data/ACS.csv")
# ACR == 3: house on ten or more acres; AGS == 6: $10,000+ in agricultural sales (see code book)
agricultureLogical <- (acs$ACR == 3 & acs$AGS == 6)
which(agricultureLogical)[1:3]
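To see how the logical vector and which() interact, here is a small sketch on a made-up data frame that reuses the ACS coding (ACR == 3 for ten or more acres, AGS == 6 for $10,000+ in sales). The toy values are hypothetical; the point is that `&` propagates NA (the real columns may contain NA) and which() silently skips NA entries.

```r
# Hypothetical toy data using the ACS codes: ACR == 3 means ten or more acres,
# AGS == 6 means $10,000+ of agricultural sales.
toy <- data.frame(
  ACR = c(1, 3, 3, NA, 3, 2),
  AGS = c(6, 6, 2, 6, NA, 6)
)

# Element-wise AND; NA & TRUE gives NA, while NA & FALSE gives FALSE
agricultureLogical <- toy$ACR == 3 & toy$AGS == 6
print(agricultureLogical)

# which() returns the indices of the TRUE entries and drops NA
rows <- which(agricultureLogical)
print(rows)

# Indexing past the end pads with NA; head() returns only what exists
print(rows[1:3])
print(head(rows, 3))
```

On the real data there are plenty of matches, so `[1:3]` and `head(..., 3)` give the same result; head() is simply the safer habit.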

Question 2

Using the jpeg package, read the following picture of your instructor into R:

https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg

Use the parameter native=TRUE. What are the 30th and 80th quantiles of the resulting data? (Some Linux systems may produce an answer 638 different for the 30th quantile.)

  • -16776430 -15390165

  • -10904118 -10575416

  • -15259150 -10575416

  • 10904118 -594524

# download fig
library(jpeg)
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg"
download.file(fileUrl, destfile = "./data/jeff.jpg", mode = "wb")
# load fig into R
jeff <- readJPEG("./data/jeff.jpg", native = TRUE)
# result
quantile(jeff, probs = c(0.3, 0.8))
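Why are all the answer options negative? A sketch, assuming that a nativeRaster stores each pixel in R's internal colour format (red in the lowest byte, then green, blue, and alpha in the highest byte). For an opaque JPEG pixel the alpha byte is 255, so the packed 32-bit value overflows the signed integer range and wraps to a negative number. The helper pack_native() and the toy matrix are hypothetical, written only for illustration.

```r
# Hypothetical helper: pack an opaque RGB pixel (0-255 per channel) under the
# assumed layout r | g << 8 | b << 16 | 255 << 24. Since 255 * 2^24 exceeds
# .Machine$integer.max, the signed value is -2^24 + r + 256*g + 65536*b.
pack_native <- function(r, g, b) {
  as.integer(-16777216 + r + 256 * g + 65536 * b)
}

print(pack_native(0, 0, 0))        # opaque black
print(pack_native(255, 255, 255))  # opaque white

# A toy 2 x 3 grey "image"; quantile() treats the matrix as one long vector
grey <- c(0, 50, 100, 150, 200, 255)
img <- matrix(pack_native(grey, grey, grey), nrow = 2)
print(quantile(img, probs = c(0.3, 0.8)))
```

Under this assumption every opaque pixel lies between -16777216 (black) and -1 (white), which is consistent with the magnitudes in the answer options.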

Question 3

Load the Gross Domestic Product data for the 190 ranked countries in this data set here.

Load the educational data from this data set here.

Match the data based on the country shortcode. How many of the IDs match? Sort the data frame in descending order by GDP rank (so United States is last). What is the 13th country in the resulting data frame?

Original data sources are here and here.

  • 234 matches, 13th country is Spain

  • 190 matches, 13th country is Spain

  • 190 matches, 13th country is St. Kitts and Nevis

  • 189 matches, 13th country is St. Kitts and Nevis

  • 189 matches, 13th country is Spain

  • 234 matches, 13th country is St. Kitts and Nevis

# download data
fileUrl1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
fileUrl2 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
download.file(fileUrl1, destfile = "./data/GDP.csv")
download.file(fileUrl2, destfile = "./data/EDU.csv")
# load data into R
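# skip the 4-line title block, keep only the 190 ranked rows and the code, rank, name and GDP columns
gdp <- read.csv("./data/GDP.csv", skip = 4, nrows = 190, stringsAsFactors = FALSE)[, c(1, 2, 4, 5)]
colnames(gdp) <- c("CountryCode", "Ranking", "Economy", "GDP")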
edu <- read.csv("./data/EDU.csv", stringsAsFactors = FALSE)
# merge data
mergeData <- merge(gdp, edu, by = "CountryCode")
# result 1
nrow(mergeData)
# result 2
library(dplyr)
arrangeData <- arrange(mergeData, desc(Ranking))
arrangeData[13, "Economy"]
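The same two steps can be done in base R. This sketch uses two hypothetical toy tables (the country codes are made up) to show that merge() is an inner join by default, so codes present in only one table are dropped, and that order(..., decreasing = TRUE) plays the role of arrange(desc(Ranking)).

```r
# Hypothetical toy tables; only the CountryCode key matters for the join
gdp_toy <- data.frame(CountryCode = c("AAA", "BBB", "CCC", "DDD"),
                      Ranking     = c(1L, 2L, 3L, 4L),
                      stringsAsFactors = FALSE)
edu_toy <- data.frame(CountryCode  = c("BBB", "CCC", "DDD", "EEE"),
                      Income.Group = c("High income: OECD", "Low income",
                                       "Upper middle income", "Low income"),
                      stringsAsFactors = FALSE)

# Count shared IDs directly, without merging
n_shared <- sum(gdp_toy$CountryCode %in% edu_toy$CountryCode)

# merge() defaults to an inner join: AAA and EEE have no partner and are dropped
merged <- merge(gdp_toy, edu_toy, by = "CountryCode")

# Base-R equivalent of dplyr::arrange(merged, desc(Ranking))
sorted <- merged[order(merged$Ranking, decreasing = TRUE), ]
print(n_shared)
print(sorted)
```

Counting with `%in%` is a quick cross-check on nrow() of the merged data frame.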

Question 4

What is the average GDP ranking for the “High income: OECD” and “High income: nonOECD” groups?

  • 30, 37

  • 23, 30

  • 32.96667, 91.91304

  • 23, 45

  • 23.966667, 30.91304

  • 133.72973, 32.96667

# group data
by_income <- group_by(mergeData, Income.Group)
# result
summarise(by_income, meanRank = mean(Ranking))
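Without dplyr, tapply() computes the same per-group means. A minimal sketch on hypothetical toy ranks (not the quiz data):

```r
# Hypothetical toy data: GDP ranks with an income-group label
toy <- data.frame(
  Ranking = c(1L, 5L, 9L, 20L, 40L, 60L),
  Income.Group = c("High income: OECD", "High income: OECD",
                   "High income: nonOECD", "High income: nonOECD",
                   "Low income", "Low income"),
  stringsAsFactors = FALSE
)

# tapply() splits Ranking by Income.Group and applies mean() to each piece
meanRank <- tapply(toy$Ranking, toy$Income.Group, mean)
print(meanRank[c("High income: OECD", "High income: nonOECD")])
```

Applied to the quiz, the call would be `tapply(mergeData$Ranking, mergeData$Income.Group, mean)`.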

Question 5

Cut the GDP ranking into 5 separate quantile groups. Make a table versus Income.Group. How many countries are Lower middle income but among the 38 nations with highest GDP?

  • 13

  • 12

  • 5

  • 0

# cut data into 5 groups
library(Hmisc)
# store the groups in a new column instead of overwriting the GDP column
mergeData$RankGroup <- cut2(mergeData$Ranking, g = 5)
table(mergeData$RankGroup, mergeData$Income.Group)
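If Hmisc is not available, base R's cut() with quantile() break points gives a similar split. This is a sketch using the hypothetical ranks 1 to 190 in place of mergeData$Ranking; cut2() chooses its group boundaries in its own way, so counts on real data with tied ranks may differ slightly at the edges.

```r
# Hypothetical ranks 1..190 standing in for mergeData$Ranking
ranks <- 1:190

# Quintile break points; include.lowest = TRUE keeps the minimum in the first bin
breaks <- quantile(ranks, probs = seq(0, 1, by = 0.2))
rankGroup <- cut(ranks, breaks = breaks, include.lowest = TRUE)

print(table(rankGroup))
```

With 190 distinct ranks each group holds 38 countries, which is why the question refers to "the 38 nations with highest GDP" (ranks 1 to 38). On the quiz data the cross-tabulation would be `table(rankGroup, mergeData$Income.Group)`.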