R Programming Practice - Air Pollution

Part 1

Write a function named ‘pollutantmean’ that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function ‘pollutantmean’ takes three arguments: ‘directory’, ‘pollutant’, and ‘id’. Given a vector of monitor ID numbers, ‘pollutantmean’ reads those monitors’ particulate matter data from the directory specified in the ‘directory’ argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA.
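
Note that the mean is taken over the pooled observations of all requested monitors, not over the per-monitor means. A minimal sketch with two made-up monitors (the values are illustrative and not taken from the real data set) shows that the two quantities can differ:

```r
# Two hypothetical monitors with made-up sulfate readings
monitor_a <- c(1, 2, NA, 3)
monitor_b <- c(10)

# Pooled mean: combine all readings, drop NA, then average
pooled <- mean(c(monitor_a, monitor_b), na.rm = TRUE)

# Mean of per-monitor means: a different quantity
mean_of_means <- mean(c(mean(monitor_a, na.rm = TRUE),
                        mean(monitor_b, na.rm = TRUE)))

print(pooled)         # 4
print(mean_of_means)  # 6
```

The pooled version is the one the task asks for, which is why the data from all monitors is combined before the mean is computed.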

Breaking Down pollutantmean()

Outline:

pollutantmean(...) {

   # obtain list of sensor files in specdata directory

   # create empty data frame

   # subset list of sensor files

   # loop through files in subset list and
   #    * read the csv file
   #    * bind to "collector" data frame

   # calculate mean and return to parent environment
}

Solution:

pollutantmean <- function(directory, pollutant, id){
  # list the monitor files; their sorted order is assumed to match the monitor IDs
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  v <- data.frame()
  
  for (i in id){
    # read one monitor's file and append its rows to the collector
    files <- read.table(fileslist[i], header = TRUE, sep = ",")
    v <- rbind(v, files)
  }
  
  # no quotes around pollutant here: the caller already passes "sulfate" or "nitrate"
  l <- v[[pollutant]]
  mean(l, na.rm = TRUE)
}
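
The loop above indexes fileslist by position, which only works if the i-th file in sorted order belongs to monitor i. As a sketch of an alternative, assuming the files are named with zero-padded IDs such as 001.csv (an assumption about the data layout; the name pollutantmean_by_name and the toy data are hypothetical), each path can be built from the ID and all pieces bound in one step:

```r
# Hypothetical variant: build file paths from monitor IDs instead of
# relying on the position of each file in list.files().
# Assumes files are named 001.csv, 002.csv, ... (zero-padded to 3 digits).
pollutantmean_by_name <- function(directory, pollutant, id) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  # Read every requested file, then bind the pieces once
  pieces <- lapply(paths, read.csv)
  combined <- do.call(rbind, pieces)
  mean(combined[[pollutant]], na.rm = TRUE)
}

# Toy data in a temporary directory (made-up values)
demo_dir <- tempfile("specdata_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(0.5, 0.7, NA), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 0.9), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- pollutantmean_by_name(demo_dir, "sulfate", 1:2)
print(result)  # mean of 1, 3, 5, 7
```

Binding once with do.call(rbind, ...) also avoids copying the growing data frame on every iteration.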

Part 2

Write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases.
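
A case is completely observed when no variable in the row is NA. As a small illustration on made-up data (not taken from the assignment's files), complete.cases() flags such rows and sum() counts them:

```r
# Made-up readings: only rows 1 and 4 have no missing values
readings <- data.frame(
  sulfate = c(2.1, NA, 3.4, 1.8),
  nitrate = c(0.4, 0.6, NA, 0.9)
)

ok <- complete.cases(readings)  # TRUE where every column is observed
n_complete <- sum(ok)

print(ok)          # TRUE FALSE FALSE TRUE
print(n_complete)  # 2
```

Counting the rows that survive na.omit(), as in nrow(na.omit(readings)), gives the same number.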

Solution:

complete <- function(directory, id){
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  
  # one count of complete rows per requested monitor
  nobs <- numeric(length(id))
  k <- 1
  for (f in id){
    files <- read.table(fileslist[f], header = TRUE, sep = ",")
    nobs[k] <- nrow(na.omit(files))
    k <- k + 1
  }
  # return the result visibly: monitor ID and its number of complete cases
  data.frame(id = id, nobs = nobs)
}
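
The manual counter k can be avoided. As a sketch of one alternative (the name count_complete and the toy files are hypothetical, and the files are assumed to sort into monitor order as in the solution above), vapply() returns one count per requested monitor:

```r
# Hypothetical helper: count complete rows for each requested monitor
count_complete <- function(directory, id) {
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  nobs <- vapply(fileslist[id], function(path) {
    sum(complete.cases(read.csv(path)))  # number of rows without any NA
  }, integer(1), USE.NAMES = FALSE)
  data.frame(id = id, nobs = nobs)
}

# Toy data in a temporary directory (made-up values)
demo_dir <- tempfile("complete_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(0.5, 0.7, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7, 9), nitrate = c(0.2, 0.9, 0.4)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

counts <- count_complete(demo_dir, c(2, 1))
print(counts)  # rows follow the order of the requested IDs
```

Because the result is built from id directly, the rows come back in the order the IDs were requested.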

Part 3

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0.
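
One detail in the last requirement is easy to miss: an empty c() is NULL, which is not a numeric vector, whereas numeric(0) is. A short check, with made-up values for the single cor() call:

```r
# c() with no arguments is NULL; numeric(0) is an empty numeric vector
empty_null <- c()
empty_num <- numeric(0)

print(is.numeric(empty_null))  # FALSE
print(is.numeric(empty_num))   # TRUE
print(length(empty_num))       # 0

# Appending to numeric(0) works like appending to any other vector
sulfate <- c(1, 2, 3, 4)  # made-up values
nitrate <- c(2, 4, 6, 9)
correlations <- c(empty_num, cor(sulfate, nitrate))
print(length(correlations))  # 1
```

Starting the result from numeric(0) therefore satisfies the requirement even when no monitor passes the threshold.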

Solution:

corr <- function(directory, threshold){
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  # start from an empty numeric vector so that a vector of length 0
  # is returned when no monitor meets the threshold
  v <- numeric(0)
  k <- 1
  for (f in fileslist){
    files <- read.table(f, header = TRUE, sep = ",")
    good <- na.omit(files)  # keep only completely observed rows
    if (nrow(good) > threshold){
      v[k] <- cor(good[, "sulfate"], good[, "nitrate"])
      k <- k + 1
    }
  }
  v
}