R Programming Practice - Air Pollution

skyCeleste.x

Article tags: R, programming languages

Article link: https://blog.csdn.net/jeonghin/article/details/124798515

Part 1

Write a function named ‘pollutantmean’ that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function ‘pollutantmean’ takes three arguments: ‘directory’, ‘pollutant’, and ‘id’. Given a vector of monitor ID numbers, ‘pollutantmean’ reads those monitors’ particulate matter data from the directory specified in the ‘directory’ argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA.
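
The phrase "mean across all of the monitors" is easy to misread, so here is a small sketch of the distinction it seems to imply. The readings and the names monitor1 and monitor2 are invented for illustration; the point is that the pooled mean (combine all readings, then average) can differ from the mean of the per-monitor means when monitors have different numbers of non-missing values.

```r
# Toy readings for two hypothetical monitors (values invented for illustration)
monitor1 <- c(2, 4, NA)
monitor2 <- c(6, NA, NA)

# Pooled mean: combine all readings first, then drop NAs and average
pooled <- mean(c(monitor1, monitor2), na.rm = TRUE)

# Mean of per-monitor means: a different quantity when monitors
# contribute different numbers of non-missing readings
mean_of_means <- mean(c(mean(monitor1, na.rm = TRUE),
                        mean(monitor2, na.rm = TRUE)))

print(pooled)         # 4
print(mean_of_means)  # 4.5
```

The task description reads as asking for the pooled value, which is why the solution collects every row into one data frame before calling mean().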

Breaking Down pollutantmean()

Outline:

pollutantmean(...) {

   # obtain list of sensor files in specdata directory

   # create empty data frame

   # subset list of sensor files

   # loop through files in subset list and
   #    * read the csv file
   #    * bind to "collector" data frame

   # calculate mean and return to parent environment
}

Solution:

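pollutantmean <- function(directory, pollutant, id){
  # List the monitor files; "\\.csv$" matches a literal ".csv" extension
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  v <- data.frame()
  
  for (i in id){
    # Assumes the sorted position of each file matches its monitor ID
    files <- read.table(fileslist[i], header = TRUE, sep = ",")
    # Append this monitor's rows to the collector data frame
    v <- rbind(v, files)
  }
  
  # No quotes around pollutant here: it already holds the string
  # "sulfate" or "nitrate" passed in by the caller
  l <- v[[pollutant]]
  mean(l, na.rm = TRUE)
}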
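
As a possible variation on the loop above, the sketch below reads the requested files with lapply() and stacks them with a single rbind() call, which avoids growing a data frame inside a loop. Everything here is illustrative: pollutantmean_sketch, the demo directory, the file names, and the readings are assumptions made up for the example, not part of the original exercise data.

```r
# Build a throwaway directory with two tiny monitor files (invented data)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(1, NA), nitrate = c(0.5, 1.5), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(3, 5), nitrate = c(NA, 2.5), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Hypothetical variant of pollutantmean(): read all requested files,
# then stack them with one rbind call before taking the mean
pollutantmean_sketch <- function(directory, pollutant, id) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files[id], read.csv)
  combined <- do.call(rbind, tables)
  mean(combined[[pollutant]], na.rm = TRUE)
}

sulfate_mean <- pollutantmean_sketch(demo_dir, "sulfate", 1:2)
nitrate_mean <- pollutantmean_sketch(demo_dir, "nitrate", 2)
print(sulfate_mean)
print(nitrate_mean)
```

Like the loop version, this sketch assumes the sorted file order lines up with the monitor IDs.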

Part 2

Write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases.
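
To make "completely observed cases" concrete, here is a minimal sketch on a toy data frame (the values are invented). A row counts as complete only when none of its columns is NA, and base R offers two equivalent ways to count such rows.

```r
# Toy monitor data (invented values); NA marks a missing measurement
toy <- data.frame(sulfate = c(1.2, NA, 3.4, NA),
                  nitrate = c(0.5, 0.7, NA, NA))

# complete.cases() flags rows with no NA in any column
flags <- complete.cases(toy)

# Two equivalent ways to count completely observed rows
n_flagged <- sum(flags)
n_omitted <- nrow(na.omit(toy))

print(flags)
print(n_flagged)
print(n_omitted)
```

Only the first row has both measurements, so both counts agree on a single complete case.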

Solution:

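complete <- function(directory, id){
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  
  nobs <- c()
  k <- 1
  for (f in id){
    files <- read.table(fileslist[f], header = TRUE, sep = ",")
    # na.omit() drops every row containing an NA, so nrow() counts complete cases
    nobs[k] <- nrow(na.omit(files))
    k <- k + 1
  }
  # Pair each monitor ID (standing in for the file name) with its count;
  # ending on the expression itself returns the data frame visibly
  data.frame(id = id, nobs = nobs)
}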

Part 3

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0.
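
The per-monitor step described above can be sketched on toy data. The helper corr_one and the readings are hypothetical, introduced only to show the threshold test and the length-0 result; the full function applies the same idea to every file in the directory.

```r
# Toy data for one hypothetical monitor (invented values)
toy <- data.frame(sulfate = c(1, 2, 3, NA, 5),
                  nitrate = c(2, 4, 6, 8, NA))

# Hypothetical helper: correlation for one monitor, or an empty
# numeric vector when the monitor does not exceed the threshold
corr_one <- function(df, threshold) {
  observed <- na.omit(df)          # keep completely observed rows only
  if (nrow(observed) > threshold) {
    cor(observed$sulfate, observed$nitrate)
  } else {
    numeric(0)
  }
}

passes <- corr_one(toy, threshold = 2)   # 3 complete rows > 2
fails  <- corr_one(toy, threshold = 3)   # 3 complete rows is not > 3
print(passes)
print(length(fails))
```

Starting from numeric(0) rather than c() matters for the final requirement: c() evaluates to NULL, which is not a numeric vector.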

Solution:

corr <- function(directory, threshold){
  fileslist <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  # Start from numeric(0) so an empty result is a numeric vector of length 0
  v <- numeric(0)
  for (f in fileslist){
    files <- read.table(f, header = TRUE, sep = ",")
    # Keep only completely observed rows
    observed <- na.omit(files)
    if (nrow(observed) > threshold){
      # Append this monitor's sulfate-nitrate correlation
      v <- c(v, cor(observed[, "sulfate"], observed[, "nitrate"]))
    }
  }
  v
}
