For this first programming assignment you will write three functions that are meant to interact with the dataset that accompanies this assignment. The dataset is contained in a zip file, specdata.zip, that you can download from the Coursera web site.
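In this post I analyze and write the three data-processing functions that the assignment asks for:
Part 1: complete the pollutantmean() function, which computes the mean of the specified data
Data source: click here to download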
The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor and the ID number for each monitor is contained in the file name. For example, data for monitor 200 is contained in the file “200.csv”. Each file contains three variables:
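The archive holds 332 separate CSV files with sulfate and nitrate pollution readings from 332 locations in the United States, named 001.csv through 332.csv. Each file contains the following three fields: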
Date: the date of the observation in YYYY-MM-DD format (year-month-day)
sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter)
nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter)
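Date: when the reading was taken
sulfate: the sulfate pollutant level
nitrate: the nitrate pollutant level
The data format looks roughly like this, and it is fairly simple: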
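As a minimal sketch, the layout of one monitor file can be inspected in R along these lines. Since the real specdata files may not be at hand, the snippet first writes a tiny synthetic file with the same three columns; the folder name `specdata_demo` and every value in it are made up for illustration and are not real readings.

```r
# Synthetic stand-in for one monitor file (values are invented, not real data).
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "001.csv")

sample_rows <- data.frame(
  Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
  sulfate = c(NA, 3.5, 4.1),
  nitrate = c(0.8, NA, 1.2)
)
write.csv(sample_rows, demo_file, row.names = FALSE)

# Read the file back the same way a real monitor file would be read.
monitor <- read.csv(demo_file, stringsAsFactors = FALSE)
str(monitor)              # column names and types
colSums(is.na(monitor))   # number of missing values per column
```

With a real download, pointing `demo_file` at something like `file.path("specdata", "001.csv")` should give the same kind of summary, with many more rows and many more NA entries.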
Write a function named ‘pollutantmean’ that calculates the mean of a pollutant (sulfate or nitrate) across a specified list of monitors. The function ‘pollutantmean’ takes three arguments: ‘directory’, ‘pollutant’, and ‘id’. Given a vector of monitor ID numbers, ‘pollutantmean’ reads those monitors’ particulate matter data from the directory specified in the ‘directory’ argument and returns the mean of the pollutant across all of the monitors, ignoring any missing values coded as NA. A prototype of the function is as follows:
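Write a function called pollutantmean that computes the mean of a given pollutant over a given set of files.
The function takes three arguments:
- directory: the path to the folder that holds the data files
- pollutant: the name of the pollutant (sulfate or nitrate)
- id: a vector of the monitor IDs whose files should be read
The first line of the function is as follows: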
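One way the three arguments could fit together is sketched below. This is an illustration under stated assumptions, not the reference solution: the default `id = 1:332` and the zero-padded file names built with `sprintf("%03d.csv", id)` are my assumptions, and the demo folder and its values are synthetic.

```r
# Sketch of pollutantmean(); the default id = 1:332 is an assumption.
pollutantmean <- function(directory, pollutant = "sulfate", id = 1:332) {
  # Build zero-padded file names such as "001.csv" (assumed naming scheme).
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read each file and keep only the requested pollutant column.
  values <- unlist(lapply(files, function(f) read.csv(f)[[pollutant]]))
  # Pool the observations from all monitors and average them, ignoring NA.
  mean(values, na.rm = TRUE)
}

# Two tiny synthetic monitor files (invented values) to exercise the function.
demo_dir <- file.path(tempdir(), "pollutantmean_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
  data.frame(Date = c("2003-01-01", "2003-01-02"),
             sulfate = c(1, NA), nitrate = c(0.5, 1.5)),
  file.path(demo_dir, "001.csv"), row.names = FALSE
)
write.csv(
  data.frame(Date = c("2003-01-01", "2003-01-02"),
             sulfate = c(2, 3), nitrate = c(NA, 2.5)),
  file.path(demo_dir, "002.csv"), row.names = FALSE
)

pollutantmean(demo_dir, "sulfate", 1:2)  # mean of 1, 2, 3 is 2
pollutantmean(demo_dir, "nitrate", 1:2)  # mean of 0.5, 1.5, 2.5 is 1.5
```

The sketch pools every observation before averaging, which is how I read "the mean of the pollutant across all of the monitors". Averaging the per-monitor means instead would generally give a different number when monitors have different numbers of valid readings.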
pollutantmean <- function(directory, pollutant = "sulfate", id =