Overview
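A good research paper must be reproducible; results on their own mean little. You need to tell others exactly which steps to follow to arrive at your analysis results, so that:
- other researchers can check whether your results and methods are rigorous and scientifically sound
- other researchers can build on your work and extend particular steps of the analysis
- other researchers can follow the overall logic of your analysis and understand the content better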
The following is a summary of notes from the Coursera course Reproducible Research.
What a reproducible paper contains
- Title/Author list
- Abstract
- Body/Results
- Supplementary Materials/the gory details
- Code/Data/really gory details
What you need to do to ensure reproducibility
- Are we doing good science? ##Good data, a good team, focus, and interest
- Was any part of this analysis done by hand? ##Do not process the data by hand
- If so, are those parts precisely documented?
- Does the documentation match reality?
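- Have we taught a computer to do as much as possible? ##Put every data-processing step into code that the computer runs
- Don't point and click ##Avoid GUI (graphical user interface) tools
- Are we using a version control system? ##Use version control such as Git (hosted on GitHub, for example) to track how the analysis is refined over time
- Have we documented our software environment? ##Record your software environment (e.g., R's sessionInfo())
- Have we saved any output that we cannot reconstruct from the original data + code? ##Never keep results on their own; every saved output should be reproducible from the raw data and code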
- How far back in the analysis pipeline can we go before our results are no longer (automatically) reproducible?
##Document how the whole process, from raw data to the final report, is carried out
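In R, the environment is usually recorded with sessionInfo(). As a rough illustration of the same idea in Python (a swapped-in language; the function name `session_info` and the package list are hypothetical), the sketch below collects the interpreter version, the platform, and the versions of selected packages so they can be saved alongside the results.

```python
import platform
from importlib import metadata


def session_info(packages):
    """Describe the interpreter, OS, and versions of the given packages.

    A rough Python analogue of R's sessionInfo(); which packages to list
    is the caller's choice (an assumption of this sketch).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace it with what the analysis imports.
    info = session_info(["numpy", "pandas"])
    for key, value in info.items():
        print(f"{key}: {value}")
```

Writing this output into the report (or a file committed with the code) lets a reader see which versions produced the results.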
Limitations of reproducibility
- Reproducible research is important, but it does not necessarily answer the critical question of whether a data analysis is trustworthy
- Reproducible research focuses on the most “downstream” aspect of research dissemination
- Evidence-based data analysis would provide standardized best practices for given scientific areas and questions
- It would give reviewers an important tool without dramatically increasing the burden on them
- More effort should be put into improving the quality of “upstream” aspects of scientific research
An example of a good reproducible paper
http://www.rpubs.com/rdpeng/13396
Detailed criteria for checking whether an analysis meets the reproducibility standard (the course project's grading rubric)
- Has either (1) a valid RPubs URL pointing to a data analysis document for this assignment been submitted, or (2) a complete PDF file presenting the data analysis been uploaded?
- Is the document written in English?
- Does the analysis include description and justification for any data transformations?
- Does the document have a title that briefly summarizes the data analysis?
- Does the document have a synopsis that describes and summarizes the data analysis in fewer than 10 sentences?
- Is there a section titled “Data Processing” that describes how the data were loaded into R and processed for analysis?
- Is there a section titled “Results” where the main results are presented?
- Is there at least one figure in the document that contains a plot?
- Are there at most 3 figures in this document?
- Does the analysis start from the raw data file (i.e. the original .csv.bz2 file)?
- Does the analysis address the question of which types of events are most harmful to population health?
- Does the analysis address the question of which types of events have the greatest economic consequences?
- Do all the results of the analysis (i.e. figures, tables, numerical summaries) appear to be reproducible?
- Do the figure(s) have descriptive captions (i.e. there is a description near the figure of what is happening in the figure)?
- As far as you can determine, does it appear that the work submitted for this project is the work of the student who submitted it?
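Several rubric items ask that the analysis start from the raw .csv.bz2 file and that a "Data Processing" section show how the data were loaded. The course itself works in R; the following is only a minimal Python sketch of the idea: read the compressed file directly with the standard-library `bz2` and `csv` modules, with no hand-unzipped or hand-edited copy, and rank event types by harm. The function `harm_by_event` and the column names `EVTYPE`, `FATALITIES`, and `INJURIES` are assumptions for illustration; check them against the real file.

```python
import bz2
import csv
import os
import tempfile
from collections import defaultdict


def harm_by_event(path, event_col="EVTYPE", harm_cols=("FATALITIES", "INJURIES")):
    """Sum the harm columns per event type, reading the .csv.bz2 file directly.

    Column names are assumptions of this sketch; adjust them to the real data.
    """
    totals = defaultdict(float)
    # bz2.open in text mode decompresses on the fly, so the raw archive
    # remains the single source of truth.
    with bz2.open(path, "rt", newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # Transformation to document: event labels are case-normalized.
            event = row[event_col].strip().upper()
            # Empty cells count as 0; fatalities and injuries are weighted equally.
            totals[event] += sum(float(row[c] or 0) for c in harm_cols)
    # Sort from most to least harmful.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Build a tiny hypothetical raw file so the example runs on its own.
    sample = "EVTYPE,FATALITIES,INJURIES\nTORNADO,5,10\nflood,1,2\nTornado,0,3\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv.bz2")
        with bz2.open(path, "wt", encoding="utf-8") as fh:
            fh.write(sample)
        for event, total in harm_by_event(path):
            print(event, total)
```

On the toy data this prints `TORNADO 18.0` followed by `FLOOD 3.0`. Both the case normalization and the equal weighting of fatalities and injuries are data transformations, so under the rubric they would need to be described and justified in the write-up.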