1. Tutorial: Quick Introduction to GNU Parallel
https://github.com/LangilleLab/microbiome_helper/wiki/Quick-Introduction-to-GNU-Parallel
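GNU Parallel is a useful tool for running repetitive commands.
This is especially handy in bioinformatics, where we often want to run exactly the same command on a large number of files.
GNU Parallel has extensive documentation and gives users fine-grained control over how jobs are run.
Below I demonstrate how to run a basic command as several jobs at the same time.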
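For each line of input, GNU Parallel runs the specified command with that line as its argument. If no command is given, the line itself is executed as a command. Multiple input lines are run in parallel. GNU Parallel is often used as a replacement for xargs or cat | bash.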
2. Resources
Article:
GNU Parallel: The Command-Line Power Tool
https://www.usenix.org/system/files/login/articles/105438-Tange.pdf
Introduction on Biostars:
https://www.biostars.org/p/63816/
Official tutorial (English):
https://www.gnu.org/software/parallel/parallel_tutorial.html
Chinese translation of the tutorial (recommended):
https://my.oschina.net/enyo/blog/271612
3. Preparation
1. Download the virtual machine
Go to the project page:
https://github.com/LangilleLab/microbiome_helper/wiki/Microbiome-Helper-Virtual-Box
Download the virtual machine image Microbiome Helper Vbox Amplicon-only (v0.3):
http://kronos.pharmacology.dal.ca/public_files/Microbiome_Helper_Vbox/MicrobiomeHelper_amplicon_v0.3.ova
After downloading, import the image into VirtualBox and install the Guest Additions.
The default username and password are both “mh_user”.
2. Download the example data
wget "https://www.dropbox.com/s/v7twasg0fro45x0/parallel_example.tar.gz?dl=1" -O parallel_example.tar.gz
4. Command breakdown
parallel --eta -j 2 --load 80% --noswap 'blastp -db pdb_blast_db_example/pdb_seqres.txt -query {} -out blastp_outfiles/{.}.out -evalue 0.0001 -word_size 7 -outfmt "6 std stitle staxids sscinames" -max_target_seqs 10 -num_threads 1' ::: test_seq*.fas
Options:
"--eta": Shows the estimated time remaining to run all jobs.
"-j 2" (or "--jobs 2"): The number of commands to run at the same time, which in this case was set to 2.
"--load 80%": New jobs are only started while the machine's load (the number of running processes) is below this limit. It uses the same syntax as -j, so 80% is relative to the number of CPUs: on an 8-CPU machine, new jobs wait while the load is 6.4 or higher. This matters most when you run a large number of long-running commands.
"--noswap": New jobs are not started while the machine is both swapping in and swapping out. In other words, new jobs wait when memory is so full that data has to be moved out to disk before new data can be loaded.
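Open question, to be filled in later: I have not yet worked out how to control the number of commands running in parallel; setting -j did not seem to limit it.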