Nextflow patterns

GODamnbit

Article link: https://blog.csdn.net/qq_40202164/article/details/120887557

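Summary (generated automatically by CSDN): This article introduces several Nextflow usage patterns, including basic patterns such as channel duplication, scatter executions, result gathering, and output organization, as well as advanced patterns such as conditional resource definition and feedback loops. Example code shows how to use Nextflow effectively in different scenarios to run bioinformatics tasks as pipelines.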

1 Basic Patterns

1.1 Channel duplication

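P (problem): You need to use the same channel as input in two or more processes.

S (solution): Use the into operator to create two (or more) copies of the source channel, then use the new channels as inputs to the processes. Note that into, like the from clause in the process inputs used throughout this article, is DSL1 syntax; in DSL2 a channel can be consumed by several processes without being duplicated.

Code:

Channel
    .fromPath('prots/*_?.fa')
    .into { prot1_ch; prot2_ch }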

process foo {
   
  input: file x from prot1_ch
  script:
  """
    echo your_command --input $x
  """
}

process bar {
   
  input: file x from prot2_ch
  script:
  """
    your_command --input $x
  """
}
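
As a rough analogy outside Nextflow, the Python sketch below shows why a copy is needed when a stream can be read only once. A generator consumed by one reader is empty for the next, and itertools.tee plays a role similar to the one into plays above. The file names and the helper names (emit_files, duplicate) are hypothetical and exist only for illustration.

```python
from itertools import tee


def emit_files():
    """Stand-in for a channel: yields each (hypothetical) file name once."""
    yield from ["prots/a_1.fa", "prots/b_2.fa"]


def duplicate(source, n=2):
    """Return n independent iterators over the same items, loosely like `into`."""
    return tee(source, n)


prot1, prot2 = duplicate(emit_files())
foo_inputs = list(prot1)   # first consumer reads every item
bar_inputs = list(prot2)   # second consumer still sees every item
```

Both consumers receive the full list of items, which is the property the two processes rely on.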

2 Scatter executions

2.1 Process per file path

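P: You need to execute one task for each file that matches a glob pattern.

S: Use the Channel.fromPath method to create a channel that emits every file matching the glob pattern, then use the channel as input to the process that runs the task.

Code:

Channel.fromPath('reads/*_1.fq.gz').set { samples_ch }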

process foo {
   
  input:
  file x from samples_ch

  script:
  """
  your_command --input $x
  """
}

2.2 Process per file chunk

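P: You need to split one or more input files into chunks and execute one task for each chunk.

S: Use the splitText operator to split a file into chunks of a given size (five lines per chunk in the example below), then use the resulting channel as input to the process that runs the task.

Code:

Channel
    .fromPath('poem.txt')
    .splitText(by: 5)
    .set { chunks_ch }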

process foo {
   
  echo true
  input:
  file x from chunks_ch

  script:
  """
  rev $x | rev
  """
}
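
To make the chunking concrete, here is a minimal Python sketch of what splitting by 5 means: the text is cut into consecutive groups of five lines, and the last group may be shorter. The function split_text and the 12-line sample are hypothetical stand-ins, not Nextflow code.

```python
def split_text(text, by=1):
    """Split text into chunks of `by` lines; the last chunk may be shorter."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + by]) for i in range(0, len(lines), by)]


# Hypothetical 12-line input standing in for poem.txt
poem = "".join(f"line {n}\n" for n in range(1, 13))
chunks = split_text(poem, by=5)   # chunks of 5, 5 and 2 lines
```

Each element of chunks corresponds to one task in the process above.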

2.3 Process per file pairs

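P: You need to process the files in a directory, grouping them in pairs.

S: Use the Channel.fromFilePairs method to create a channel that emits the file pairs matching a glob pattern. The pattern must match a common prefix in the paired file names. Matching files are emitted as tuples in which the first element is the grouping key of the matching files and the second element is the file pair itself.

Code:

Channel
    .fromFilePairs('reads/*_{1,2}.fq.gz')
    .set { samples_ch }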

process foo {
   
  input:
  set sampleId, file(reads) from samples_ch

  script:
  """
  your_command --sample $sampleId --reads $reads
  """
}
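
The shape of the emitted tuples can be sketched in plain Python. This is only an approximation of the grouping described above: it assumes the key is the part of the name before the _1 or _2 suffix and that incomplete pairs are dropped. The function group_pairs and the file names are hypothetical.

```python
import re
from collections import defaultdict


def group_pairs(names, pattern=r"(.+)_[12]\.fq\.gz$"):
    """Group file names by the prefix captured before the _1/_2 suffix."""
    groups = defaultdict(list)
    for name in sorted(names):
        match = re.match(pattern, name)
        if match:                      # names that do not match are ignored
            groups[match.group(1)].append(name)
    # keep only complete pairs, emitted as (key, [file1, file2]) tuples
    return [(key, files) for key, files in groups.items() if len(files) == 2]


names = ["s1_1.fq.gz", "s1_2.fq.gz", "s2_2.fq.gz", "s2_1.fq.gz", "notes.txt"]
pairs = group_pairs(names)
```

Each tuple has the same layout as the sampleId and reads values bound in the process input.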

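Custom grouping strategy

When needed, you can define a custom grouping strategy. A common use case is an alignment BAM file (sample1.bam) that comes with an accompanying index file. The difficulty is that the index is sometimes named sample1.bai and sometimes sample1.bam.bai, depending on the software used. The following example handles both cases. The dots in the pattern are escaped and the whole pattern is anchored to the end of the name, so only a trailing extension is stripped.

Code:

Channel
    .fromFilePairs('alignment/*.{bam,bai}') { file -> file.name.replaceAll(/(\.bam\.bai|\.bam|\.bai)$/, '') }
    .set { samples_ch }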

process foo {
   
  input:
  set sampleId, file(bam) from samples_ch

  script:
  """
  your_command --sample $sampleId --bam ${sampleId}.bam
  """
}
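
The grouping closure is essentially a function from a file name to a key. The Python sketch below, offered as an illustration rather than as Nextflow code, compares an end-anchored pattern with escaped dots against a pattern whose dots are left unescaped. The names group_key, ANCHORED, LOOSE and the file xbam_s1.bam are hypothetical.

```python
import re

ANCHORED = re.compile(r"(\.bam\.bai|\.bam|\.bai)$")   # dots escaped, end-anchored
LOOSE = re.compile(r".bam|.bai$")                     # '.' matches any character


def group_key(name, pattern=ANCHORED):
    """Strip the BAM/index extension so both files share one key."""
    return pattern.sub("", name)


# All three naming conventions collapse to the same key
keys = {group_key(n) for n in ["sample1.bam", "sample1.bai", "sample1.bam.bai"]}

# With unescaped dots, the leading 'xbam' is removed as well
loose_key = group_key("xbam_s1.bam", LOOSE)
strict_key = group_key("xbam_s1.bam")
```

The loose pattern still works for names like sample1.bam, but it can shorten a key unexpectedly when "bam" appears elsewhere in the file name.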

2.4 Process per file range

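P: You need to execute a task over two or more series of files that share a common index range.

S: Use the from method to define the range over which the task is repeated, then chain it with the map operator to associate each index with the corresponding input files. Finally, use the resulting channel as input to the process.

Code:

Channel
  .from(1..23)
  .map { chr -> tuple("sample$chr", file("/some/path/foo.${chr}.indels.vcf"), file("/other/path/foo.snvs.${chr}.vcf")) }
  .set { pairs_ch }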


process foo {
   
  tag "$sampleId"

  input:
  set sampleId, file(indels), file(snps) from pairs_ch

  """
  echo foo_command --this $indels --that $snps
  """
}
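
The mapping from an index to a tuple of inputs can be written out in Python to show what the channel carries. This is a sketch only: build_inputs is a hypothetical helper, and the paths are the placeholder paths from the example above.

```python
from pathlib import Path


def build_inputs(first=1, last=23):
    """Return one (sample_id, indels_path, snvs_path) tuple per index."""
    return [
        (
            f"sample{chrom}",
            Path(f"/some/path/foo.{chrom}.indels.vcf"),
            Path(f"/other/path/foo.snvs.{chrom}.vcf"),
        )
        for chrom in range(first, last + 1)   # inclusive, like 1..23
    ]


inputs = build_inputs()
```

Every tuple pairs the two files that belong to the same index, so one task receives both.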

2.5 Process per CSV record

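P: You need to execute one task for each record in one or more CSV files.

S: Use the splitCsv operator to read the CSV file line by line, then use the map operator to return a tuple with the required fields for each line, using the file function to convert any string path into a file path object. Finally, use the resulting channel as input to the process.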

index.csv

sampleId,read1,read2
FC816RLABXX,read/110101_I315_FC816RLABXX_L1_HUMrutRGXDIAAPE_1.fq.gz,read/110101_I315_FC816RLABXX_L1_HUMrutRGXDIAAPE_2.fq.gz
FC812MWABXX,read/110105_I186_FC812MWABXX_L8_HUMrutRGVDIABPE_1.fq.gz,read/110105_I186_FC812MWABXX_L8_HUMrutRGVDIABPE_2.fq.gz
FC81DE8ABXX,read/110121_I288_FC81DE8ABXX_L3_HUMrutRGXDIAAPE_1.fq.gz,read/110121_I288_FC81DE8ABXX_L3_HUMrutRGXDIAAPE_2.fq.gz
FC81DB5ABXX,read/110122_I329_FC81DB5ABXX_L6_HUMrutRGVDIAAPE_1.fq.gz,read/110122_I329_FC81DB5ABXX_L6_HUMrutRGVDIAAPE_2.fq.gz
FC819P0ABXX,read/110128_I481_FC819P0ABXX_L5_HUMrutRGWDIAAPE_1.fq.gz,read/110128_I481_FC819P0ABXX_L5_HUMrutRGWDIAAPE_2.fq.gz
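
The record-to-tuple step described in the solution can be illustrated in Python. This is a hedged sketch, not Nextflow code: it assumes a comma-separated file with the header names sampleId, read1 and read2, and read_records is a hypothetical helper. Two records from index.csv are embedded so the example is self-contained.

```python
import csv
import io
from pathlib import Path

# Two records taken from index.csv, comma-separated for this sketch.
INDEX_CSV = """sampleId,read1,read2
FC816RLABXX,read/110101_I315_FC816RLABXX_L1_HUMrutRGXDIAAPE_1.fq.gz,read/110101_I315_FC816RLABXX_L1_HUMrutRGXDIAAPE_2.fq.gz
FC81DE8ABXX,read/110121_I288_FC81DE8ABXX_L3_HUMrutRGXDIAAPE_1.fq.gz,read/110121_I288_FC81DE8ABXX_L3_HUMrutRGXDIAAPE_2.fq.gz
"""


def read_records(text):
    """Parse CSV text with a header row into (sample_id, read1, read2) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        # string paths are converted to path objects, one tuple per record
        (row["sampleId"], Path(row["read1"]), Path(row["read2"]))
        for row in reader
    ]


records = read_records(INDEX_CSV)
```

Each tuple would correspond to one task, with the sample identifier and the two read files as its inputs.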

Code:

params.index = 'index.csv'

Channel
    .fromPath
