GATK: parallelizing HaplotypeCaller with Queue

Many users have reported issues running HaplotypeCaller with the -nct argument, so we recommend using Queue to parallelize HaplotypeCaller instead of multithreading.

https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php

To make it run faster, you can run it on a machine with a large number of cores or on a Sun Grid Engine cluster. You can use the GATK Queue library together with a small Scala script to start HaplotypeCaller on multiple cores, locally or on a cluster.

I am using Queue. I think it is still useful even if you only have one multi-core workstation. It can speed up IndelRealignment and HaplotypeCaller significantly.

The documentation is clearly inadequate, but it is still possible to figure it out by reading some of the scripts you can download from the internet. However, I can see why many scientists who do not come from a programming background might not find learning a new language, Scala, worth the effort.

http://gatkforums.broadinstitute.org/gatk/discussion/5353/how-actively-used-is-queue

http://gatkforums.broadinstitute.org/gatk/discussion/5334/how-to-initiate-scatter-gather-on-one-machine

The HaplotypeCaller documentation recommends using Queue to parallelize HaplotypeCaller instead of -nct, so I've been attempting to do that. However, I can't seem to get Queue to do any kind of parallel processing. I'm currently working on a machine with 8 cores, and I can consistently get Queue to run, but it only runs single-threaded. I don't have access to a distributed computing environment, but I don't see why Queue wouldn't be able to parallelize on one machine with multiple cores, and I see no documentation indicating that threading by Queue is only available in distributed computing environments.

What I've done is minimally modify the ExampleUnifiedGenotyper.scala script so that it runs HaplotypeCaller. I ran it a couple of times with just the reference file and mapping file as input, and a couple of times with an intervals file listing each of the chromosomes as a separate interval. Every time, it ran single-threaded.

I've found several articles and comments indicating that Queue should be used to Scatter/Gather a job, and some even explain how Scatter/Gather works. I was therefore under the assumption that this is just what Queue does and that it would use multi-core systems to their full advantage. That is not my experience, however, and I don't see anything in the documentation to explain why. If someone could explain either how I'm running the command wrong or why Queue can't be used to parallelize on one machine, I would be very grateful.

Ah, sorry for the confusion. Queue is intended more for compute farms and requires a job scheduler (e.g., LSF or GridEngine) to actually dispatch jobs to different nodes.

Do you know if the issues with running HaplotypeCaller with -nct have been addressed in version 3.3-0?

No, they have not been addressed, and they probably will not be. We are moving away from multithreading in favor of scatter-gather. It is a much more stable method of parallelism and less prone to race conditions.
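For readers unfamiliar with the term, here is a minimal Python sketch of the scatter-gather idea mentioned above: split the intervals into independent groups, process each group separately, then concatenate the results in their original order. It is only an illustration of the pattern, not how Queue implements it; the function names `scatter`, `gather`, and `scatter_gather` and the `work` callback are hypothetical.

```python
from typing import Callable, List, Sequence


def scatter(intervals: Sequence[str], parts: int) -> List[List[str]]:
    """Split intervals into at most `parts` contiguous, near-equal groups."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    # Never create more groups than there are intervals (but at least one).
    parts = min(parts, len(intervals)) or 1
    size, extra = divmod(len(intervals), parts)
    groups, start = [], 0
    for i in range(parts):
        # The first `extra` groups each take one additional interval.
        end = start + size + (1 if i < extra else 0)
        groups.append(list(intervals[start:end]))
        start = end
    return groups


def gather(results: Sequence[List[str]]) -> List[str]:
    """Concatenate per-group results, preserving group order."""
    return [record for group in results for record in group]


def scatter_gather(
    intervals: Sequence[str],
    parts: int,
    work: Callable[[List[str]], List[str]],
) -> List[str]:
    # Groups share no state, so each call to `work` is independent of the
    # others. This sketch runs them one after another for simplicity.
    return gather([work(group) for group in scatter(intervals, parts)])
```

Because the groups share no state, there is nothing for concurrent workers to race on, which is the stability argument made in the reply above.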

So I am trying to run HaplotypeCaller using Queue on a single machine with 32 cores and 256 GB of RAM. I have been using a modified version of this script. I am able to run the Walker (with -sg 50) and it seems to be running the Scatter portion; however, I only see one java process running. Shouldn't I see 50? Is there some other switch I need to use to make the Walker run 50 times in parallel?

Queue's scatter-gather is designed to work on a cluster, not on a single machine. On a single machine it will run all jobs sequentially.
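If a scheduler is not available, one possible workaround outside Queue is to launch one HaplotypeCaller process per interval yourself and cap how many run at once. The Python sketch below assumes a GATK 3 style command line (-T, -R, -I, -L, -o); the jar path, file names, and the helper names `build_command` and `run_all` are placeholders rather than anything from the thread, and the per-interval VCFs would still need to be merged afterwards.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def build_command(reference: str, bam: str, interval: str, out_vcf: str) -> List[str]:
    # GATK 3 style invocation; the jar path is a placeholder assumption.
    return [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", interval,
        "-o", out_vcf,
    ]


def run_all(
    commands: Sequence[List[str]],
    max_workers: int,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[int]:
    # Threads are enough here: each one only waits on an external process.
    # At most `max_workers` commands run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns exit codes in the same order as `commands`.
        return list(pool.map(runner, commands))
```

A call such as `run_all([build_command("ref.fasta", "sample.bam", c, c + ".vcf") for c in chromosomes], max_workers=8)` would keep up to eight java processes running at a time; choose `max_workers` with each process's memory use in mind.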
