pythonnet plink,将文本文件转换为Plink PED和MAP格式

I have the following data (small part of it) named "short2_pre_snp_tumor.txt"

rs987435 C G 1 1 1 0 2

rs345783 C G 0 0 1 0 0

rs955894 G T 1 1 2 2 1

rs6088791 A G 1 2 0 0 1

rs11180435 C T 1 0 1 1 1

rs17571465 A T 1 2 2 2 2

rs17011450 C T 2 2 2 2 2

rs6919430 A C 2 1 2 2 2

rs2342723 C T 0 2 0 0 0

rs11992567 C T 2 2 2 2 2

and I need to get the PED and MAP file using Python, as R is really slow in case of large dataset.

I have the following code in R:

tm

d

n

nrs

dd

for (j in 1:nrs) {

for (i in 1:n) {

if (d[i, j+3]==0) {

dd[j, 2*i-1]

dd[j, 2*i]

} else if (d[i, j+3]==1) {

dd[j, 2*i-1]

dd[j, 2*i]

} else if (d[i, j+3]==2) {

dd[j, 2*i-1]

dd[j, 2*i]

}

}

}

ped6front

BRCA_tumorfromR.ped

write.table(BRCA_tumorfromR.ped, “BRCA_tumor.ped”, append=FALSE, quote=FALSE, col.names=FALSE)

proc.time() #ptm

解决方案

Here is using R:

# raw data

myRaw

rs987435 C G 1 1 1 0 2

rs345783 C G 0 0 1 0 0

rs955894 G T 1 1 2 2 1

rs6088791 A G 1 2 0 0 1

rs11180435 C T 1 0 1 1 1

rs17571465 A T 1 2 2 2 2

rs17011450 C T 2 2 2 2 2

rs6919430 A C 2 1 2 2 2

rs2342723 C T 0 2 0 0 0

rs11992567 C T 2 2 2 2 2")

nIndividuals

nSNPs

# make map, easy

MAP

CHR = 1,

SNP = myRaw$V1,

CM = 0,

BP = seq(nSNPs))

# get first 6 columns of PED, easy

PED6

FID = seq(nIndividuals),

IID = seq(nIndividuals),

FatherID = 0,

MotherID = 0,

Sex = 1,

Phenotype = 1)

# convert 0,1,2 to genotypes, a bit tricky

# make helper dataframe for matching alleles

myAlleles

AA = paste(myRaw$V2, myRaw$V2),

AB = paste(myRaw$V2, myRaw$V3),

BB = paste(myRaw$V3, myRaw$V3))

# make index to match with alleles

PEDsnps

# convert

PEDsnpsAB

sapply(seq(nSNPs), function(snp)

sapply(PEDsnps[snp, ], function(ind) myAlleles[snp, ind]))

# column bind first 6 cols with genotypes

PED

#output PED and MAP

write.table(PED, "gwas.ped", quote = FALSE, col.names = FALSE, row.names = FALSE, sep = "\t")

write.table(MAP, "gwas.map", quote = FALSE, col.names = FALSE, row.names = FALSE, sep = "\t")

# test plink

# plink --file gwas

# PLINK v1.90b3c 64-bit (2 Feb 2015) https://www.cog-genomics.org/plink2

# (C) 2005-2015 Shaun Purcell, Christopher Chang GNU General Public License v3

# Logging to plink.log.

# 258273 MB RAM detected; reserving 129136 MB for main workspace.

# .ped scan complete (for binary autoconversion).

# Performing single-pass .bed write (10 variants, 5 people).

# --file: plink.bed + plink.bim + plink.fam written.

  • 0
    点赞
  • 0
    评论
  • 0
    收藏
  • 一键三连
    一键三连
  • 扫一扫,分享海报

表情包
插入表情
评论将由博主筛选后显示,对所有人可见 | 还能输入1000个字符
©️2021 CSDN 皮肤主题: 深蓝海洋 设计师:CSDN官方博客 返回首页
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、C币套餐、付费专栏及课程。

余额充值