Study notes: fitting Eurogenes G25 ancestry coordinates with R

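Eurogenes Global 25 (G25 for short) is an ancestry-analysis method based on SmartPCA scores, and its coordinates come in Scaled and Unscaled versions. It differs from ordinary ancestry calculators, which report each ancestry component as a percentage: G25 describes a sample by its coordinates on 25 principal components instead. Details can be found on the relevant overseas websites; this post only collects my notes on fitting G25 coordinates with R.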


I. Preparation

1. Download and install R from the official R website; this step is not covered here;

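2. Download the nMonte program (link below; it also comes from the Eurogenes blog post linked at the end of this article) and put it in the folder you will run it from;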

R source code for Eurogenes Global25 nMonte (CSDN download)

3. Copy an existing benchmark sample from Vahaduo (or, if you have them, your own G25 coordinates) into target.txt, then start R;

This example uses the Scaled coordinates of the HGDP northern Han sample:
Han_NChina,0.0259514,-0.4360684,0.0085228,-0.0622744,0.0483782,0.0186856,0.003008,9.24e-05,-0.0123532,0.0014212,-0.072393,-0.0089922,0.0107628,-0.006138,-0.0076544,-0.0014056,0.0019036,-0.0012162,-0.004676,-0.0091292,0.0100574,0.0093728,0.0101556,0.001229,0.0082624
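Before saving the row to target.txt, it can be parsed in R as a quick sanity check. This is only a minimal sketch: parse_g25_row is a hypothetical helper written for these notes, and the only format it assumes is the one visible in the sample row above (a label followed by 25 comma-separated numbers).

```r
# Parse one G25 row of the form "label,v1,...,v25" (format taken from the sample row).
# parse_g25_row is a hypothetical helper, not part of nMonte.
parse_g25_row <- function(line) {
  fields <- strsplit(line, ",", fixed = TRUE)[[1]]
  coords <- suppressWarnings(as.numeric(fields[-1]))
  if (length(coords) != 25 || anyNA(coords)) {
    stop("expected a label followed by 25 numeric values")
  }
  list(label = fields[1], coords = coords)
}

sample_line <- paste0(
  "Han_NChina,0.0259514,-0.4360684,0.0085228,-0.0622744,0.0483782,0.0186856,",
  "0.003008,9.24e-05,-0.0123532,0.0014212,-0.072393,-0.0089922,0.0107628,",
  "-0.006138,-0.0076544,-0.0014056,0.0019036,-0.0012162,-0.004676,-0.0091292,",
  "0.0100574,0.0093728,0.0101556,0.001229,0.0082624"
)

row <- parse_g25_row(sample_line)
print(row$label)
print(length(row$coords))
```

A row with a missing or non-numeric value stops with an error instead of being passed on to the fit.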

https://vahaduo.github.io/
https://www.exploreyourdna.com/

4. Prepare the reference data file "data.txt" and the file to be analysed, "target.txt". To change the data being analysed, fill in "target.txt" using the format below:

data.txt


II. Open the installed R and enter the commands

1. Set R's working directory

setwd('E:\\some_folder\\G25_nMonte')
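After changing the working directory, it may help to confirm that the three files used in these notes are actually there. The sketch below uses only base R; missing_inputs is a hypothetical helper, and the file names are the ones used in this post.

```r
# Report which of the expected files are missing from a folder.
# missing_inputs is a hypothetical helper; the default file names are those used in this post.
missing_inputs <- function(dir = getwd(),
                           files = c("nMonte.R", "data.txt", "target.txt")) {
  files[!file.exists(file.path(dir, files))]
}

# Run after setwd(); an empty result (character(0)) means all three files were found.
print(missing_inputs())
```

If a name is printed, source() or getMonte() would fail on that file, so it is worth fixing the path first.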

2. Load the Monte Carlo (nMonte) fitting script nMonte.R (smallest fitting step: 0.1 (%))

source('nMonte.R')

3. Run it (csv files are supported as well as txt)

getMonte('data.txt','target.txt')
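The console output is fairly long, so it can be convenient to save it to a file. Base R's capture.output() does this for anything a function prints. The sketch below uses a stand-in function, fake_fit (hypothetical), because getMonte only exists after source('nMonte.R'); it also assumes that getMonte writes its results with print or cat, which the "[1]" prefixes in the output suggest but which I have not verified in the source.

```r
# Stand-in for a function that prints its results (hypothetical; not part of nMonte).
fake_fit <- function() {
  print("1. CLOSEST SINGLE ITEM DISTANCE%")
  print(c(Han = 5.02828, Jarawa = 32.15232))
  invisible(NULL)
}

out_file <- file.path(tempdir(), "nmonte_result.txt")

# Everything printed inside the call goes to the file instead of the console.
capture.output(fake_fit(), file = out_file)

saved <- readLines(out_file)
print(saved)

# With the real script loaded, the same pattern would be (untested assumption):
# capture.output(getMonte('data.txt', 'target.txt'), file = 'result.txt')
```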

Here is what the result looks like:

Part of the output as shown in the R console

[1] "1. CLOSEST SINGLE ITEM DISTANCE%"
                   Han                 Jarawa              Nganassan IRN_Shahr_I_Sokhta_BA3 
               5.02828               32.15232               34.89572               47.76597 
BRA_LapaDoSanto_9600BP       IRN_Ganj_Dareh_N     Yamnaya_RUS_Samara   Kura-Araxes_ARM_Kaps 
              53.75959               57.97647               58.46795               59.84457 

[1] "2. FULL TABLE nMONTE"
[1] "Ncycles= 1000"
                    PC1           PC2          PC3         PC4        PC5         PC6           PC7          PC8
Han_NChina  0.025951400 -0.4360684000  0.008522800 -0.06227440 0.04837820 0.018685600  0.0030080000  0.000092400
fitted      0.024014654 -0.4365305155  0.002195135 -0.06011238 0.06103118 0.028301905  0.0020773149 -0.004633961
dif        -0.001936745 -0.0004621155 -0.006327665  0.00216202 0.01265298 0.009616305 -0.0009306851 -0.004726361
                     PC9         PC10        PC11         PC12         PC13         PC14         PC15
Han_NChina -0.0123532000  0.001421200 -0.07239300 -0.008992200  0.010762800 -0.006138000 -0.007654400
fitted     -0.0120070216 -0.001618685 -0.03906647 -0.005767134  0.006190188 -0.008755975 -0.005734937
dif         0.0003461784 -0.003039885  0.03332653  0.003225066 -0.004572612 -0.002617975  0.001919463
                    PC16          PC17         PC18          PC19          PC20       PC21         PC22
Han_NChina -0.0014056000  0.0019036000 -1.21620e-03 -0.0046760000 -0.0091292000 0.01005740  0.009372800
fitted      0.0002819158  0.0016970669 -9.83960e-06 -0.0001843051 -0.0093961991 0.01382878  0.005768916
dif         0.0016875158 -0.0002065331  1.20636e-03  0.0044916949 -0.0002669991 0.00377138 -0.003603884
                  PC23         PC24         PC25
Han_NChina 0.010155600  0.001229000  0.008262400
fitted     0.014797903 -0.001284875 -0.001287416
dif        0.004642303 -0.002513875 -0.009549816
[1] "distance%=4.0709"

         Han_NChina

Han,91.5
Nganassan,6
Anatolia_Tepecik_Ciftlik_N,1.2
BRA_LapaDoSanto_9600BP,0.8
Kura-Araxes_ARM_Kaps,0.5
Anatolia_Barcin_N,0
Dinka,0
ETH_4500BP,0
Gambian,0
IRN_Ganj_Dareh_N,0
IRN_Shahr_I_Sokhta_BA3,0
Jarawa,0
Levant_PPNB,0
MAR_Iberomaurusian,0
WHG,0
Yamnaya_RUS_Samara,0
Yoruba,0

[1] "CORRELATION OF ADMIXTURE POPULATIONS"
                           Anatolia_Tepecik_Ciftlik_N BRA_LapaDoSanto_9600BP   Han Kura-Araxes_ARM_Kaps Nganassan
Anatolia_Tepecik_Ciftlik_N                       1.00                  -0.40 -0.59                 0.73     -0.54
BRA_LapaDoSanto_9600BP                          -0.40                   1.00  0.43                -0.36      0.50
Han                                             -0.59                   0.43  1.00                -0.62      0.70
Kura-Araxes_ARM_Kaps                             0.73                  -0.36 -0.62                 1.00     -0.53
Nganassan                                       -0.54                   0.50  0.70                -0.53      1.00

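The output has four parts. The first ranks the reference populations by their distance to the target (closest first). The second compares the original G25 coordinates with the coordinates fitted by nMonte. The third is the nMonte fit itself, as percentages per population. The fourth shows the correlations among the admixing populations.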
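The relation between the "dif" row and the "distance%" line can be checked by hand. In the printed table, dif equals fitted minus the target (for PC1, 0.024014654 - 0.0259514 = -0.001936746), and the distance appears to be the Euclidean length of that row, multiplied by 100 for the percent form. That reading is my inference from the numbers, not something taken from the nMonte source. The sketch below recomputes it from the dif values printed in the nMonte.R output above.

```r
# "dif" row copied from the nMonte.R output above (PC1..PC25).
dif <- c(-0.001936745, -0.0004621155, -0.006327665, 0.00216202, 0.01265298,
         0.009616305, -0.0009306851, -0.004726361, 0.0003461784, -0.003039885,
         0.03332653, 0.003225066, -0.004572612, -0.002617975, 0.001919463,
         0.0016875158, -0.0002065331, 1.20636e-03, 0.0044916949, -0.0002669991,
         0.00377138, -0.003603884, 0.004642303, -0.002513875, -0.009549816)

distance <- sqrt(sum(dif^2))     # Euclidean length of the residual vector
distance_pct <- 100 * distance   # same value expressed as a percentage

print(round(distance_pct, 2))
```

The result comes out at about 4.07, in line with the printed "distance%=4.0709". Most of it comes from PC11, where the fit misses by roughly 0.033.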

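Unfamiliar English terms in the output can be looked up with a translation tool. Two that are harder to find: WHG stands for Western Hunter-Gatherers (of Europe), and ETH_4500BP is an Ethiopian individual dated to about 4,500 years before present.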

Tip: in the source line, "nMonte.R" can be replaced with "nMonte2.R" (smallest fitting step: 0.05 (%)) or "nMonte3.R" (smallest fitting step: 0.2 (%)).
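One way to picture where a "smallest fitting step" comes from: if a Monte Carlo fit keeps a fixed-size batch of reference rows and reports each population's share of that batch, then a batch of N rows can only produce percentages in steps of 100/N. Under that assumption, steps of 0.1, 0.05 and 0.2 (%) would correspond to batches of 1000, 2000 and 500 rows. The sketch below is a toy version of such a fit on made-up 2-D data. It is not the nMonte source; toy_fit and all its parameters are hypothetical, and whether the three scripts work exactly this way is an assumption.

```r
# Toy batch-based Monte Carlo fit (illustration only; not the nMonte.R source).
toy_fit <- function(ref, target, batch = 100, cycles = 50, seed = 1) {
  set.seed(seed)
  n <- nrow(ref)
  idx <- sample.int(n, batch, replace = TRUE)    # batch of reference row indices
  total <- colSums(ref[idx, , drop = FALSE])     # coordinate sum over the batch
  dist_of <- function(s) sqrt(sum((s / batch - target)^2))
  best <- dist_of(total)
  for (cycle in seq_len(cycles)) {
    for (slot in seq_len(batch)) {
      cand <- sample.int(n, 1)                   # random replacement for this slot
      new_total <- total - ref[idx[slot], ] + ref[cand, ]
      d <- dist_of(new_total)
      if (d < best) {                            # keep only swaps that reduce the distance
        idx[slot] <- cand
        total <- new_total
        best <- d
      }
    }
  }
  percent <- 100 * tabulate(idx, nbins = n) / batch
  names(percent) <- rownames(ref)
  list(percent = percent, distance = best)
}

# Made-up reference points and a target that is 60% A, 30% B, 10% C.
ref <- rbind(A = c(0, 0), B = c(1, 0), C = c(0, 1))
fit <- toy_fit(ref, target = c(0.3, 0.1), batch = 100)
print(fit$percent)    # each value is a multiple of 100 / batch, here 1 (%)
print(fit$distance)
```

A larger batch gives finer percentages at the cost of more swaps per cycle, which fits the small differences between the three result tables in this post.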

Output when calling "nMonte2.R"

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
                   Han                 Jarawa              Nganassan IRN_Shahr_I_Sokhta_BA3 
             0.0502828              0.3215232              0.3489572              0.4776597 
BRA_LapaDoSanto_9600BP       IRN_Ganj_Dareh_N     Yamnaya_RUS_Samara   Kura-Araxes_ARM_Kaps 
             0.5375959              0.5797647              0.5846795              0.5984457 


[1] "2. FULL TABLE nMONTE"
[1] "Ncycles= 1000"
                    PC1           PC2          PC3          PC4        PC5        PC6           PC7          PC8
Han_NChina  0.025951400 -0.4360684000  0.008522800 -0.062274400 0.04837820 0.01868560  0.0030080000  0.000092400
fitted      0.024054958 -0.4362386433  0.002163929 -0.060101603 0.06096721 0.02827858  0.0020799162 -0.004637385
dif        -0.001896442 -0.0001702432 -0.006358871  0.002172797 0.01258901 0.00959298 -0.0009280838 -0.004729785
                    PC9         PC10        PC11         PC12         PC13         PC14         PC15
Han_NChina -0.012353200  0.001421200 -0.07239300 -0.008992200  0.010762800 -0.006138000 -0.007654400
fitted     -0.012032694 -0.001625282 -0.03904013 -0.005758098  0.006176629 -0.008748997 -0.005730668
dif         0.000320506 -0.003046482  0.03335287  0.003234102 -0.004586171 -0.002610997  0.001923732
                    PC16          PC17          PC18          PC19         PC20        PC21         PC22
Han_NChina -0.0014056000  0.0019036000 -0.0012162000 -0.0046760000 -0.009129200 0.010057400  0.009372800
fitted      0.0002761843  0.0016961245 -0.0000113712 -0.0001841309 -0.009389463 0.013824407  0.005765313
dif         0.0016817843 -0.0002074755  0.0012048288  0.0044918691 -0.000260263 0.003767007 -0.003607487
                  PC23         PC24         PC25
Han_NChina 0.010155600  0.001229000  0.008262400
fitted     0.014789001 -0.001288778 -0.001287827
dif        0.004633401 -0.002517778 -0.009550227
[1] "distance%=4.0708 / distance=0.040708"

         Han_NChina
                                
Han                        91.45
Nganassan                   6.00
Anatolia_Tepecik_Ciftlik_N  1.20
BRA_LapaDoSanto_9600BP      0.80
Kura-Araxes_ARM_Kaps        0.55
Anatolia_Barcin_N           0.00
Dinka                       0.00
ETH_4500BP                  0.00
Gambian                     0.00
IRN_Ganj_Dareh_N            0.00
IRN_Shahr_I_Sokhta_BA3      0.00
Jarawa                      0.00
Levant_PPNB                 0.00
MAR_Iberomaurusian          0.00
WHG                         0.00
Yamnaya_RUS_Samara          0.00
Yoruba                      0.00



[1] "3. RESTRICTED nMONTE"
[1] "Ncycles= 1000"
                    PC1           PC2          PC3           PC4        PC5         PC6         PC7          PC8
Han_NChina  0.025951400 -0.4360684000  0.008522800 -0.0622744000 0.04837820 0.018685600 0.003008000  0.000092400
fitted      0.024051083 -0.4363472079  0.002093122 -0.0614892795 0.06193914 0.027968204 0.004405437 -0.001726973
dif        -0.001900317 -0.0002788079 -0.006429678  0.0007851205 0.01356094 0.009282604 0.001397437 -0.001819373
                    PC9          PC10        PC11        PC12        PC13         PC14         PC15          PC16
Han_NChina -0.012353200  0.0014212000 -0.07239300 -0.00899220  0.01076280 -0.006138000 -0.007654400 -0.0014056000
fitted     -0.011308925 -0.0009725677 -0.03875748 -0.00572371  0.00622178 -0.009015966 -0.006031067  0.0001671377
dif         0.001044275 -0.0023937677  0.03363552  0.00326849 -0.00454102 -0.002877966  0.001623333  0.0015727377
                    PC17          PC18          PC19          PC20        PC21         PC22        PC23
Han_NChina  0.0019036000 -0.0012162000 -4.676000e-03 -0.0091292000 0.010057400  0.009372800 0.010155600
fitted      0.0017034104  0.0000743187 -6.038325e-05 -0.0094925643 0.014040278  0.005705196 0.014960724
dif        -0.0002001896  0.0012905187  4.615617e-03 -0.0003633643 0.003982878 -0.003667604 0.004805124
                   PC24         PC25
Han_NChina  0.001229000  0.008262400
fitted     -0.001231193 -0.001264582
dif        -0.002460193 -0.009526982
[1] "distance%=4.0941 / distance=0.040941"

         Han_NChina
                                
Han                        91.75
Nganassan                   6.40
Anatolia_Tepecik_Ciftlik_N  1.85


[1] "CORRELATION OF ADMIXTURE POPULATIONS"
                           Anatolia_Tepecik_Ciftlik_N   Han Nganassan
Anatolia_Tepecik_Ciftlik_N                       1.00 -0.59     -0.54
Han                                             -0.59  1.00      0.70
Nganassan                                       -0.54  0.70      1.00

Output when calling "nMonte3.R"

[1] "1. CLOSEST SINGLE ITEM DISTANCE%"
                   Han                 Jarawa              Nganassan IRN_Shahr_I_Sokhta_BA3 
               5.02828               32.15232               34.89572               47.76597 
BRA_LapaDoSanto_9600BP       IRN_Ganj_Dareh_N     Yamnaya_RUS_Samara   Kura-Araxes_ARM_Kaps 
              53.75959               57.97647               58.46795               59.84457 

[1] "2. FULL TABLE nMONTE"
[1] "penalty= 0.001"
[1] "Ncycles= 1000"
                    PC1          PC2           PC3          PC4        PC5        PC6           PC7         PC8
Han_NChina  0.025951400 -0.436068400  0.0085228000 -0.062274400 0.04837820 0.01868560  0.0030080000  0.00009240
fitted      0.021306033 -0.443514504 -0.0000001106 -0.060663057 0.06486444 0.03068566  0.0023251514 -0.00444825
dif        -0.004645367 -0.007446104 -0.0085229106  0.001611343 0.01648624 0.01200006 -0.0006828486 -0.00454065
                     PC9        PC10        PC11         PC12         PC13         PC14         PC15
Han_NChina -0.0123532000  0.00142120 -0.07239300 -0.008992200  0.010762800 -0.006138000 -0.007654400
fitted     -0.0122340366 -0.00231565 -0.04155548 -0.006135961  0.006555379 -0.008736972 -0.005267236
dif         0.0001191634 -0.00373685  0.03083752  0.002856239 -0.004207421 -0.002598972  0.002387164
                    PC16          PC17          PC18          PC19          PC20        PC21         PC22
Han_NChina -0.0014056000  0.0019036000 -0.0012162000 -0.0046760000 -0.0091292000 0.010057400  0.009372800
fitted      0.0005811736  0.0017638482 -0.0001879976 -0.0005459568 -0.0096387648 0.013521322  0.006020329
dif         0.0019867736 -0.0001397518  0.0010282024  0.0041300432 -0.0005095648 0.003463922 -0.003352471
                  PC23         PC24         PC25
Han_NChina 0.010155600  0.001229000  0.008262400
fitted     0.014826112 -0.001235334 -0.001448735
dif        0.004670512 -0.002464334 -0.009711135
[1] "distance%=4.1954"

         Han_NChina

Han,93.8
Nganassan,4.8
BRA_LapaDoSanto_9600BP,0.6
Anatolia_Barcin_N,0.2
Dinka,0.2
IRN_Shahr_I_Sokhta_BA3,0.2
Levant_PPNB,0.2

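If your computer is powerful enough and you are interested, you can also choose a calculator on Vahaduo's G25 page, copy its source data, and use it to replace the contents of "data.txt".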

data.txt
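When swapping in new source data, a malformed row is an easy mistake to make, so a quick check before running the fit can save time. The sketch below is a hypothetical helper (check_g25_file is not part of nMonte). It assumes comma-separated rows of a label plus 25 numbers, as in the sample row earlier in this post, and it treats a first line whose value fields are not numeric as a header; whether your file needs a header line depends on the script and is not checked here.

```r
# Check a comma-separated G25 file: each data row needs a label plus 25 numbers.
# A first line whose value fields are not numeric is treated as a header (assumption).
check_g25_file <- function(path, n_coords = 25) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]                    # ignore blank lines
  rows <- strsplit(lines, ",", fixed = TRUE)
  is_num <- function(x) !anyNA(suppressWarnings(as.numeric(x)))
  if (length(rows) > 0 && !is_num(rows[[1]][-1])) {
    rows <- rows[-1]                                       # drop the header line
  }
  ok <- vapply(rows, function(r) length(r) == n_coords + 1 && is_num(r[-1]),
               logical(1))
  labels <- vapply(rows, function(r) r[1], character(1))
  list(n_rows = length(rows), bad_rows = labels[!ok])
}

# Small demonstration on a temporary file with one good and one short row.
demo_path <- tempfile(fileext = ".txt")
writeLines(c(
  paste(c("", paste0("PC", 1:25)), collapse = ","),
  paste(c("PopA", rep("0.01", 25)), collapse = ","),
  paste(c("PopB", rep("0.01", 24)), collapse = ",")
), demo_path)

res <- check_g25_file(demo_path)
print(res$n_rows)
print(res$bad_rows)   # labels of rows that do not have 25 numeric values
```

For the real file the call would be check_g25_file("data.txt"); an empty bad_rows means every row has the expected shape.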


Reference:

https://eurogenes.blogspot.com/2019/07/getting-most-out-of-global25_12.html

This post was first published on Zhihu and is also published on CSDN. Thank you for reading.
End of post.
