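Eurogenes Global 25 (G25 for short) is an ancestry-analysis method based on SmartPCA scores, and it comes in Scaled and Unscaled variants. It differs from ordinary ancestry calculators, which report each ancestral component as a percentage: a G25 result is a set of PCA coordinates. Details can be found on the overseas websites linked in this post; this article only collects my study notes on fitting G25 coordinates in R.
I. Preparation
1. Download and install R directly from the official R website; this is not covered further here;
2. Download the nMonte scripts (links below; they also come from the Eurogenes blog post linked at the end of this article) and put them in the folder you will run from;
3. Copy an existing reference sample from Vahaduo (or, if you have them, your own G25 coordinates) into target.txt, then start R;
This example copies the Scaled coordinates of the HGDP Northern Han sample: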
Han_NChina,0.0259514,-0.4360684,0.0085228,-0.0622744,0.0483782,0.0186856,0.003008,9.24e-05,-0.0123532,0.0014212,-0.072393,-0.0089922,0.0107628,-0.006138,-0.0076544,-0.0014056,0.0019036,-0.0012162,-0.004676,-0.0091292,0.0100574,0.0093728,0.0101556,0.001229,0.0082624
https://vahaduo.github.io/
https://www.exploreyourdna.com/
4. Prepare the reference data file "data.txt" and the file to be analyzed, "target.txt". To change the data being analyzed, fill in "target.txt" in the following format:
data.txt
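Before pasting a row into target.txt, it can help to confirm that it really is one label followed by 25 numeric coordinates. The lines below are a minimal sketch using only base R and the Han_NChina row quoted above; the variable names are illustrative, and whether the nMonte scripts also expect a header row is not shown in this article, so that is left open.

```r
# The Scaled Han_NChina row quoted in this article
line <- paste0(
  'Han_NChina,0.0259514,-0.4360684,0.0085228,-0.0622744,0.0483782,',
  '0.0186856,0.003008,9.24e-05,-0.0123532,0.0014212,-0.072393,',
  '-0.0089922,0.0107628,-0.006138,-0.0076544,-0.0014056,0.0019036,',
  '-0.0012162,-0.004676,-0.0091292,0.0100574,0.0093728,0.0101556,',
  '0.001229,0.0082624'
)

# Split on commas: first field is the sample label, the rest are coordinates
fields <- strsplit(line, ',', fixed = TRUE)[[1]]
label  <- fields[1]
coords <- as.numeric(fields[-1])

# A G25 row should carry exactly 25 coordinates, all of them numeric
stopifnot(length(coords) == 25, !anyNA(coords))
print(label)
print(length(coords))
```

A row that fails the check usually has a missing value or a stray character left over from copying.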
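II. Open the installed R and enter the commands
1. Set R's working directory (replace some_folder with your own path)
setwd('E:\\some_folder\\G25_nMonte')
2. Load the Monte Carlo (nMonte) fitting script nMonte.R (smallest fitting step: 0.1%)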
source('nMonte.R')
3. Run it (csv files are supported as well as txt)
getMonte('data.txt','target.txt')
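If the commands above fail, the usual cause is that the working directory does not contain all three files. The helper below is a small sketch of a pre-flight check; check_inputs is a hypothetical name introduced here, not part of nMonte, and it relies only on base R.

```r
# Hypothetical helper (not part of nMonte): stop with a clear message
# if any file needed for the run is absent from the working directory.
check_inputs <- function(files = c('nMonte.R', 'data.txt', 'target.txt')) {
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) {
    stop('Not found in ', getwd(), ': ', paste(absent, collapse = ', '))
  }
  invisible(TRUE)
}

# Usage, run from the folder that holds the three files:
# check_inputs()
# source('nMonte.R')
# getMonte('data.txt', 'target.txt')
```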
Here is the result:
R console showing part of the results
[1] "1. CLOSEST SINGLE ITEM DISTANCE%"
Han Jarawa Nganassan IRN_Shahr_I_Sokhta_BA3
5.02828 32.15232 34.89572 47.76597
BRA_LapaDoSanto_9600BP IRN_Ganj_Dareh_N Yamnaya_RUS_Samara Kura-Araxes_ARM_Kaps
53.75959 57.97647 58.46795 59.84457
[1] "2. FULL TABLE nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Han_NChina 0.025951400 -0.4360684000 0.008522800 -0.06227440 0.04837820 0.018685600 0.0030080000 0.000092400
fitted 0.024014654 -0.4365305155 0.002195135 -0.06011238 0.06103118 0.028301905 0.0020773149 -0.004633961
dif -0.001936745 -0.0004621155 -0.006327665 0.00216202 0.01265298 0.009616305 -0.0009306851 -0.004726361
PC9 PC10 PC11 PC12 PC13 PC14 PC15
Han_NChina -0.0123532000 0.001421200 -0.07239300 -0.008992200 0.010762800 -0.006138000 -0.007654400
fitted -0.0120070216 -0.001618685 -0.03906647 -0.005767134 0.006190188 -0.008755975 -0.005734937
dif 0.0003461784 -0.003039885 0.03332653 0.003225066 -0.004572612 -0.002617975 0.001919463
PC16 PC17 PC18 PC19 PC20 PC21 PC22
Han_NChina -0.0014056000 0.0019036000 -1.21620e-03 -0.0046760000 -0.0091292000 0.01005740 0.009372800
fitted 0.0002819158 0.0016970669 -9.83960e-06 -0.0001843051 -0.0093961991 0.01382878 0.005768916
dif 0.0016875158 -0.0002065331 1.20636e-03 0.0044916949 -0.0002669991 0.00377138 -0.003603884
PC23 PC24 PC25
Han_NChina 0.010155600 0.001229000 0.008262400
fitted 0.014797903 -0.001284875 -0.001287416
dif 0.004642303 -0.002513875 -0.009549816
[1] "distance%=4.0709"
Han_NChina
Han,91.5
Nganassan,6
Anatolia_Tepecik_Ciftlik_N,1.2
BRA_LapaDoSanto_9600BP,0.8
Kura-Araxes_ARM_Kaps,0.5
Anatolia_Barcin_N,0
Dinka,0
ETH_4500BP,0
Gambian,0
IRN_Ganj_Dareh_N,0
IRN_Shahr_I_Sokhta_BA3,0
Jarawa,0
Levant_PPNB,0
MAR_Iberomaurusian,0
WHG,0
Yamnaya_RUS_Samara,0
Yoruba,0
[1] "CORRELATION OF ADMIXTURE POPULATIONS"
Anatolia_Tepecik_Ciftlik_N BRA_LapaDoSanto_9600BP Han Kura-Araxes_ARM_Kaps Nganassan
Anatolia_Tepecik_Ciftlik_N 1.00 -0.40 -0.59 0.73 -0.54
BRA_LapaDoSanto_9600BP -0.40 1.00 0.43 -0.36 0.50
Han -0.59 0.43 1.00 -0.62 0.70
Kura-Araxes_ARM_Kaps 0.73 -0.36 -0.62 1.00 -0.53
Nganassan -0.54 0.50 0.70 -0.53 1.00
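The output has four parts. Part 1 ranks the reference populations by how close they are to the target (smallest distance first). Part 2 compares the original G25 coordinates with the nMonte-fitted coordinates; the dif row is fitted minus target. Part 3 is the nMonte fit itself, that is, the percentage assigned to each population. Part 4 shows the correlations among the admixture populations.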
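The distance% line of the nMonte.R output appears to be the Euclidean length of the dif row, multiplied by 100. This is an inference from the numbers printed above, not something taken from the script's source. The sketch below copies the 25 dif values from the nMonte.R output and recomputes the figure in base R.

```r
# dif row (fitted minus target) copied from the nMonte.R output above
dif <- c(-0.001936745, -0.0004621155, -0.006327665,  0.00216202,
          0.01265298,   0.009616305,  -0.0009306851, -0.004726361,
          0.0003461784, -0.003039885,  0.03332653,    0.003225066,
         -0.004572612, -0.002617975,   0.001919463,   0.0016875158,
         -0.0002065331, 1.20636e-03,   0.0044916949, -0.0002669991,
          0.00377138,  -0.003603884,   0.004642303,  -0.002513875,
         -0.009549816)

# Euclidean distance between target and fitted coordinates
distance     <- sqrt(sum(dif^2))
distance_pct <- 100 * distance

print(round(distance_pct, 4))
```

The result should land very close to the 4.0709 reported as distance% in that run. PC11 contributes by far the largest share of the remaining distance.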
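Unfamiliar English terms can be looked up with a translation tool. Two that are harder to find: WHG stands for "Western Hunter-Gatherers" (of western Europe), and ETH_4500BP means "an Ethiopian sample from 4,500 years before present".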
Tip: on the source line, "nMonte.R" can be replaced with "nMonte2.R" (smallest fitting step: 0.05%) or "nMonte3.R" (smallest fitting step: 0.2%).
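The different step sizes are visible in the results themselves: every non-zero percentage is a multiple of the script's step, and the percentages add up to 100. The sketch below checks this on the FULL TABLE results printed in this article; is_multiple_of is a hypothetical helper written for this check, and a small tolerance is used because of floating-point rounding.

```r
# Hypothetical helper: TRUE when every value in x is a whole multiple of step
is_multiple_of <- function(x, step) {
  ratio <- x / step
  all(abs(ratio - round(ratio)) < 1e-8)
}

# Non-zero FULL TABLE percentages copied from the three runs in this article
nmonte1 <- c(Han = 91.5, Nganassan = 6, Anatolia_Tepecik_Ciftlik_N = 1.2,
             BRA_LapaDoSanto_9600BP = 0.8, 'Kura-Araxes_ARM_Kaps' = 0.5)
nmonte2 <- c(Han = 91.45, Nganassan = 6, Anatolia_Tepecik_Ciftlik_N = 1.2,
             BRA_LapaDoSanto_9600BP = 0.8, 'Kura-Araxes_ARM_Kaps' = 0.55)
nmonte3 <- c(Han = 93.8, Nganassan = 4.8, BRA_LapaDoSanto_9600BP = 0.6,
             Anatolia_Barcin_N = 0.2, Dinka = 0.2,
             IRN_Shahr_I_Sokhta_BA3 = 0.2, Levant_PPNB = 0.2)

# Each run should respect its own step and sum to 100
print(is_multiple_of(nmonte1, 0.1)  && abs(sum(nmonte1) - 100) < 1e-8)
print(is_multiple_of(nmonte2, 0.05) && abs(sum(nmonte2) - 100) < 1e-8)
print(is_multiple_of(nmonte3, 0.2)  && abs(sum(nmonte3) - 100) < 1e-8)
```

Note that 91.45 from the nMonte2.R run is not a multiple of 0.1, which is why that value cannot appear in the nMonte.R output.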
Results from "nMonte2.R"
[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Han Jarawa Nganassan IRN_Shahr_I_Sokhta_BA3
0.0502828 0.3215232 0.3489572 0.4776597
BRA_LapaDoSanto_9600BP IRN_Ganj_Dareh_N Yamnaya_RUS_Samara Kura-Araxes_ARM_Kaps
0.5375959 0.5797647 0.5846795 0.5984457
[1] "2. FULL TABLE nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Han_NChina 0.025951400 -0.4360684000 0.008522800 -0.062274400 0.04837820 0.01868560 0.0030080000 0.000092400
fitted 0.024054958 -0.4362386433 0.002163929 -0.060101603 0.06096721 0.02827858 0.0020799162 -0.004637385
dif -0.001896442 -0.0001702432 -0.006358871 0.002172797 0.01258901 0.00959298 -0.0009280838 -0.004729785
PC9 PC10 PC11 PC12 PC13 PC14 PC15
Han_NChina -0.012353200 0.001421200 -0.07239300 -0.008992200 0.010762800 -0.006138000 -0.007654400
fitted -0.012032694 -0.001625282 -0.03904013 -0.005758098 0.006176629 -0.008748997 -0.005730668
dif 0.000320506 -0.003046482 0.03335287 0.003234102 -0.004586171 -0.002610997 0.001923732
PC16 PC17 PC18 PC19 PC20 PC21 PC22
Han_NChina -0.0014056000 0.0019036000 -0.0012162000 -0.0046760000 -0.009129200 0.010057400 0.009372800
fitted 0.0002761843 0.0016961245 -0.0000113712 -0.0001841309 -0.009389463 0.013824407 0.005765313
dif 0.0016817843 -0.0002074755 0.0012048288 0.0044918691 -0.000260263 0.003767007 -0.003607487
PC23 PC24 PC25
Han_NChina 0.010155600 0.001229000 0.008262400
fitted 0.014789001 -0.001288778 -0.001287827
dif 0.004633401 -0.002517778 -0.009550227
[1] "distance%=4.0708 / distance=0.040708"
Han_NChina
Han 91.45
Nganassan 6.00
Anatolia_Tepecik_Ciftlik_N 1.20
BRA_LapaDoSanto_9600BP 0.80
Kura-Araxes_ARM_Kaps 0.55
Anatolia_Barcin_N 0.00
Dinka 0.00
ETH_4500BP 0.00
Gambian 0.00
IRN_Ganj_Dareh_N 0.00
IRN_Shahr_I_Sokhta_BA3 0.00
Jarawa 0.00
Levant_PPNB 0.00
MAR_Iberomaurusian 0.00
WHG 0.00
Yamnaya_RUS_Samara 0.00
Yoruba 0.00
[1] "3. RESTRICTED nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Han_NChina 0.025951400 -0.4360684000 0.008522800 -0.0622744000 0.04837820 0.018685600 0.003008000 0.000092400
fitted 0.024051083 -0.4363472079 0.002093122 -0.0614892795 0.06193914 0.027968204 0.004405437 -0.001726973
dif -0.001900317 -0.0002788079 -0.006429678 0.0007851205 0.01356094 0.009282604 0.001397437 -0.001819373
PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16
Han_NChina -0.012353200 0.0014212000 -0.07239300 -0.00899220 0.01076280 -0.006138000 -0.007654400 -0.0014056000
fitted -0.011308925 -0.0009725677 -0.03875748 -0.00572371 0.00622178 -0.009015966 -0.006031067 0.0001671377
dif 0.001044275 -0.0023937677 0.03363552 0.00326849 -0.00454102 -0.002877966 0.001623333 0.0015727377
PC17 PC18 PC19 PC20 PC21 PC22 PC23
Han_NChina 0.0019036000 -0.0012162000 -4.676000e-03 -0.0091292000 0.010057400 0.009372800 0.010155600
fitted 0.0017034104 0.0000743187 -6.038325e-05 -0.0094925643 0.014040278 0.005705196 0.014960724
dif -0.0002001896 0.0012905187 4.615617e-03 -0.0003633643 0.003982878 -0.003667604 0.004805124
PC24 PC25
Han_NChina 0.001229000 0.008262400
fitted -0.001231193 -0.001264582
dif -0.002460193 -0.009526982
[1] "distance%=4.0941 / distance=0.040941"
Han_NChina
Han 91.75
Nganassan 6.40
Anatolia_Tepecik_Ciftlik_N 1.85
[1] "CORRELATION OF ADMIXTURE POPULATIONS"
Anatolia_Tepecik_Ciftlik_N Han Nganassan
Anatolia_Tepecik_Ciftlik_N 1.00 -0.59 -0.54
Han -0.59 1.00 0.70
Nganassan -0.54 0.70 1.00
Results from "nMonte3.R"
[1] "1. CLOSEST SINGLE ITEM DISTANCE%"
Han Jarawa Nganassan IRN_Shahr_I_Sokhta_BA3
5.02828 32.15232 34.89572 47.76597
BRA_LapaDoSanto_9600BP IRN_Ganj_Dareh_N Yamnaya_RUS_Samara Kura-Araxes_ARM_Kaps
53.75959 57.97647 58.46795 59.84457
[1] "2. FULL TABLE nMONTE"
[1] "penalty= 0.001"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Han_NChina 0.025951400 -0.436068400 0.0085228000 -0.062274400 0.04837820 0.01868560 0.0030080000 0.00009240
fitted 0.021306033 -0.443514504 -0.0000001106 -0.060663057 0.06486444 0.03068566 0.0023251514 -0.00444825
dif -0.004645367 -0.007446104 -0.0085229106 0.001611343 0.01648624 0.01200006 -0.0006828486 -0.00454065
PC9 PC10 PC11 PC12 PC13 PC14 PC15
Han_NChina -0.0123532000 0.00142120 -0.07239300 -0.008992200 0.010762800 -0.006138000 -0.007654400
fitted -0.0122340366 -0.00231565 -0.04155548 -0.006135961 0.006555379 -0.008736972 -0.005267236
dif 0.0001191634 -0.00373685 0.03083752 0.002856239 -0.004207421 -0.002598972 0.002387164
PC16 PC17 PC18 PC19 PC20 PC21 PC22
Han_NChina -0.0014056000 0.0019036000 -0.0012162000 -0.0046760000 -0.0091292000 0.010057400 0.009372800
fitted 0.0005811736 0.0017638482 -0.0001879976 -0.0005459568 -0.0096387648 0.013521322 0.006020329
dif 0.0019867736 -0.0001397518 0.0010282024 0.0041300432 -0.0005095648 0.003463922 -0.003352471
PC23 PC24 PC25
Han_NChina 0.010155600 0.001229000 0.008262400
fitted 0.014826112 -0.001235334 -0.001448735
dif 0.004670512 -0.002464334 -0.009711135
[1] "distance%=4.1954"
Han_NChina
Han,93.8
Nganassan,4.8
BRA_LapaDoSanto_9600BP,0.6
Anatolia_Barcin_N,0.2
Dinka,0.2
IRN_Shahr_I_Sokhta_BA3,0.2
Levant_PPNB,0.2
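If your computer is powerful enough and you are interested, you can also pick a calculator on Vahaduo's G25 page, copy its source data, and use it to replace the contents of "data.txt".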
data.txt
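After pasting new source data into data.txt, a quick structural check can catch rows that were cut off during copying. The sketch below assumes every line of the file has one label field plus 25 coordinate fields; check_g25_file is a hypothetical helper, and the demo runs on a temporary file holding the Han_NChina row from this article rather than on a real source sheet.

```r
# Hypothetical helper: TRUE when every non-blank line of a G25 file has
# one label field plus n_coords comma-separated coordinate fields.
check_g25_file <- function(path, n_coords = 25) {
  n <- utils::count.fields(path, sep = ',', quote = '', comment.char = '')
  all(n == n_coords + 1)
}

# Demo on a temporary file containing the Han_NChina row quoted earlier
han <- paste0(
  'Han_NChina,0.0259514,-0.4360684,0.0085228,-0.0622744,0.0483782,',
  '0.0186856,0.003008,9.24e-05,-0.0123532,0.0014212,-0.072393,',
  '-0.0089922,0.0107628,-0.006138,-0.0076544,-0.0014056,0.0019036,',
  '-0.0012162,-0.004676,-0.0091292,0.0100574,0.0093728,0.0101556,',
  '0.001229,0.0082624'
)
tmp <- tempfile(fileext = '.txt')
writeLines(han, tmp)
ok <- check_g25_file(tmp)
print(ok)

# On your own data: check_g25_file('data.txt')
```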
Reference:
https://eurogenes.blogspot.com/2019/07/getting-most-out-of-global25_12.html
This article was first published on Zhihu and is also posted on CSDN. Thank you for reading.
End of article.