Coursera概率图模型（Probabilistic Graphical Models）第二周编程作业分析

最新推荐文章于 2023-05-22 15:44:02 发布

小石学CS

最新推荐文章于 2023-05-22 15:44:02 发布

阅读量951

点赞数

分类专栏：概率图模型文章标签：概率图模型作业

本文链接：https://blog.csdn.net/shi2xian2wei2/article/details/81841504

版权

概率图模型专栏收录该内容

5 篇文章 0 订阅

订阅专栏

Bayes Nets for Genetic Inheritance

基因遗传的贝叶斯网络

1.构建基因遗传的贝叶斯网络

本章要求构建如下图所示的贝叶斯网络：

图中，变量1、2、3分别表示父母及子女的基因型（Genotype），变量4、5、6分别表示父母及子女基因型所对应的性状（Phenotype）。同时，基因型本身由等位基因（Allele）决定。图中的三个虚线框标记了组成整个贝叶斯网络的三个基本模板因子（Template），也就是本节编程作业的2-4题。最后我们可以通过组合这些模板因子构建出完整的贝叶斯网络。

phenotypeGivenGenotypeMendelianFactor.m 基因型决定性状的孟德尔因子

输入:

isDominant = 1;

genotypeVar = 1;

phenotypeVar = 3;

期望输出：

phenotypeFactor = struct('var', [3,1], 'card', [2,3], 'val', [1,0,1,0,0,1]);

因子phenotypeFactor的结构及意义如下：

var：表示变量（variables）的名称及它们之间的关系，这里（'var', [3,1]）表示因子描述的是phenotypeVar = 3的性状对genotypeVar = 1的基因型的条件概率分布。

card：是基数（cardinalities）的缩写，描述了各变量的状态数量，这里表示性状可能的状态数为2（有或没有此性状），基因型可能的状态数为3（AA，Aa，aa三种）。

val：表示因子中各变量可能组合对应的概率值（values）。由于这个问题是基于孟德尔遗传理论，同时isDominant = 1，表示该等位基因为显性基因，因此当基因型为AA、Aa的时候，表现出对应性状，而基因型为aa时不表现出对应性状。

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

phenotypeFactor.var = [phenotypeVar, genotypeVar];

% phenotypeFactor.card(1) : number of possible phenotypes

% phenotypeFactor.card(2) : number of genotypes

phenotypeFactor.card = [2, 3];



assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);



% trait = 1, no trait = 2 | AA = 1, Aa = 2, aa = 3

Index = [find(assignments(:, 1) == 1 & assignments(:, 2) == 1);

         find(assignments(:, 1) == 1 & assignments(:, 2) == 2);

         find(assignments(:, 1) == 2 & assignments(:, 2) == 3)]';



phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));

phenotypeFactor.val(Index) = 1;

if isDominant == 0

    phenotypeFactor.val = ~phenotypeFactor.val;

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

phenotypeGivenGenotypeFactor.m 基因型决定性状的因子

很多基因型与性状之间的关系并不严格遵守孟德尔遗传定律，而是仅仅影响体现某性状的可能性。这一节就是在孟德尔遗传定律的基础上，对这种更为合理的情况进行建模。

输入:

alphaList = [0.8; 0.6; 0.1];

genotypeVar = 1;

phenotypeVar = 3;

期望输出：

phenotypeFactorAlpha = struct('var', [3,1], 'card', [2,3], 'val', [0.8,0.2,0.6,0.4,0.1,0.9]);

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

% The number of genotypes is the length of alphaList.

% The number of phenotypes is 2.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

phenotypeFactor.var = [phenotypeVar, genotypeVar];

% phenotypeFactor.card(1) : number of possible phenotypes

% phenotypeFactor.card(2) : number of genotypes

phenotypeFactor.card = [2, length(alphaList)];



assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);



phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));

for ii = 1 : length(alphaList)

    phenotypeFactor.val(find(assignments(:, 1) == 1 & assignments(:, 2) == ii)) = alphaList(ii);

    phenotypeFactor.val(find(assignments(:, 1) == 2 & assignments(:, 2) == ii)) = 1 - alphaList(ii);

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

genotypeGivenAlleleFreqsFactor.m 等位基因决定基因型的因子

本节要求对给定等位基因概率的情况下，计算基因型的概率分布。注意下基因型Aa与aA相同就好。

输入:

alleleFreqs = [0.1; 0.9];

genotypeVar = 1;

期望输出：

genotypeFactor = struct('var', [1], 'card', [3], 'val', [0.01,0.18,0.81]);

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

genotypeFactor.var = genotypeVar;

genotypeFactor.card = size(genotypesToAlleles, 1);



genotypeFactor.val = zeros(1, prod(genotypeFactor.card));



for ii = 1 : genotypeFactor.card

    if genotypesToAlleles(ii, 1) == genotypesToAlleles(ii, 2)

        genotypeFactor.val(ii) = alleleFreqs(genotypesToAlleles(ii, 1)) ^ 2;

    else

        genotypeFactor.val(ii) = 2 * alleleFreqs(genotypesToAlleles(ii, 1)) * ...

                                  alleleFreqs(genotypesToAlleles(ii, 2));

    end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

genotypeGivenParentsGenotypesFactor.m 父母基因型决定子女基因型的因子

这里var中存在3个参数，表示[子女的基因型编号，父母1的基因型编号，父母2的基因型编号]。

输入:

numAlleles = 2;

genotypeVarChild = 3;

genotypeVarParentOne = 1;

genotypeVarParentTwo = 2;

期望输出：

genotypeFactorPar = struct('var', [3,1,2], 'card', [3,3,3], 'val', [1,0,0,0.5,0.5,0,0,1,0,0.5,0.5,0,0.25,0.5,0.25,0,0.5,0.5,0,1,0,0,0.5,0.5,0,0,1]);

当然，numAlleles也可能等于3、4、5之类的，合理利用题目中提供的generateAlleleGenotypeMappers函数即可。

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

genotypeFactor.var = genotypeVar;

genotypeFactor.card = size(genotypesToAlleles, 1);



genotypeFactor.val = zeros(1, prod(genotypeFactor.card));



for ii = 1 : genotypeFactor.card

    if genotypesToAlleles(ii, 1) == genotypesToAlleles(ii, 2)

        genotypeFactor.val(ii) = alleleFreqs(genotypesToAlleles(ii, 1)) ^ 2;

    else

        genotypeFactor.val(ii) = 2 * alleleFreqs(genotypesToAlleles(ii, 1)) * ...

                                  alleleFreqs(genotypesToAlleles(ii, 2));

    end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

constructGeneticNetwork.m 构建基因贝叶斯网络

对比之前的贝叶斯网络结构图，我们可以看到构成网络的三个模板因子已经完成了，接下来的工作就是把它们拼起来……

输入:

pedigree = struct('parents', [0,0;1,3;0,0]);

pedigree.names = {'Ira','James','Robin'};

alleleFreqs = [0.1; 0.9];

alphaList = [0.8; 0.6; 0.1];

期望输出：

sampleFactorList = load('sampleFactorList.mat'); % 这个输出太多了，跑下程序就好，我们主要看输入结构哈~

这里，pedigree描述了参数直接的结构关系，就是谁是谁的爸爸谁是谁的儿子，他们这几个人叫什么……其它的参考上述练习即可。

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

% Variable numbers:

% 1 - numPeople: genotype variables

% numPeople+1 - 2*numPeople: phenotype variables

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

for ii = 1 : numPeople

    if sum(pedigree.parents(ii, :)) == 0

        factorList(ii) = genotypeGivenAlleleFreqsFactor(alleleFreqs, ii);

    else

        factorList(ii) = genotypeGivenParentsGenotypesFactor(numAlleles, ii, pedigree.parents(ii, 1), pedigree.parents(ii, 2));

    end

end

for ii = numPeople + 1 : 2 * numPeople

    factorList(ii) = phenotypeGivenGenotypeFactor(alphaList, ii - numPeople, ii);

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

2.构建解耦贝叶斯网络

前面我们构建的贝叶斯网络，并没有考虑到染色体对的影响。这里我们在考虑基因成对出现的情况下，构建新的贝叶斯网络如下图所示：

可以看到，在加入基因对的概念后，整个网络各变量之间存在比较复杂的相互影响关系，也就是耦合关系。这大概也就是题目称之为“解耦网络”的原因——我们要通过新的模板因子将网络进行简化，以实现对这种复杂关系的建模。

phenotypeGivenCopiesFactor.m 成对基因决定性状的因子

输入:

alphaListThree = [0.8; 0.6; 0.1; 0.5; 0.05; 0.01];

numAllelesThree = 3;

genotypeVarMotherCopy = 1;

genotypeVarFatherCopy = 2;

phenotypeVar = 3;

期望输出：

phenotypeFactorPar = struct('var', [3,1,2], 'card', [2,3,3], 'val', [0.8,0.2,0.6,0.4,0.1,0.9,0.6,0.4,0.5,0.5,0.05,0.95,0.1,0.9,0.05,0.95,0.01,0.99]);

注意alphaList的长度等于numAlleles的组合数，也就是genotypesToAlleles的列数。

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%INSERT YOUR CODE HERE

% The number of phenotypes is 2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

phenotypeFactor.var = [phenotypeVar, geneCopyVarOne, geneCopyVarTwo];

phenotypeFactor.card = [2, numAlleles, numAlleles];



assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);



phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));



allelesCombination = sort(assignments(:, [2, 3]), 2);

for ii = 1 : size(genotypesToAlleles, 1)

    phenotypeFactor.val(find(sum(allelesCombination == genotypesToAlleles(ii, :), 2) == 2 & assignments(:, 1) == 1)) = alphaList(ii);

    phenotypeFactor.val(find(sum(allelesCombination == genotypesToAlleles(ii, :), 2) == 2 & assignments(:, 1) == 2)) = 1 - alphaList(ii);

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

constructDecoupledGeneticNetwork.m 构建解耦基因网络

之后，我们只要用成对基因决定性状的因子替代基因型决定性状的因子，之后调整下之前网络的输入输出结构就好~

输入:

pedigree = struct('parents', [0,0;1,3;0,0]);

pedigree.names = {'Ira','James','Robin'};

alleleFreqsThree = [0.1; 0.7; 0.2];

alleleListThree = {'F', 'f', 'n'};

alphaListThree = [0.8; 0.6; 0.1; 0.5; 0.05; 0.01];

期望输出：

sampleFactorListDecoupled = load('sampleFactorListDecoupled.mat'); % 这个输出也太多了……

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% INSERT YOUR CODE HERE

% Variable numbers:

% 1 - numPeople: first parent copy of gene variables

% numPeople+1 - 2*numPeople: second parent copy of gene variables

% 2*numPeople+1 - 3*numPeople: phenotype variables

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

for ii = 1 : numPeople

    if sum(pedigree.parents(ii, :)) == 0

        factorList(ii) = childCopyGivenFreqsFactor(alleleFreqs, ii);

    else

        factorList(ii) = childCopyGivenParentalsFactor(numAlleles, ii, pedigree.parents(ii, 1), pedigree.parents(ii, 1) + numPeople);

    end

end

for ii = numPeople + 1 : 2 * numPeople

    if sum(pedigree.parents(ii - numPeople, :)) == 0

        factorList(ii) = childCopyGivenFreqsFactor(alleleFreqs, ii);

    else

        factorList(ii) = childCopyGivenParentalsFactor(numAlleles, ii, pedigree.parents(ii - numPeople, 2), pedigree.parents(ii - numPeople, 2) + numPeople);

    end

end

for ii = 2 * numPeople + 1 : 3 * numPeople

    factorList(ii) = phenotypeGivenCopiesFactor(alphaList, numAlleles, ii - 2 * numPeople, ii - numPeople, ii);

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

可喜可贺，到这里我们就只剩下一个sigmoid贝叶斯网络啦~

3.构建性状由多基因决定的贝叶斯网络

说到sigmoid，我想听了机器学习相关课程的同学应该都很熟，这里就不累述了，总之原理和逻辑回归（或者是单sigmoid核的感知机）一模一样，理解起来也没啥难度的说。

上边第二个图出自林轩田老师的机器学习基石课程ppt，模型是一模一样的。

constructSigmoidPhenotypeFactor.m 构建sigmoid性状因子

输入：

alleleWeights = {[3, -3], [0.9, -0.8]};

geneCopyVarParentOneList = [1; 2];

geneCopyVarParentTwoList = [4; 5];

phenotypeVar = 3;

期望输出：

phenotypeFactorSigmoid = struct('var', [3,1,2,4,5], 'card', [2,2,2,2,2], 'val', [0.999590432835014,0.000409567164986080,0.858148935099512,0.141851064900488,0.997762151478724,0.00223784852127629,0.524979187478940,0.475020812521060,0.858148935099512,0.141851064900488,0.0147740316932731,0.985225968306727,0.524979187478940,0.475020812521060,0.00273196076301106,0.997268039236989,0.997762151478724,0.00223784852127629,0.524979187478940,0.475020812521060,0.987871565015726,0.0121284349842742,0.167981614866076,0.832018385133925,0.524979187478940,0.475020812521060,0.00273196076301106,0.997268039236989,0.167981614866076,0.832018385133925,0.000500201107079564,0.999499798892920]);

嘛嘛，注意下基因排列的顺序就好啦~（我才不会说我看错顺序了……）

参考代码如下：

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% INSERT YOUR CODE HERE

% Note that computeSigmoid.m will be useful for this function.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

phenotypeFactor.var = [phenotypeVar, geneCopyVarOneList', geneCopyVarTwoList'];

phenotypeFactor.card = [2, repmat(length(alleleWeights{1}), [1, length(geneCopyVarOneList)]), ...

                        repmat(length(alleleWeights{2}), [1, length(geneCopyVarTwoList)])];



phenotypeFactor.val = zeros(1, prod(phenotypeFactor.card));

assignments = IndexToAssignment(1 : prod(phenotypeFactor.card), phenotypeFactor.card);



for ii = 1 : prod(phenotypeFactor.card)

    sumWeights = 0;

    for k1 = 1 : length(geneCopyVarOneList')

        sumWeights = sumWeights + alleleWeights{k1}(assignments(ii, k1 + 1));

    end

    for k2 = 1 : length(geneCopyVarTwoList')

        sumWeights = sumWeights + alleleWeights{k2}(assignments(ii, k2 + length(geneCopyVarOneList') + 1));

    end

    if assignments(ii, 1) == 1

        phenotypeFactor.val(ii) = computeSigmoid(sumWeights);

    else

        phenotypeFactor.val(ii) = 1 - computeSigmoid(sumWeights);

    end

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

最后附上成绩截图~哦哈哈哈哈哈……好像也没啥了不起(0.0‘)，继续加油~

小石学CS

关注

0
点赞
踩
3

收藏

觉得还不错? 一键收藏
0
评论
Coursera概率图模型（Probabilistic Graphical Models）第二周编程作业分析

Bayes Nets for Genetic Inheritance基因遗传的贝叶斯网络 1.构建基因遗传的贝叶斯网络本章要求构建如下图所示的贝叶斯网络：图中，变量1、2、3分别表示父母及子女的基因型（Genotype），变量4、5、6分别表示父母及子女基因型所对应的性状（Phenotype）。同时，基因型本身由等位基因（Allele）决定。图中的三个虚线...
复制链接

扫一扫

专栏目录