Introduction to the Allele Interface and Related Classes in the htsjdk Library

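In the HTSJDK library, Allele is an interface that represents an allele in a genome. The interface defines the basic methods and constants for working with alleles, while concrete implementation classes (such as SimpleAllele, which the source below uses to build its predefined constants) supply the actual behavior.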

The Allele interface

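The Allele interface represents an allele at one position in the genome. An allele is either the reference allele or an alternate allele. The interface provides methods for reading an allele's information and for comparing alleles with one another.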
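The Javadoc quoted in the source further down states that it should be possible to apply an allele to a reference sequence to obtain the haplotype ("Allele + reference => haplotype"), and that indels carry the preceding reference base as padding. The following is a minimal sketch of that idea using plain strings only. The class HaplotypeSketch and the method applyAllele are hypothetical names invented for illustration; they are not part of htsjdk, and the 0-based offset convention is an assumption of this sketch.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of htsjdk. */
final class HaplotypeSketch {

    /**
     * Replaces the reference allele found at the 0-based offset {@code start}
     * of {@code reference} with {@code altAllele} and returns the haplotype.
     * All bases are upper-cased first, mirroring the "atc == ATC" rule.
     */
    static String applyAllele(String reference, int start, String refAllele, String altAllele) {
        String ref = reference.toUpperCase(Locale.ROOT);
        String expected = refAllele.toUpperCase(Locale.ROOT);
        // The reference allele must actually occur at the stated position.
        if (!ref.startsWith(expected, start)) {
            throw new IllegalArgumentException(
                "reference allele " + expected + " not found at offset " + start);
        }
        return ref.substring(0, start)
                + altAllele.toUpperCase(Locale.ROOT)
                + ref.substring(start + expected.length());
    }

    public static void main(String[] args) {
        String reference = "atCga"; // the example reference from the Javadoc
        // SNP C/G, written with the padding base: { tC, tG }
        System.out.println(applyAllele(reference, 1, "tC", "tG")); // ATGGA
        // 1-base deletion of C: { tC, t }
        System.out.println(applyAllele(reference, 1, "tC", "t"));  // ATGA
        // 1-base insertion of A after C: { C, CA }
        System.out.println(applyAllele(reference, 2, "C", "CA"));  // ATCAGA
    }
}
```

The location is passed separately from the alleles here, which matches the Javadoc's remark that the position of an allele is stored outside the Allele object itself.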

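Main functions
  • Represent alleles: the Allele interface represents the base sequence of a single allele and whether it is the reference or an alternate allele.
  • Compare alleles: it provides methods to check whether two alleles are equal and whether an allele is the reference allele.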
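The two functions above can be sketched without the library. The class below, MiniAllele, is a hypothetical stand-in written for this article; it is not htsjdk's SimpleAllele, and its method names only loosely mirror the real interface. It assumes, as the Javadoc in the source below states, that bases are stored as upper-case bytes, so "atc" and "ATC" describe the same allele. Treating the reference flag as part of equality is a design choice of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical, simplified allele for illustration; not htsjdk's implementation. */
final class MiniAllele {
    private final byte[] bases;  // always stored in upper case
    private final boolean isRef; // true if this is the reference allele

    private MiniAllele(byte[] bases, boolean isRef) {
        this.bases = bases;
        this.isRef = isRef;
    }

    /** Builds an allele from a base string, normalising the case. */
    static MiniAllele create(String bases, boolean isRef) {
        if (bases == null || bases.isEmpty()) {
            throw new IllegalArgumentException("bases must be a non-empty string");
        }
        byte[] upper = bases.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        return new MiniAllele(upper, isRef);
    }

    boolean isReference() { return isRef; }

    String getBaseString() { return new String(bases, StandardCharsets.US_ASCII); }

    /** True if the bases are identical, regardless of the reference flag. */
    boolean basesMatch(MiniAllele other) { return Arrays.equals(bases, other.bases); }

    /** Two alleles are equal when both the bases and the reference flag agree. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MiniAllele)) return false;
        MiniAllele other = (MiniAllele) o;
        return isRef == other.isRef && Arrays.equals(bases, other.bases);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(bases) + Boolean.hashCode(isRef);
    }

    public static void main(String[] args) {
        MiniAllele ref = create("atc", true);
        MiniAllele sameRef = create("ATC", true);
        MiniAllele alt = create("ATC", false);
        System.out.println(ref.getBaseString()); // ATC
        System.out.println(ref.equals(sameRef)); // true: case is ignored
        System.out.println(ref.equals(alt));     // false: reference flag differs
        System.out.println(ref.basesMatch(alt)); // true: same bases
    }
}
```

With the real library on the classpath, the static factory shown in the source below, Allele.create(byte[] bases, boolean isRef), plays the role that MiniAllele.create plays here.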
Source code:
/*
 * Copyright (c) 2012 The Broad Institute
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

package htsjdk.variant.variantcontext;

import java.io.Serializable;
import java.nio.charset.StandardCharsets;

/**
 * Immutable representation of an allele.
 *<p>
 * Types of alleles:
 *</p>
 *<pre>
 Ref: a t C g a // C is the reference base
 : a t G g a // C base is a G in some individuals
 : a t - g a // C base is deleted w.r.t. the reference
 : a t CAg a // A base is inserted w.r.t. the reference sequence
 </pre>
 *<p> In these cases, where are the alleles?</p>
 *<ul>
 * <li>SNP polymorphism of C/G  -&gt; { C , G } -&gt; C is the reference allele</li>
 * <li>1 base deletion of C     -&gt; { tC , t } -&gt; C is the reference allele and we include the preceding reference base (null alleles are not allowed)</li>
 * <li>1 base insertion of A    -&gt; { C ; CA } -&gt; C is the reference allele (because null alleles are not allowed)</li>
 *</ul>
 *<p>
 * Suppose I see a the following in the population:
 *</p>
 *<pre>
 Ref: a t C g a // C is the reference base
 : a t G g a // C base is a G in some individuals
 : a t - g a // C base is deleted w.r.t. the reference
 </pre>
 * <p>
 * How do I represent this?  There are three segregating alleles:
 * </p>
 *<blockquote>
 *  { C , G , - }
 *</blockquote>
 *<p>and these are represented as:</p>
 *<blockquote>
 *  { tC, tG, t }
 *</blockquote>
 *<p>
 * Now suppose I have this more complex example:
 </p>
 <pre>
 Ref: a t C g a // C is the reference base
 : a t - g a
 : a t - - a
 : a t CAg a
 </pre>
 * <p>
 * There are actually four segregating alleles:
 * </p>
 *<blockquote>
 *   { Cg , -g, --, and CAg } over bases 2-4
 *</blockquote>
 *<p>   represented as:</p>
 *<blockquote>
 *   { tCg, tg, t, tCAg }
 *</blockquote>
 *<p>
 * Critically, it should be possible to apply an allele to a reference sequence to create the
 * correct haplotype sequence:</p>
 *<blockquote>
 * Allele + reference =&gt; haplotype
 *</blockquote>
 *<p>
 * For convenience, we are going to create Alleles where the GenomeLoc of the allele is stored outside of the
 * Allele object itself.  So there's an idea of an A/C polymorphism independent of it's surrounding context.
 *
 * Given list of alleles it's possible to determine the "type" of the variation
 </p>
 <pre>
 A / C @ loc =&gt; SNP
 - / A =&gt; INDEL
 </pre>
 * <p>
 * If you know where allele is the reference, you can determine whether the variant is an insertion or deletion.
 * </p>
 * <p>
 * Alelle also supports is concept of a NO_CALL allele.  This Allele represents a haplotype that couldn't be
 * determined. This is usually represented by a '.' allele.
 * </p>
 * <p>
 * Note that Alleles store all bases as bytes, in **UPPER CASE**.  So 'atc' == 'ATC' from the perspective of an
 * Allele.
 * </p>
 * @author gatk_team.
 */
public interface Allele extends Comparable<Allele>, Serializable {

    /** A generic static NO_CALL allele for use */
    String NO_CALL_STRING = ".";
    /** A generic static SPAN_DEL allele for use */
    String SPAN_DEL_STRING = "*";
    /** Non ref allele representations */

    char SINGLE_BREAKEND_INDICATOR = '.';
    char BREAKEND_EXTENDING_RIGHT = '[';
    char BREAKEND_EXTENDING_LEFT = ']';
    char SYMBOLIC_ALLELE_START = '<';
    char SYMBOLIC_ALLELE_END = '>';


    String NON_REF_STRING = "<NON_REF>";
    String UNSPECIFIED_ALTERNATE_ALLELE_STRING = "<*>";
    Allele REF_A = new SimpleAllele("A", true);
    Allele ALT_A = new SimpleAllele("A", false);
    Allele REF_C = new SimpleAllele("C", true);
    Allele ALT_C = new SimpleAllele("C", false);
    Allele REF_G = new SimpleAllele("G", true);
    Allele ALT_G = new SimpleAllele("G", false);
    Allele REF_T = new SimpleAllele("T", true);
    Allele ALT_T = new SimpleAllele("T", false);
    Allele REF_N = new SimpleAllele("N", true);
    Allele ALT_N = new SimpleAllele("N", false);
    Allele SPAN_DEL = new SimpleAllele(SPAN_DEL_STRING, false);
    Allele NO_CALL = new SimpleAllele(NO_CALL_STRING, false);
    Allele NON_REF_ALLELE = new SimpleAllele(NON_REF_STRING, false);
    Allele UNSPECIFIED_ALTERNATE_ALLELE = new SimpleAllele(UNSPECIFIED_ALTERNATE_ALLELE_STRING, false);

    // for simple deletion, e.g. "ALT==<DEL>" (note that the spec allows, for now at least, alt alleles like <DEL:ME>)
    @SuppressWarnings("unused")
    Allele SV_SIMPLE_DEL = StructuralVariantType.DEL.toSymbolicAltAllele();
    // for simple insertion, e.g. "ALT==<INS>"
    @SuppressWarnings("unused")
    Allele SV_SIMPLE_INS = StructuralVariantType.INS.toSymbolicAltAllele();
    // for simple inversion, e.g. "ALT==<INV>"
    @SuppressWarnings("unused")
    Allele SV_SIMPLE_INV = StructuralVariantType.INV.toSymbolicAltAllele();
    // for simple generic cnv, e.g. "ALT==<CNV>"
    @SuppressWarnings("unused")
    Allele SV_SIMPLE_CNV = StructuralVariantType.CNV.toSymbolicAltAllele();
    // for simple duplication, e.g. "ALT==<DUP>"
    @SuppressWarnings("unused")
    Allele SV_SIMPLE_DUP = StructuralVariantType.DUP.toSymbolicAltAllele();

    /**
     * Create a new Allele that includes bases and if tagged as the reference allele if isRef == true.  If bases
     * == '-', a Null allele is created.  If bases ==  '.', a no call Allele is created. If bases ==  '*', a spanning deletions Allele is created.
     *
     * @param bases the DNA sequence of this variation, '-', '.', or '*'
     * @param isRef should we make this a reference allele?
     * @throws IllegalArgumentException if bases contains illegal characters or is otherwise malformated
     */
    static Allele create(byte[] bases, boolean isRef) {
        if ( bases == null )
            throw new IllegalArgumentException("create: the 