Introduction to the htsjdk SamReader Interface

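SamReader is an interface in the htsjdk library for reading and parsing SAM (Sequence Alignment/Map) and BAM (Binary Alignment/Map) files. CRAM files are read through the same interface. htsjdk is a widely used Java library that provides tools for working with high-throughput sequencing data, and SamReader is one of its core interfaces.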

Overview of the SamReader interface

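SamReader is mainly used to read and iterate over the records (that is, the alignments) in a SAM/BAM file. Its design makes large sequencing datasets convenient to process. It supports sequential reading, and, when an index is available, random access such as retrieving only the records in a given interval.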

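Its main capabilities are:

  • Sequential reading: iterate over the alignment records in the file one at a time.
  • Interval queries: retrieve the alignment records in a given chromosome region, either all records that overlap it or only those fully contained in it.
  • Metadata access: read the file header, e.g., the reference sequence dictionary and program records such as the aligner and its version.
  • Index support: use an index file (.bai or .csi for BAM, .crai for CRAM) to answer region queries quickly.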
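The interval-query rule can be pinned down with a small, dependency-free sketch. The `query(sequence, start, end, contained)` Javadoc in the source below sets three rules:

  • Coordinates are 1-based and inclusive.
  • A start or end of 0 means the start or end of the reference sequence.
  • `contained` chooses between full containment and simple overlap.

`IntervalMatch` is a hypothetical helper, not part of htsjdk. In htsjdk the filtering happens inside the reader, guided by the index. For a real `SAMRecord`, the two alignment coordinates would come from `getAlignmentStart()` and `getAlignmentEnd()`.

```java
/** Hypothetical helper modelling the overlap/containment rule described for SamReader.query(). */
final class IntervalMatch {
    private IntervalMatch() {}

    /**
     * @param alnStart  1-based inclusive alignment start of a record
     * @param alnEnd    1-based inclusive alignment end of a record
     * @param qStart    1-based inclusive query start; 0 means "from the start of the reference"
     * @param qEnd      1-based inclusive query end; 0 means "to the end of the reference"
     * @param contained true: the record must lie entirely inside the query; false: any overlap suffices
     */
    static boolean matches(int alnStart, int alnEnd, int qStart, int qEnd, boolean contained) {
        // Treat 0 as an open bound, as the Javadoc describes.
        final int lo = (qStart <= 0) ? 1 : qStart;
        final int hi = (qEnd <= 0) ? Integer.MAX_VALUE : qEnd;
        if (contained) {
            return alnStart >= lo && alnEnd <= hi;
        }
        // Two closed intervals overlap when each one starts no later than the other ends.
        return alnStart <= hi && alnEnd >= lo;
    }

    public static void main(String[] args) {
        // A read aligned to positions 150..249, queried against 200..300.
        System.out.println(matches(150, 249, 200, 300, false)); // true: overlaps
        System.out.println(matches(150, 249, 200, 300, true));  // false: starts before 200
        System.out.println(matches(150, 249, 0, 0, true));      // true: whole reference
    }
}
```

The sketch ignores one special case from the Javadoc: an unmapped read that carries a sorting coordinate inside the query region is also returned.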

Nested interfaces and classes of SamReader

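Nested class Type
  • An abstract class that describes the kind of SAM file: BAM, CRAM, SAM, and so on. The source also defines an SRA type.
  • Each type defines a file extension and, optionally, an index extension.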
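The following plain-Java sketch re-creates the same pattern using only the JDK. `FileType` is a hypothetical stand-in for `SamReader.Type`. Its constants are copied from the values declared in the source below, and it uses the same `hasValidFileExtension` logic.

The sketch makes two properties of the original check easy to see:

  • The comparison is a case-sensitive `endsWith`, so `sample.BAM` is rejected.
  • The two BAM variants (`.bai` and `.csi` index) share the `bam` extension, so the file name alone cannot tell them apart.

```java
/** Hypothetical stand-in mirroring SamReader.Type / TypeImpl from the htsjdk source. */
final class FileType {
    final String name;
    final String fileExtension;
    final String indexExtension; // null when the format has no associated index

    private FileType(String name, String fileExtension, String indexExtension) {
        this.name = name;
        this.fileExtension = fileExtension;
        this.indexExtension = indexExtension;
    }

    // Values copied from the constants declared in SamReader.Type.
    static final FileType SAM = new FileType("SAM", "sam", null);
    static final FileType BAM = new FileType("BAM", "bam", "bai");
    static final FileType BAM_CSI = new FileType("BAM", "bam", "csi");
    static final FileType CRAM = new FileType("CRAM", "cram", "crai");

    /** Same logic as Type.hasValidFileExtension: a null-safe, case-sensitive suffix test. */
    boolean hasValidFileExtension(String fileName) {
        return fileName != null && fileName.endsWith("." + fileExtension);
    }

    public static void main(String[] args) {
        System.out.println(BAM.hasValidFileExtension("sample.bam"));     // true
        System.out.println(BAM.hasValidFileExtension("sample.BAM"));     // false: case-sensitive
        System.out.println(BAM_CSI.hasValidFileExtension("sample.bam")); // true: same extension as BAM
        System.out.println(SAM.indexExtension);                          // null: no index format
    }
}
```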
Nested interface Indexing
  • Handles index-related operations, such as retrieving the index and checking whether a browseable index is available.
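The browseable-index check is a simple capability test with `instanceof`. The sketch below mirrors `hasBrowseableIndex()` and `getBrowseableIndex()` as `PrimitiveSamReaderToSamReaderAdapter` implements them in the source below.

`DemoIndex`, `DemoBrowseableIndex`, and `IndexingFacet` are hypothetical stand-ins for `BAMIndex`, `BrowseableBAMIndex`, and the adapter. `IllegalStateException` replaces htsjdk's `SAMException` to keep the example dependency-free.

```java
/** Hypothetical stand-ins for BAMIndex and BrowseableBAMIndex. */
interface DemoIndex {}

interface DemoBrowseableIndex extends DemoIndex {}

/** Mirrors hasBrowseableIndex()/getBrowseableIndex() from PrimitiveSamReaderToSamReaderAdapter. */
final class IndexingFacet {
    private final DemoIndex index; // null when no index is available

    IndexingFacet(DemoIndex index) {
        this.index = index;
    }

    boolean hasIndex() {
        return index != null;
    }

    boolean hasBrowseableIndex() {
        return hasIndex() && index instanceof DemoBrowseableIndex;
    }

    DemoBrowseableIndex getBrowseableIndex() {
        // instanceof is false for null, so a missing index also ends up in the exception branch.
        if (!(index instanceof DemoBrowseableIndex)) {
            throw new IllegalStateException("Cannot return index: index is not browseable.");
        }
        return (DemoBrowseableIndex) index;
    }

    public static void main(String[] args) {
        DemoIndex plain = new DemoIndex() {};
        DemoBrowseableIndex browseable = new DemoBrowseableIndex() {};
        System.out.println(new IndexingFacet(null).hasBrowseableIndex());                     // false
        System.out.println(new IndexingFacet(plain).hasBrowseableIndex());                    // false
        System.out.println(new IndexingFacet(browseable).getBrowseableIndex() == browseable); // true
    }
}
```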
PrimitiveSamReader interface
  • A stripped-down version of SamReader that contains only the essential methods. The PrimitiveSamReaderToSamReaderAdapter class turns it into a full SamReader.
PrimitiveSamReaderToSamReaderAdapter class
  • An adapter class that converts a PrimitiveSamReader into a SamReader.
  • It implements both SamReader and Indexing. It also adds helper methods such as queryMate, which fetches the mate of a paired read.
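The core idea of the adapter is that each format reader implements only one general `query(intervals, contained)` method. The adapter derives the convenience methods from it:

  • `queryOverlapping` and `queryContained` just fix the `contained` flag.
  • The string-based `query` translates the sequence name into a reference index through the header.

The sketch below reproduces that delegation with hypothetical types (`Rec`, `Interval`, `PrimitiveReader`, `InMemoryReader`, `ReaderAdapter`). Its in-memory reader filters a list instead of consulting an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal record: reference index plus 1-based inclusive alignment span. */
final class Rec {
    final int refIndex, start, end;
    Rec(int refIndex, int start, int end) { this.refIndex = refIndex; this.start = start; this.end = end; }
}

/** Hypothetical counterpart of htsjdk's QueryInterval (reference index, start, end). */
final class Interval {
    final int refIndex, start, end;
    Interval(int refIndex, int start, int end) { this.refIndex = refIndex; this.start = start; this.end = end; }
}

/** The "primitive" side: one general query, like PrimitiveSamReader.query(intervals, contained). */
interface PrimitiveReader {
    Iterator<Rec> query(Interval[] intervals, boolean contained);

    /** Stands in for the header's sequence dictionary (name -> index). */
    Map<String, Integer> sequenceIndex();
}

/** Hypothetical in-memory implementation: filters a list instead of consulting an index. */
final class InMemoryReader implements PrimitiveReader {
    private final List<Rec> records;
    private final Map<String, Integer> dict;

    InMemoryReader(List<Rec> records, Map<String, Integer> dict) { this.records = records; this.dict = dict; }

    @Override
    public Map<String, Integer> sequenceIndex() { return dict; }

    @Override
    public Iterator<Rec> query(Interval[] intervals, boolean contained) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : records) {
            for (Interval q : intervals) {
                if (r.refIndex != q.refIndex) continue;
                boolean hit = contained
                        ? r.start >= q.start && r.end <= q.end   // fully inside the interval
                        : r.start <= q.end && r.end >= q.start;  // any overlap
                if (hit) {
                    out.add(r);
                    break; // report each record once even if several intervals match
                }
            }
        }
        return out.iterator();
    }
}

/** Derives the convenience methods from the primitive query, as the htsjdk adapter does. */
final class ReaderAdapter {
    private final PrimitiveReader p;

    ReaderAdapter(PrimitiveReader p) { this.p = p; }

    Iterator<Rec> query(String sequence, int start, int end, boolean contained) {
        Integer idx = p.sequenceIndex().get(sequence);
        // In this sketch, -1 marks an unknown sequence name, so nothing matches.
        int refIndex = (idx == null) ? -1 : idx;
        return p.query(new Interval[]{new Interval(refIndex, start, end)}, contained);
    }

    Iterator<Rec> queryOverlapping(String sequence, int start, int end) { return query(sequence, start, end, false); }

    Iterator<Rec> queryContained(String sequence, int start, int end) { return query(sequence, start, end, true); }
}

final class AdapterDemo {
    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new HashMap<>();
        dict.put("chr1", 0);
        dict.put("chr2", 1);
        List<Rec> recs = Arrays.asList(new Rec(0, 100, 199), new Rec(0, 250, 349), new Rec(1, 100, 199));
        ReaderAdapter reader = new ReaderAdapter(new InMemoryReader(recs, dict));
        System.out.println(count(reader.queryOverlapping("chr1", 150, 300))); // 2
        System.out.println(count(reader.queryContained("chr1", 150, 300)));   // 0
    }
}
```

The payoff of this split is that the delegation code lives in one place. Every format reader gets `queryOverlapping`, `queryContained`, and the name-based `query` for free.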
ReaderImplementation abstract class
  • abstract class ReaderImplementation implements PrimitiveSamReader
  • Subclasses:

SamReader source code

/*
 * The MIT License
 *
 * Copyright (c) 2016 The Broad Institute
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

package htsjdk.samtools;

import htsjdk.samtools.util.CloseableIterator;

import java.io.Closeable;
import java.text.MessageFormat;

/**
 * Describes functionality for objects that produce {@link SAMRecord}s and associated information.
 *
 * Currently, only deprecated readers implement this directly; actual readers implement this
 * via {@link ReaderImplementation} and {@link PrimitiveSamReader}, which {@link SamReaderFactory}
 * converts into full readers by using {@link PrimitiveSamReaderToSamReaderAdapter}.
 *
 * @author mccowan
 */
public interface SamReader extends Iterable<SAMRecord>, Closeable {

    /** Describes a type of SAM file. */
    public abstract class Type {
        /** A string representation of this type. */
        public abstract String name();

        /** The recommended file extension for SAMs of this type, without a period. */
        public abstract String fileExtension();

        /** The recommended file extension for SAM indexes of this type, without a period, or null if this type is not associated with indexes. */
        public abstract String indexExtension();

        static class TypeImpl extends Type {
            final String name, fileExtension, indexExtension;

            TypeImpl(final String name, final String fileExtension, final String indexExtension) {
                this.name = name;
                this.fileExtension = fileExtension;
                this.indexExtension = indexExtension;
            }

            @Override
            public String name() {
                return name;
            }

            @Override
            public String fileExtension() {
                return fileExtension;
            }

            @Override
            public String indexExtension() {
                return indexExtension;
            }

            @Override
            public String toString() {
                return String.format("TypeImpl{name='%s', fileExtension='%s', indexExtension='%s'}", name, fileExtension, indexExtension);
            }
        }

        public static final Type SRA_TYPE = new TypeImpl("SRA", "sra", null);
        public static final Type CRAM_TYPE = new TypeImpl("CRAM", "cram", "crai");
        public static final Type BAM_TYPE = new TypeImpl("BAM", "bam", "bai");
        public static final Type SAM_TYPE = new TypeImpl("SAM", "sam", null);
        public static final Type BAM_CSI_TYPE = new TypeImpl("BAM", "bam", "csi");
        public static final Type BAM_HTSGET_TYPE = new TypeImpl("BAM", "bam", null);

        public boolean hasValidFileExtension(final String fileName) {
            return fileName != null && fileName.endsWith("." + fileExtension());
        }
    }

    /**
     * Facet for index-related operations.
     */
    public interface Indexing {
        /**
         * Retrieves the index for the given file type.  Ensure that the index is of the specified type.
         *
         * @return An index of the given type.
         */
        public BAMIndex getIndex();

        /**
         * Returns true if the supported index is browseable, meaning the bins in it can be traversed
         * and chunk data inspected and retrieved.
         *
         * @return True if the index supports the BrowseableBAMIndex interface.  False otherwise.
         */
        public boolean hasBrowseableIndex();

        /**
         * Gets an index tagged with the BrowseableBAMIndex interface.  Throws an exception if no such
         * index is available.
         *
         * @return An index with a browseable interface, if possible.
         * @throws SAMException if no such index is available.
         */
        public BrowseableBAMIndex getBrowseableIndex();

        /**
         * Iterate through the given chunks in the file.
         *
         * @param chunks List of chunks for which to retrieve data.
         * @return An iterator over the given chunks.
         */
        public SAMRecordIterator iterator(final SAMFileSpan chunks);

        /**
         * Gets a pointer spanning all reads in the BAM file.
         *
         * @return Unbounded pointer to the first record, in chunk format.
         */
        public SAMFileSpan getFilePointerSpanningReads();

    }

    public SAMFileHeader getFileHeader();

    /**
     * @return the {@link htsjdk.samtools.SamReader.Type} of this {@link htsjdk.samtools.SamReader}
     */
    public Type type();

    /**
     * @return a human readable description of the resource backing this sam reader
     */
    public String getResourceDescription();

    /**
     * @return true if this source can be queried by interval, regardless of whether it has an index
     */
    default public boolean isQueryable() {
        return this.hasIndex();
    }

    /**
     * @return true if this is a BAM file, and has an index
     */
    public boolean hasIndex();

    /**
     * Exposes the {@link SamReader.Indexing} facet of this {@link SamReader}.
     *
     * @throws java.lang.UnsupportedOperationException If {@link #hasIndex()} returns false.
     */
    public Indexing indexing();

    /**
     * Iterate through file in order.  For a SamReader constructed from an InputStream, and for any SAM file,
     * a 2nd iteration starts where the 1st one left off.  For a BAM constructed from a SeekableStream or File, each new iteration
     * starts at the first record.
     * <p/>
     * Only a single open iterator on a SAM or BAM file may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     */
    @Override
    public SAMRecordIterator iterator();

    /**
     * Iterate over records that match the given interval.  Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.  You can use a second SamReader to iterate
     * in parallel over the same underlying file.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param sequence  Reference sequence of interest.
     * @param start     1-based, inclusive start of interval of interest. Zero implies start of the reference sequence.
     * @param end       1-based, inclusive end of interval of interest. Zero implies end of the reference sequence.
     * @param contained If true, each SAMRecord returned will have its alignment completely contained in the
     *                  interval of interest.  If false, the alignment of the returned SAMRecords need only overlap the interval of interest.
     * @return Iterator over the SAMRecords matching the interval.
     */
    public SAMRecordIterator query(final String sequence, final int start, final int end, final boolean contained);

    /**
     * Iterate over records that overlap the given interval.  Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param sequence Reference sequence of interest.
     * @param start    1-based, inclusive start of interval of interest. Zero implies start of the reference sequence.
     * @param end      1-based, inclusive end of interval of interest. Zero implies end of the reference sequence.
     * @return Iterator over the SAMRecords overlapping the interval.
     */
    public SAMRecordIterator queryOverlapping(final String sequence, final int start, final int end);

    /**
     * Iterate over records that are contained in the given interval.  Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param sequence Reference sequence of interest.
     * @param start    1-based, inclusive start of interval of interest. Zero implies start of the reference sequence.
     * @param end      1-based, inclusive end of interval of interest. Zero implies end of the reference sequence.
     * @return Iterator over the SAMRecords contained in the interval.
     */
    public SAMRecordIterator queryContained(final String sequence, final int start, final int end);

    /**
     * Iterate over records that match one of the given intervals.  This may be more efficient than querying
     * each interval separately, because multiple reads of the same SAMRecords is avoided.
     * <p/>
     * Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.  You can use a second SamReader to iterate
     * in parallel over the same underlying file.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match an interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param intervals Intervals to be queried.  The intervals must be optimized, i.e. in order, with overlapping
     *                  and abutting intervals merged.  This can be done with {@link htsjdk.samtools.QueryInterval#optimizeIntervals}
     * @param contained If true, each SAMRecord returned will have its alignment completely contained in one of the
     *                  intervals of interest.  If false, the alignment of the returned SAMRecords need only overlap one of
     *                  the intervals of interest.
     * @return Iterator over the SAMRecords matching the interval.
     */
    public SAMRecordIterator query(final QueryInterval[] intervals, final boolean contained);

    /**
     * Iterate over records that overlap any of the given intervals.  This may be more efficient than querying
     * each interval separately, because multiple reads of the same SAMRecords is avoided.
     * <p/>
     * Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param intervals Intervals to be queried.  The intervals must be optimized, i.e. in order, with overlapping
     *                  and abutting intervals merged.  This can be done with {@link htsjdk.samtools.QueryInterval#optimizeIntervals}
     */
    public SAMRecordIterator queryOverlapping(final QueryInterval[] intervals);

    /**
     * Iterate over records that are contained in the given interval.  This may be more efficient than querying
     * each interval separately, because multiple reads of the same SAMRecords is avoided.
     * <p/>
     * Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * is in the query region.
     *
     * @param intervals Intervals to be queried.  The intervals must be optimized, i.e. in order, with overlapping
     *                  and abutting intervals merged.  This can be done with {@link htsjdk.samtools.QueryInterval#optimizeIntervals}
     * @return Iterator over the SAMRecords contained in any of the intervals.
     */
    public SAMRecordIterator queryContained(final QueryInterval[] intervals);


    public SAMRecordIterator queryUnmapped();

    /**
     * Iterate over records that map to the given sequence and start at the given position.  Only valid to call this if hasIndex() == true.
     * <p/>
     * Only a single open iterator on a given SamReader may be extant at any one time.  If you want to start
     * a second iteration, the first one must be closed first.
     * <p/>
     * Note that indexed lookup is not perfectly efficient in terms of disk I/O.  I.e. some SAMRecords may be read
     * and then discarded because they do not match the interval of interest.
     * <p/>
     * Note that an unmapped read will be returned by this call if it has a coordinate for the purpose of sorting that
     * matches the arguments.
     *
     * @param sequence Reference sequence of interest.
     * @param start    Alignment start of interest.
     * @return Iterator over the SAMRecords with the given alignment start.
     */
    public SAMRecordIterator queryAlignmentStart(final String sequence, final int start);

    /**
     * Fetch the mate for the given read.  Only valid to call this if hasIndex() == true.
     * This will work whether the mate has a coordinate or not, so long as the given read has correct
     * mate information.  This method iterates over the SAM file, so there may not be an unclosed
     * iterator on the SAM file when this method is called.
     * <p/>
     * Note that it is not possible to call queryMate when iterating over the SamReader, because queryMate
     * requires its own iteration, and there cannot be two simultaneous iterations on the same SamReader.  The
     * work-around is to open a second SamReader on the same input file, and call queryMate on the second
     * reader.
     *
     * @param rec Record for which mate is sought.  Must be a paired read.
     * @return rec's mate, or null if it cannot be found.
     */
    public SAMRecord queryMate(final SAMRecord rec);

    /**
     * The minimal subset of functionality needed for a {@link SAMRecord} data source.
     * {@link SamReader} itself is somewhat large and bulky, but the core functionality can be captured in
     * relatively few methods, which are included here. For documentation, see the corresponding methods
     * in {@link SamReader}.
     *
     * See also: {@link PrimitiveSamReaderToSamReaderAdapter}, {@link ReaderImplementation}
     *
     */
    public interface PrimitiveSamReader {
        Type type();

        default boolean isQueryable() {
            return this.hasIndex();
        }

        boolean hasIndex();

        BAMIndex getIndex();

        SAMFileHeader getFileHeader();

        CloseableIterator<SAMRecord> getIterator();

        CloseableIterator<SAMRecord> getIterator(SAMFileSpan fileSpan);

        SAMFileSpan getFilePointerSpanningReads();

        CloseableIterator<SAMRecord> query(QueryInterval[] intervals, boolean contained);

        CloseableIterator<SAMRecord> queryAlignmentStart(String sequence, int start);

        CloseableIterator<SAMRecord> queryUnmapped();

        void close();

        ValidationStringency getValidationStringency();
    }

    /**
     * Decorator for a {@link SamReader.PrimitiveSamReader} that expands its functionality into a {@link SamReader},
     * given the backing {@link SamInputResource}.
     *
     * Wraps the {@link Indexing} interface as well, which was originally separate from {@link SamReader} but in practice
     * the two are always implemented by the same class.
     *
     */
    class PrimitiveSamReaderToSamReaderAdapter implements SamReader, Indexing {
        final PrimitiveSamReader p;
        final SamInputResource resource;

        public PrimitiveSamReaderToSamReaderAdapter(final PrimitiveSamReader p, final SamInputResource resource) {
            this.p = p;
            this.resource = resource;
        }

        /**
         * Access the underlying {@link PrimitiveSamReader} used by this adapter.
         * @return the {@link PrimitiveSamReader} used by this adapter.
         */
        public PrimitiveSamReader underlyingReader() {
            return p;
        }

        @Override
        public SAMRecordIterator queryOverlapping(final String sequence, final int start, final int end) {
            return query(sequence, start, end, false);
        }

        @Override
        public SAMRecordIterator queryOverlapping(final QueryInterval[] intervals) {
            return query(intervals, false);
        }

        @Override
        public SAMRecordIterator queryContained(final String sequence, final int start, final int end) {
            return query(sequence, start, end, true);
        }

        @Override
        public SAMRecordIterator queryContained(final QueryInterval[] intervals) {
            return query(intervals, true);
        }

        /**
         * Wraps the boilerplate code for querying a record's mate, which is common across many implementations.
         *
         * @param rec Record for which mate is sought.  Must be a paired read.
         * @return
         */
        @Override
        public SAMRecord queryMate(final SAMRecord rec) {
            if (!rec.getReadPairedFlag()) {
                throw new IllegalArgumentException("queryMate called for unpaired read.");
            }
            if (rec.getFirstOfPairFlag() == rec.getSecondOfPairFlag()) {
                throw new IllegalArgumentException("SAMRecord must be either first and second of pair, but not both.");
            }
            final boolean firstOfPair = rec.getFirstOfPairFlag();
            final CloseableIterator<SAMRecord> it;
            if (rec.getMateReferenceIndex() == SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX) {
                it = queryUnmapped();
            } else {
                it = queryAlignmentStart(rec.getMateReferenceName(), rec.getMateAlignmentStart());
            }
            try {
                SAMRecord mateRec = null;
                while (it.hasNext()) {
                    final SAMRecord next = it.next();
                    if (!next.getReadPairedFlag()) {
                        if (rec.getReadName().equals(next.getReadName())) {
                            throw new SAMFormatException("Paired and unpaired reads with same name: " + rec.getReadName());
                        }
                        continue;
                    }
                    if (firstOfPair) {
                        if (next.getFirstOfPairFlag()) continue;
                    } else {
                        if (next.getSecondOfPairFlag()) continue;
                    }
                    if (rec.getReadName().equals(next.getReadName())) {
                        if (mateRec != null) {
                            throw new SAMFormatException("Multiple SAMRecord with read name " + rec.getReadName() +
                                    " for " + (firstOfPair ? "second" : "first") + " end.");
                        }
                        mateRec = next;
                    }
                }
                return mateRec;
            } finally {
                it.close();
            }
        }

        @Override
        public boolean hasBrowseableIndex() {
            return hasIndex() && getIndex() instanceof BrowseableBAMIndex;
        }

        @Override
        public BrowseableBAMIndex getBrowseableIndex() {
            final BAMIndex index = getIndex();
            if (!(index instanceof BrowseableBAMIndex))
                throw new SAMException("Cannot return index: index created by BAM is not browseable.");
            return BrowseableBAMIndex.class.cast(index);
        }

        @Override
        public SAMRecordIterator iterator() {
            return new AssertingIterator(p.getIterator());
        }

        @Override
        public SAMRecordIterator iterator(final SAMFileSpan chunks) {
            return new AssertingIterator(p.getIterator(chunks));
        }

        @Override
        public void close() {
            p.close();
        }

        @Override
        public SAMFileSpan getFilePointerSpanningReads() {
            return p.getFilePointerSpanningReads();
        }

        @Override
        public SAMFileHeader getFileHeader() {
            return p.getFileHeader();
        }

        @Override
        public Type type() {
            return p.type();
        }

        @Override
        public String getResourceDescription() {
            return this.resource.toString();
        }

        @Override
        public boolean isQueryable() {
            return p.isQueryable();
        }

        @Override
        public boolean hasIndex() {
            return p.hasIndex();
        }

        @Override
        public Indexing indexing() {
            return this;
        }

        @Override
        public BAMIndex getIndex() {
            return p.getIndex();
        }

        @Override
        public SAMRecordIterator query(final QueryInterval[] intervals, final boolean contained) {
            return AssertingIterator.of(p.query(intervals, contained));
        }

        @Override
        public SAMRecordIterator query(final String sequence, final int start, final int end, final boolean contained) {
            return query(new QueryInterval[]{new QueryInterval(getFileHeader().getSequenceIndex(sequence), start, end)}, contained);
        }

        @Override
        public SAMRecordIterator queryUnmapped() {
            return AssertingIterator.of(p.queryUnmapped());
        }

        @Override
        public SAMRecordIterator queryAlignmentStart(final String sequence, final int start) {
            return AssertingIterator.of(p.queryAlignmentStart(sequence, start));
        }

    }

    static class AssertingIterator implements SAMRecordIterator {

        static AssertingIterator of(final CloseableIterator<SAMRecord> iterator) {
            return new AssertingIterator(iterator);
        }

        private final CloseableIterator<SAMRecord> wrappedIterator;
        private SAMSortOrderChecker checker;

        public AssertingIterator(final CloseableIterator<SAMRecord> iterator) {
            wrappedIterator = iterator;
        }

        @Override
        public SAMRecordIterator assertSorted(final SAMFileHeader.SortOrder sortOrder) {
            checker = new SAMSortOrderChecker(sortOrder);
            return this;
        }

        @Override
        public SAMRecord next() {
            final SAMRecord result = wrappedIterator.next();
            if (checker != null) {
                final SAMRecord previous = checker.getPreviousRecord();
                if (!checker.isSorted(result)) {
                    throw new IllegalStateException(String.format(
                            "Record %s should come after %s when sorting with %s ordering.",
                            previous.getSAMString().trim(),
                            result.getSAMString().trim(), checker.getSortOrder()));
                }
            }
            return result;
        }

        @Override
        public void close() { wrappedIterator.close(); }

        @Override
        public boolean hasNext() { return wrappedIterator.hasNext(); }

        @Override
        public void remove() { wrappedIterator.remove(); }
    }

    /**
     * Internal interface for SAM/BAM/CRAM file reader implementations,
     * as distinct from non-file-based readers.
     *
     * Implemented as an abstract class to enforce better access control.
     *
     * TODO -- Many of these methods only apply for a subset of implementations,
     * TODO -- and either no-op or throw an exception for the others.
     * TODO -- We should consider refactoring things to avoid this;
     * TODO -- perhaps we can get away with not having this class at all.
     */
    abstract class ReaderImplementation implements PrimitiveSamReader {
        abstract void enableFileSource(final SamReader reader, final boolean enabled);

        abstract void enableIndexCaching(final boolean enabled);

        abstract void enableIndexMemoryMapping(final boolean enabled);

        abstract void enableCrcChecking(final boolean enabled);

        abstract void setSAMRecordFactory(final SAMRecordFactory factory);

        abstract void setValidationStringency(final ValidationStringency validationStringency);
    }
}
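The heart of `queryMate` is the filtering loop that runs once the candidate iterator is open. The candidates are all records starting at the mate's recorded position, or all unmapped records when the mate has no reference. The loop keeps the single paired record from the opposite end that has the same read name.

The dependency-free sketch below isolates that loop. `Read` and `MateFinder` are hypothetical, and `IllegalStateException` stands in for htsjdk's `SAMFormatException`. With a real reader you would call `queryMate(record)` directly. As its Javadoc warns, if the first `SamReader` is in the middle of an iteration, call it on a second `SamReader`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal read carrying only the fields that queryMate inspects. */
final class Read {
    final String name;
    final boolean paired, firstOfPair, secondOfPair;

    Read(String name, boolean paired, boolean firstOfPair, boolean secondOfPair) {
        this.name = name;
        this.paired = paired;
        this.firstOfPair = firstOfPair;
        this.secondOfPair = secondOfPair;
    }
}

final class MateFinder {
    private MateFinder() {}

    /**
     * Same checks as the loop in PrimitiveSamReaderToSamReaderAdapter.queryMate, applied to
     * candidates that the real code obtains from queryAlignmentStart(...) or queryUnmapped().
     *
     * @return the mate, or null if no candidate qualifies
     */
    static Read findMate(Read rec, List<Read> candidates) {
        if (!rec.paired) {
            throw new IllegalArgumentException("queryMate called for unpaired read.");
        }
        if (rec.firstOfPair == rec.secondOfPair) {
            throw new IllegalArgumentException("Read must be either first or second of pair, but not both.");
        }
        Read mate = null;
        for (Read next : candidates) {
            if (!next.paired) {
                // An unpaired read sharing the name means the file is inconsistent.
                if (rec.name.equals(next.name)) {
                    throw new IllegalStateException("Paired and unpaired reads with same name: " + rec.name);
                }
                continue;
            }
            // Skip candidates from the same end as rec (this also skips rec itself).
            if (rec.firstOfPair ? next.firstOfPair : next.secondOfPair) {
                continue;
            }
            if (rec.name.equals(next.name)) {
                if (mate != null) {
                    throw new IllegalStateException("Multiple mates found for read " + rec.name);
                }
                mate = next;
            }
        }
        return mate;
    }

    public static void main(String[] args) {
        Read r1 = new Read("q1", true, true, false);
        Read mate = new Read("q1", true, false, true);
        List<Read> atMatePosition = Arrays.asList(
                new Read("q7", true, false, true), // a different pair: name does not match
                r1,                                // same end as r1: skipped
                mate);
        System.out.println(findMate(r1, atMatePosition) == mate); // true
    }
}
```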

Notes:

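In complex software systems, especially ones that process complex bioinformatics data such as SAM/BAM, a multi-layered interface hierarchy brings several important benefits: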

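  • Flexibility: with several nested interfaces and classes, the designers can extend and modify the internal implementation without changing the public API. For example, suppose PrimitiveSamReader's behavior ever needs a significant change. Then only that interface, its implementations, and the adapter class need to change, while code written against SamReader is unaffected.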

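  • Separation of concerns: the interface design breaks different functions into modules. For example, index functionality (Indexing) is kept apart from basic SAM reading (PrimitiveSamReader) and exposed as a separate facet through indexing(), so each part focuses on its own responsibility. As the adapter's Javadoc notes, however, in practice both are implemented by the same class.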

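  • Adapting different implementations: the adapter pattern (PrimitiveSamReaderToSamReaderAdapter) turns an implementation of one interface into an implementation of another. Concretely, methods such as queryOverlapping, queryContained, and queryMate are written once in the adapter, so each format reader only has to implement the primitive methods. This makes the code easier to reuse and extend.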

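  • Interface stability: a layered design helps keep existing interfaces stable and backward compatible as the library grows. User code can then rely on a stable API without worrying about changes in the underlying implementation.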
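One concrete mechanism behind the interface-stability point is visible in the source above: `isQueryable()` is declared as a Java 8 `default` method that falls back to `hasIndex()`. A default method lets an interface gain a method without breaking existing implementations, and implementations that know better can still override it. The sketch below shows both cases.

`RecordSource`, `IndexedFileSource`, and `RemoteQueryService` are hypothetical names. The remote service only illustrates a source that is queryable without a local index. The `BAM_HTSGET_TYPE` constant, which has no index extension, hints at this kind of source.

```java
/** Hypothetical reader interface that declares isQueryable() the same way SamReader does. */
interface RecordSource {
    boolean hasIndex();

    /** A default method: implementations that never mention it keep compiling and inherit this behavior. */
    default boolean isQueryable() {
        return hasIndex();
    }
}

/** An implementation that does not override isQueryable(); it inherits the default. */
final class IndexedFileSource implements RecordSource {
    private final boolean indexPresent;

    IndexedFileSource(boolean indexPresent) {
        this.indexPresent = indexPresent;
    }

    @Override
    public boolean hasIndex() {
        return indexPresent;
    }
}

/** Hypothetical source that answers interval queries without a local index file. */
final class RemoteQueryService implements RecordSource {
    @Override
    public boolean hasIndex() {
        return false;
    }

    @Override
    public boolean isQueryable() {
        return true; // overrides the default: queryable "regardless of whether it has an index"
    }
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new IndexedFileSource(true).isQueryable());  // true
        System.out.println(new IndexedFileSource(false).isQueryable()); // false
        System.out.println(new RemoteQueryService().isQueryable());     // true
    }
}
```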

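Overall, this design is well structured and follows several important object-oriented design principles, such as the single-responsibility, open/closed, and interface-segregation principles. This helps keep the code flexible, maintainable, and extensible, which matters particularly in a library as complex as htsjdk. The design is not perfect, though. The TODO comment on ReaderImplementation notes that many of its methods apply only to some implementations and either do nothing or throw an exception for the others.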

BAMFileReader source code

/*
 * The MIT License
 *
 * Copyright (c) 2009 The Broad Institute
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */
package htsjdk.samtools;


import htsjdk.samtools.seekablestream.SeekableStream;
import htsjdk.samtools.util.*;
import htsjdk.samtools.util.zip.InflaterFactory;

import java.io.DataInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import j