Counting Unique k-mers -- My First Go Program

Previously, I wrote a Perl program to count unique k-mers. Perl is very convenient for this, because it supports hash-of-hashes (which can count distinct k-mers dynamically) and sorting a hash by value (which makes it easy to list the unique k-mers by number of occurrences in descending order).
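For comparison with the Perl approach, a Go map[string]int plays the role of the Perl hash, and the "sort by value" step can be done by collecting the keys into a slice and ordering it with sort.Slice. The following is a minimal sketch, assuming a single sequence already held in memory; kmerCounts and byCountDesc are hypothetical helper names, not functions from the program below.

```go
package main

import (
	"fmt"
	"sort"
)

// kmerCounts returns how often each k-mer of seq occurs.
// A missing map key reads as 0, so km[key]++ both inserts and increments,
// just like $count{$kmer}++ in Perl.
func kmerCounts(seq string, k int) map[string]int {
	km := make(map[string]int)
	for i := 0; i+k <= len(seq); i++ {
		km[seq[i:i+k]]++
	}
	return km
}

// byCountDesc returns the keys of km ordered by descending count,
// breaking ties alphabetically so the output order is deterministic.
func byCountDesc(km map[string]int) []string {
	keys := make([]string, 0, len(km))
	for k := range km {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(a, b int) bool {
		if km[keys[a]] != km[keys[b]] {
			return km[keys[a]] > km[keys[b]]
		}
		return keys[a] < keys[b]
	})
	return keys
}

func main() {
	km := kmerCounts("ACGTACGT", 4)
	for _, k := range byCountDesc(km) {
		fmt.Printf("%s\t%d\n", k, km[k])
	}
}
```

The alphabetical tie-break is an arbitrary choice made here; without some tie-break, k-mers with equal counts would come out in whatever order Go's randomized map iteration happened to produce.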

Unfortunately, compared to C++, a Perl program usually uses more memory and runs longer. This is not a big issue when the FASTA files are small (under 400 MB). However, for larger FASTA files my Perl program required more than 8 GB of memory, which exceeds what my computer has. Therefore, I decided to rewrite the program in Go, a systems language with built-in concurrency support and speed comparable to C/C++.

Go's syntax is very different from C/C++, and it took me a whole day to get familiar with it. Some operations that are easy in Perl can only be done in a more roundabout way in Go, such as calling a system command, reading a file line by line, writing a file, and sorting a map by value. Since this small tool involves neither networking nor concurrency, I cannot yet judge how Go compares for that kind of work.

Although my first Go program was not a happy experience, I still think it is worth continuing to learn Go, because Go programs really do run fast and save memory (compared to the Perl script). And you don't need to worry about the tricky parts of C++.

// kmerfreq - count unique k-mers from a fasta file.

package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// Readln returns a single line (without the ending \n)
// from the input buffered reader.
// An error is returned iff there is an error with the
// buffered reader.
func Readln(r *bufio.Reader) (string, error) {
	var (
		isPrefix bool  = true
		err      error = nil
		line, ln []byte
	)
	// ReadLine hands back long lines in pieces (isPrefix == true),
	// so keep appending until the whole line has been read.
	for isPrefix && err == nil {
		line, isPrefix, err = r.ReadLine()
		ln = append(ln, line...)
	}
	return string(ln), err
}

// countKmers adds every k-mer of seq to the counts in km.
// Sequences shorter than k contribute nothing.
func countKmers(km map[string]int, seq string, k int) {
	for i := 0; i+k <= len(seq); i++ {
		key := seq[i : i+k]
		if _, ok := km[key]; ok {
			km[key]++
		} else {
			// Copy new keys (strings.Clone, Go 1.20+) so that a short key
			// does not keep the whole sequence it was sliced from in memory.
			km[strings.Clone(key)] = 1
		}
	}
}

func main() {
	// Get options. Every flag must be defined before flag.Parse();
	// a flag defined afterwards is rejected as undefined on the command line.
	input := flag.String("i", "", "input fasta file")
	kmer := flag.Int("k", 16, "length of k-mer")
	output := flag.String("o", "", "output file (default: <input basename>.<k>-mer)")
	flag.Parse()

	if *input == "" || *kmer <= 0 {
		fmt.Fprintln(os.Stderr, "kmerfreq: -i is required and -k must be positive")
		flag.Usage()
		os.Exit(2)
	}
	// Derive the default output name from the input file name.
	// filepath.Base does the job of the external basename command.
	if *output == "" {
		*output = fmt.Sprintf("%s.%d-mer", filepath.Base(*input), *kmer)
	}

	// open input file
	fi, err := os.Open(*input)
	if err != nil {
		panic(err)
	}
	// close fi on exit and check for its returned error
	defer func() {
		if err := fi.Close(); err != nil {
			panic(err)
		}
	}()

	// open output file
	fo, err := os.Create(*output)
	if err != nil {
		panic(err)
	}
	// close fo on exit and check for its returned error
	defer func() {
		if err := fo.Close(); err != nil {
			panic(err)
		}
	}()

	// Read and count unique k-mers.
	// A strings.Builder avoids copying the whole sequence on every
	// "read += line" when a record spans many lines.
	var read strings.Builder
	km := make(map[string]int) // k-mer counts

	r := bufio.NewReader(fi)
	for {
		line, err := Readln(r)
		if err != nil && err != io.EOF {
			panic(err) // error happens
		}
		if strings.HasPrefix(line, ">") {
			// Header of the next read: count the k-mers of the previous one
			countKmers(km, read.String(), *kmer)
			read.Reset()
		} else {
			// Sequence line of the current read
			read.WriteString(line)
		}
		if err == io.EOF {
			break // done
		}
	} // end of for

	// Don't forget the last read
	countKmers(km, read.String(), *kmer)

	// Write results through a buffered writer; calling fo.WriteString
	// directly issues one system call per k-mer.
	w := bufio.NewWriter(fo)
	for k, v := range km {
		w.WriteString(k + "\t" + strconv.Itoa(v) + "\n")
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
}
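Since memory was the reason for the rewrite, one widely used alternative (not what the program above does) is to pack each k-mer into a uint64 with 2 bits per base. This only works for k ≤ 32 and only for the letters A, C, G and T. The sketch below skips any k-mer containing another character such as N and treats lower-case bases like upper-case ones; both are assumptions that differ from the string-keyed program. The helper names encodeBase, countPacked and decode are hypothetical.

```go
package main

import "fmt"

// encodeBase maps a nucleotide to 2 bits (A=0, C=1, G=2, T=3).
// It returns false for anything else, e.g. N.
func encodeBase(b byte) (uint64, bool) {
	switch b {
	case 'A', 'a':
		return 0, true
	case 'C', 'c':
		return 1, true
	case 'G', 'g':
		return 2, true
	case 'T', 't':
		return 3, true
	}
	return 0, false
}

// countPacked adds the k-mers of seq to km, keyed by their 2-bit packing.
// The code is updated one base at a time (a rolling window); valid counts
// the consecutive A/C/G/T bases seen so far, so k-mers spanning an invalid
// base are skipped. Requires 1 <= k <= 32.
func countPacked(km map[uint64]int, seq string, k int) {
	mask := uint64(1)<<(2*uint(k)) - 1 // for k == 32 the shift yields 0, so mask is all ones
	var code uint64
	valid := 0
	for i := 0; i < len(seq); i++ {
		v, ok := encodeBase(seq[i])
		if !ok {
			valid = 0
			continue
		}
		code = (code<<2 | v) & mask
		valid++
		if valid >= k {
			km[code]++
		}
	}
}

// decode turns a packed k-mer back into upper-case letters.
// The most recent base sits in the lowest 2 bits, so fill from the end.
func decode(code uint64, k int) string {
	buf := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		buf[i] = "ACGT"[code&3]
		code >>= 2
	}
	return string(buf)
}

func main() {
	km := make(map[uint64]int)
	countPacked(km, "ACGTACGT", 4)
	countPacked(km, "acgtNacgt", 4)
	for code, n := range km {
		fmt.Printf("%s\t%d\n", decode(code, 4), n)
	}
}
```

An 8-byte integer key replaces a 16-byte string header plus the k bytes of string data, and the rolling update avoids slicing a new substring at every position; how much this saves overall also depends on the map's own per-entry overhead, which this sketch does not measure.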
