In the daylight of biotechnology: Rcpp and R applications

Scientific tutorials and data application

Monday, June 13, 2016

Fast kmer counting table algorithm using perfect hash function: C++ pseudo-code integration into R using Rcpp API


Abstract



Counting kmers (substrings of length k in DNA sequence data) is an essential component of many bioinformatics methods, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We propose a simple algorithm that counts kmers using a perfect hash table, implemented in C++ and exported to R through the Rcpp API. The PDF version is available at: Fast kmer counting table algorithm using perfect hash function: C++ pseudo-code integration into R using Rcpp API
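To make the idea concrete, here is a minimal sketch of kmer counting with a perfect hash, written in plain C++. It assumes the hash is the usual 2-bit encoding (A=0, C=1, G=2, T=3), which maps every kmer to a unique slot in a table of size 4^k; the algorithm in the PDF may differ in its details. The names `base_code`, `kmer_index`, and `count_kmers`, the limit k <= 12, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Map a nucleotide to a 2-bit code; returns -1 for anything else (e.g. 'N').
inline int base_code(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Perfect hash of a kmer: the kmer read as a base-4 number in [0, 4^k).
// Distinct kmers of the same length never collide.
inline std::uint64_t kmer_index(const std::string& kmer) {
    std::uint64_t h = 0;
    for (char b : kmer) {
        const int c = base_code(b);
        if (c < 0) throw std::invalid_argument("non-ACGT base in kmer");
        h = (h << 2) | static_cast<std::uint64_t>(c);
    }
    return h;
}

// Count every kmer of seq in a dense table indexed by kmer_index().
// The hash is updated in O(1) per base instead of being recomputed per window.
std::vector<std::uint32_t> count_kmers(const std::string& seq, unsigned k) {
    // Assumed cap: 4^12 counters (64 MiB) keeps the dense table manageable.
    if (k < 1 || k > 12) throw std::invalid_argument("k must be in [1, 12]");
    const std::uint64_t mask = (std::uint64_t{1} << (2 * k)) - 1;
    std::vector<std::uint32_t> table(std::size_t{1} << (2 * k), 0);

    std::uint64_t h = 0;
    unsigned valid = 0;  // consecutive valid bases ending here, capped at k
    for (char b : seq) {
        const int c = base_code(b);
        if (c < 0) {      // ambiguous base: restart the window
            valid = 0;
            h = 0;
            continue;
        }
        h = ((h << 2) | static_cast<std::uint64_t>(c)) & mask;
        if (valid < k) ++valid;
        if (valid == k) ++table[h];
    }
    return table;
}
```

For example, `count_kmers("ACGTACGT", 2)` returns a table of 16 counters in which the slot `kmer_index("AC")` holds 2. To call such a function from R, one option is to wrap it in a function taking and returning Rcpp types, mark it with `// [[Rcpp::export]]`, and compile it with `Rcpp::sourceCpp()`; that wrapper is left out here so the sketch compiles without R.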







R & R at 8:40:00 AM