We provide here a non-exhaustive list of implementations of the algorithms and data structures described in the book. We mainly list prototypes that are useful for pedagogical purposes. Sometimes we also include high-performance, mature codes that are ready to be used in genome-scale applications.

If you or your students implement some algorithms from the book, please let us know via email and we will link your code from this page. We do our best to assert that it is shared in good trust, but we do not fully check its efficiency or correctness.



Part I – Preliminaries

3. Data structures

3.1 Dynamic range minimum queries

3.2 Bitvector rank and select operations

3.3 Wavelet tree

4. Graphs

4.1 Directed acyclic graphs (DAGs)

4.1.1 Topological ordering

4.2 Arbitrary directed graphs

4.2.1 Eulerian paths

4.2.2 Shortest paths and the Bellman-Ford method

5. Network flows

5.2 Minimum-cost flows and circulations

5.2.2 A pseudo-polynomial algorithm

5.3 Bipartite matching problems

5.3.1 Perfect matching

5.4 Covering problems

5.4.2 Minimum path cover in a DAG




Part II – Fundamentals of Biological Sequence Analysis

6. Alignments

6.1 Edit distance

6.1.1 Edit distance computation

* 6.1.3 Myers bitparallel algorithm

6.2 Longest common subsequence

6.2.1 Sparse dynamic programming

6.3 Approximate string matching

6.4 Biological sequence alignment

6.4.1 Global alignment

6.4.2 Local alignment

6.4.4 Affine gap scores

6.5 Gene alignment

6.6 Multiple alignment

6.6.2 Dynamic programming

6.6.4 Progressive multiple alignment

6.6.5 DAG alignment

6.6.6 Jumping alignment

7. Hidden Markov models (HMMs)

7.2 The Viterbi algorithm

7.3 The forward and backward algorithms

7.4 Estimating HMM parameters




Part III – Genome-Scale Index Structures

8. Classical indexes

8.1 k-mer index

8.2 Suffix array

8.2.1 Suffix and string sorting

8.3 Suffix tree

8.3.2 Construction of the suffix tree

8.4 Applications of the suffix tree

8.4.1 Maximal repeats

8.4.2 Maximal unique matches

8.4.3 Document counting

9. Burrows-Wheeler indexes

9.2 BWT index

9.2.3 Succinct suffix array

9.3 Space-efficient construction of the BWT

9.4 Bidirectional BWT index

9.5 BWT index for labeled trees

9.6 BWT index for labeled DAGs

9.7 BWT indexes for de Bruijn graphs

9.7.1 Frequency-oblivious representation




Part IV – Genome-Scale Algorithms

10. Read alignment

10.1 Pattern partitioning

10.2 Dynamic programming along suffix tree paths

10.3 Backtracking on BWT indexes

10.3.1 Prefix pruning

10.3.2 Case analysis pruning with the bidirectional BWT index

10.4 Suffix filtering for approximate overlaps

10.6 Split alignment of reads

10.7 Alignment of reads to a pan-genome

10.7.1 Indexing a set of individual genomes

10.7.2 Indexing a reference genome and a set of variations

11. Genome analysis and comparison

11.1 Space-efficient genome analysis

11.1.1 Maximal repeats

11.1.2 Maximal unique matches

11.1.3 Maximal exact matches

11.2 Comparing genomes without alignment

11.2.1 Substring and k-mer kernels

* 11.2.2 Substring kernels with Markovian correction

11.2.3 Substring kernels and matching statistics

11.2.4 Mismatch kernels

11.2.5 Compression distance

12. Genome compression

12.2 Bit-optimal Lempel-Ziv compression

13. Fragment assembly

13.2 Contig assembly

13.2.1 Read error correction

13.2.3 Irreducible overlap graphs

13.3 Scaffolding

13.4 Gap filling




Part V – Applications

14. Genomics

14.1 Variation calling

14.1.1 Calling small variants

14.1.2 Calling large variants

14.2 Variation calling over pan-genomes

14.2.2 Alignments on the labeled DAG of a population

14.2.3 Evaluation of variation calling results

14.3 Haplotype assembly and phasing

15. Transcriptomics

15.1 Estimating the expression of annotated transcripts

15.2 Transcript assembly

15.2.1 Short reads

15.2.2 Long reads

15.2.3 Paired-end reads

15.3 Simultaneous assembly and expression estimation

15.4 Transcript alignment with co-linear chaining

16. Metagenomics

16.1 Species estimation

16.1.1 Single-read methods

16.1.2 Multi-read and coverage-sensitive methods

16.1.2 Multi-read and coverage-sensitive methods

16.2 Read clustering

16.3 Comparing metagenomic samples

16.3.1 Sequence-based methods

16.3.2 Read-based methods

16.3.3 Multi-sample methods