We maintain here the list of all typos and mistakes found after the book was published.

If you find a new mistake, please let us know via email and we will publish it here.

## 3. Data Structures

• Page 23, Example 3.1: Table second, value 11 should be 2 and value 13 should be 4. (observed at a study group of String Processing Algorithms course, Autumn 2018)
• Page 23, Example 3.1: Table third, row 100 should contain 111 rather than 100. (reported by Anna Kuosmanen)
• Page 23, line -5, definition of select: it should say maximal i such that rank(B,i)=j AND B[i]=1.
• Page 24, down poistions
• Page 26, lines 10-15: Indexes iv and iw become wrongly updated, as they shift to the left (outside the query interval), although they should shift to the right so that they remain inside the query interval. A fix is to use rank*(Bv, iv-1)+1 and rank*(Bw, iw-1)+1 to update the coordinates when going down.
• To help infer the idea of the correct algorithm, here is a simulation on word pääsiäinen.

## 4. Graphs

### 4.2.1 Eulerian paths

• Page 34, Corollary 4.4: the third bullet point |N-(v)| = |N-(v)| should be |N-(v)| = |N+(v)| (reported by Masahiro Kanai)

### Exercises

• Page 39, Exercise 4.16: c : V(G) → Qc : E(G) → Q
• Page 39, Exercise 4.16: that passes through at most t sets of the partition S ⟹ that changes the partite sets of S at most t times
• Page 39, Exercise 4.17: c : V(G) → Qc : E(G) → Q

## 5. Network flows

### 5.4 Covering problems

• Page 61, Figure 5.9(b): the edge from fout to gin should have cost €3, and the one from gout to hin should have cost €4 (see Figure 5.9(a))

## 6. Alignments

### 6.1.2 Shortest detour

• Page 76, line 24: in O(((n-m)+2x)-m) = O((t/delta)m), O(((n-m)+2x)-m) should be O(((n-m)+2x)m) (reported by Masahiro Kanai).

## 7. Hidden Markov models (HMMs)

### Insight 7.1

• Number 23 should be 22 through the example (number of non-coding emissions).

### Exercises

• Page 123, Exercise 7.4: S = sn...snS = s1...sn (reported by Rob Patro)

## 8. Classical Indexes

### 8.3 Suffix tree

• Page 141, Figure 8.4: in the suffix-link tree, the path from the root labelled by `GAGG` should be relabelled `GAGA`. The suffix-link tree should also contain an additional path from the root labelled by `GCGAGA#`.
• Page 144, line -6: a tree on n leaves ⟹ a tree of size O(n) on n leaves
• Page 144, proof of Lemma 8.17, serious flaw: The statement (with above fix) is correct, but the proof has a flaw: The proof omits the case when "(" is deleted, and pointers to it become invalid. This can be solved adding a union-find data structure, but the running time is no longer linear. However, there exist also correct linear time solutions: see Gabow & Tarjan, STOC 1983, or a recent simplified and practical solution Alzamel, Charalampopoulos, Iliopoulos, and Pissis, IWOCA 2017. Proof of Theorem 8.18 uses Lemma 8.17, but there is also a direct proof using properties of the suffix tree. Also Theorem 8.21 (document counting) uses Lemma 8.17. Another way to convince on the correctness of the claim of Lemma 8.17 is to read The LCA Problem Revisited by Bender and Farach-Colton for the online version of the problem. (reported by the authors)

### 8.4 Applications of the suffix tree

• Page 147, line 10: if V is the leaf ⟹ if v is the leaf (reported by Monika Balvočiūtė)

### 8.5 Literature

• Page 152, line -6: ... whose presentation we partly followed to introduce the repeat finding concepts. ⟹ Add the following sentence: Example applications of such concepts to genome analysis are long terminal repeat retrotransposons, k-mismatch repeats, and transposable element modules, which can be detected using maximal repeats as seeds (Ellinghaus et al. 2008, Rho et al. 2007, Volfovsky et al. 2001, Kurtz et al. 2001, Tempel et al. 2010), and ultraconserved elements, which correspond to maximal unique matches or to high-scoring alignments across multiple genomes (see e.g. Bejerano et al. 2004, Isenbarger et al. 2008).
• Page 153, line 4: Crochemore & Vérin (1997b) ⟹ Blumer et al. (1987)
• Page 153, line 5: Apostolico & Lonardi (2002) and Crochemore & Vérin (1997a) ⟹ Gusfield (1997), Crochemore & Vérin (1997a), and Crochemore & Vérin (1997b)

## 9. Burrows-Wheeler indexes

### 9.7.1 Frequency-oblivious representation

• Page 192, line 4: end in a vertex prefixed by c1W ⟹ end in vertex c1W

### 9.7.3 Space-efficient construction

• Page 193, line -18: Both the frequency-aware and the frequency-oblivious representations of DBGT#,k for some string T ⟹ Both the frequency-aware and the frequency-oblivious representations of the de Bruijn graph of a string T

### 9.8 Literature

• Page 195, line -17: is due to Kulekci et al. (2012). ⟹ is described in Kulekci et al. (2012) and in Okanohara et al. (2009).

### Exercises

• Exercise 9.7: `extendRight()` and `enumerateRight()` are not used by Algorithm 9.3, so there is no need to consider `enumerateRightExtended()` (reported by Jarno Alanko).

## 11. Genome analysis and comparison

### 11.1 Space-efficient genome analysis

• Page 224, Algorithm 11.2, lines 12,13,14,18: replace I with P (reported by Jarno Alanko).
• Page 228, Algorithm 11.3, line 5: the line could be shortened to: "if j1 < i1 then" (reported by Monika Balvočiūtė)

### 11.2 Comparing genomes without alignment

• Page 237, line 4: the value of the k-mer kernel is ⟹ the value of the kernel is

### 11.3 Literature

• Page 255, line -8: The space-efficient computation ... (2014). ⟹ Add the following sentence: The computation of the same kernels on generalized suffix trees is described in (Rieck et al. 2006, Rieck & Laskov 2008).
• Page 256, line 4: ... and a geometric treatment... is described in Benham et al. (2013). ⟹ Add the following sentence: Generalizations to measure the similarity of all strings in a set, and applications to regulatory DNA, are described in Ren et al. (2013).
• Page 256, line 15: The connection between ... was established in Apostolico & Bejerano (2000). ⟹ Add the following sentence: Probabilistic suffix tries were then implemented using lazy suffix trees and enhanced suffix arrays (Lin et al. 2012; Schulz et al. 2008).

### Exercises

• Page 257, Exercise 11.7 (b): σk+1σk+1 - σk + 1 (reported by Jarno Alanko).

## 13. Fragment assembly

### 13.5 Literature

• Page 299, last line: ... of the Chinese postman problem to assembly. ⟹ Add the following sentence: Limiting assembly to k-mers with high frequency in the de Bruijn graph has been used to build libraries of genomic repeats (Koch et al. 2014).

### Exercises

• Page 304, Line 4: conting ⟹ contig

## 14. Genomics

### Exercises

• Page 323, Exercise 14.3: This greedy algorithm is in essence the same as for set cover, Exercise 14.4, so the claimed bound does not hold. One could restate this exercise as "give an input where the greedy algorithm produces a suboptimal solution".

## 15. Transcriptomics

### 15.4 Transcript alignment with co-linear chaining

• Page 343, Problem 15.7, first bullet: 1 ≤ j ≤ p2 ≤ j ≤ p

## References

• Page 371: add the following: Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W. J., Mattick, J. S., & Haussler, D. (2004). 'Ultraconserved elements in the human genome', Science, 304(5675), 1321-1325.
• Page 372: add the following: Blumer, A., Blumer, J., Haussler, D., McConnell, R., & Ehrenfeucht, A. (1987), 'Complete inverted files for efficient text retrieval and analysis', Journal of the Association for Computing Machinery 34(3), 578-595.
• Page 374: add the following: Ellinghaus, D., Kurtz, S., & Willhoeft, U. (2008), 'LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons', BMC Bioinformatics 9(18).
• Page 377: add the following: Isenbarger, T.A., Carr, C.E., Stewart Johnson, S., Finney, M., Church, G.M., Gilbert, W., Zuber, M.T., & Ruvkun, G. (2008), 'The most conserved genome segments for life detection on Earth and other planets', Origins of Life and Evolution of Biospheres 38(6), 517-533.
• Page 377: add the following: Koch, P., Platzer, M., & Downie, B. R. (2014), 'RepARK - De novo creation of repeat libraries from whole-genome NGS reads', Nucleic Acids Research 42(9), e80-e80.
• Page 378: add the following: Kurtz, S., Choudhuri, J. V., Ohlebusch, E., Schleiermacher, C., Stoye, J., & Giegerich, R. (2001), 'REPuter: the manifold applications of repeat analysis on a genomic scale', Nucleic Acids Research 29(22), 4633-4642.
• Page 378: add the following: Lin, J., Adjeroh, D. & Jiang, B.H. (2012), 'Probabilistic suffix array: efficient modeling and prediction of protein families', Bioinformatics 28(10), 1314-1323.
• Page 380: add the following: Okanohara D. & Tsujii, J. (2009), Text categorization with all substring features, in Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, 838-846.
• Page 381: add the following: Ren, J., Song, K., Sun, F., Deng, M., & Reinert, G. (2013), 'Multiple alignment-free sequence comparison', Bioinformatics 29(21), 2690-2698.
• Page 381: add the following: Rho, M., Choi, J.-H., Kim, S., Lynch, M., & Tang, H. (2007), 'De novo identification of LTR retrotransposons in eukaryotic genomes', BMC Genomics 8(90).
• Page 381: add the following: Rieck, K., Laskov, P. & Sonnenburg, S. (2006), Computation of similarity measures for sequential data using generalized suffix trees, in Proceedings of the 2006 conference on Advances in Neural Information Processing Systems, Vol. 19. Cambridge: MIT Press, 1177-1184.
• Page 381: add the following: Rieck, K. & Laskov, P. (2008), 'Linear-time computation of similarity measures for sequential data', Journal of Machine Learning Research 9(2008), 23-48.
• Page 382: add the following: Schulz, M.H., Weese, D., Rausch, T., Döring, A., Reinert, K. & Vingron, M. (2008), 'Fast and adaptive variable order Markov chain construction', in Algorithms in Bioinformatics, Vol. 5251 of Lecture Notes in Computer Science. Berlin: Springer, 306-317.
• Page 383: add the following: Tempel, S., Rousseau, C., Tahi, F., & Nicolas, J. (2010), 'ModuleOrganizer: detecting modules in families of transposable elements', BMC Bioinformatics 11(474).
• Page 384: add the following: Volfovsky, N., Haas, B.J. & Salzberg, S.L. (2001), 'A clustering method for repeat analysis in DNA sequences', Genome Biology 2(8).