We maintain here the list of all typos and mistakes found after the book was published.

If you find a new mistake, please let us know via email and we will publish it here.

## 3. Data Structures

__Page 23, Example 3.1__: Table second, value 11 should be 2 and value 13 should be 4. (observed at a study group of String Processing Algorithms course, Autumn 2018)__Page 23, Example 3.1__: Table third, row 100 should contain 111 rather than 100. (reported by Anna Kuosmanen)- See also illustration of the above corrections by Baraa Orabi and Faraz Hach.
__Page 23, line -5, definition of select__: it should say maximal i such that rank(B,i)=j AND B[i]=1.__Page 24, down__poistions__Page 26, lines 10-15__: Indexes*i*and_{v}*i*become wrongly updated, as they shift to the left (outside the query interval), although they should shift to the right so that they remain inside the query interval. A fix is to use_{w}*rank*and_{*}(B_{v}, i_{v}-1)+1*rank*to update the coordinates when going down._{*}(B_{w}, i_{w}-1)+1- To help infer the idea of the correct algorithm, here is a simulation on word pääsiäinen.

## 4. Graphs

### 4.2.1 Eulerian paths

__Page 34, Corollary 4.4__: the third bullet point |N-(v)| = |N-(v)| should be |N-(v)| = |N+(v)| (reported by Masahiro Kanai)

### Exercises

__Page 39, Exercise 4.16__:*c : V(G) → Q*⟹*c : E(G) → Q*__Page 39, Exercise 4.16__: that passes through at most*t*sets of the partition*S*⟹ that changes the partite sets of*S*at most*t*times__Page 39, Exercise 4.17__:*c : V(G) → Q*⟹*c : E(G) → Q*

## 5. Network flows

### 5.4 Covering problems

__Page 61, Figure 5.9(b)__: the edge from*f*to_{out}*g*should have cost €3, and the one from_{in}*g*to_{out}*h*should have cost €4 (see Figure 5.9(a))_{in}

## 6. Alignments

### 6.1.2 Shortest detour

__Page 76, line 24__: in*O(((n-m)+2x)-m) = O((t/delta)m)*,*O(((n-m)+2x)-m)*should be*O(((n-m)+2x)m)*(reported by Masahiro Kanai).__Page 95, line -7,__:*s_{n+1}*should be*b_{n+1}*(reported by Massimo Equi).

## 7. Hidden Markov models (HMMs)

### Insight 7.1

- Number 23 should be 22 through the example (number of non-coding emissions).

### Exercises

__Page 123, Exercise 7.4__:*S = s*⟹_{n}...s_{n}*S = s*(reported by Rob Patro)_{1}...s_{n}

## 8. Classical Indexes

### 8.2 Suffix array

__Page 134, line -4__: "Now let [...]*W*∈ [1.. \sigma]* be a sequence of strings" ⟹ "Now let [...]^{i}*W*∈ [1.. \sigma]^{i}^{+}be a sequence of nonempty strings" (reported by Roman Cheplyaka).__Page 135, line 5__: "for a given integer*p*consider the set of distinct prefixes*S*= {^{p}*W*|^{k}_{1..p}*k*∈ [1..*n*],*p*∈ [1..*m*] }" ⟹ "for a given integer*p*∈ [1..*m*], consider the set of distinct prefixes*S*= {^{p}*W*|^{k}_{1..p}*k*∈ [1..*n*] }" (reported by Roman Cheplyaka)__Page 135, Lemma 8.7__: "A sequence of strings [...] |*W*∈ [1.. \sigma]* " ⟹ "A sequence of nonempty strings [...] |^{i}*W*∈ [1.. \sigma]^{i}^{+}" (reported by Roman Cheplyaka)__Page 138, last line__:*c*=*T*[\tilde{SA}[*k*]] ⟹*c*=*T*[\tilde{SA}[*k*]-1] (reported by Roman Cheplyaka)__Page 137, line 12__:*S*[*T*]=0 ⟹*S*[*T*]=1 (reported by Roman Cheplyaka).

### 8.3 Suffix tree

__Page 141, Figure 8.4__: in the suffix-link tree, the path from the root labelled by`GAGG`

should be relabelled`GAGA`

. The suffix-link tree should also contain an additional path from the root labelled by`GCGAGA#`

.__Page 144, line -6__: a tree on*n*leaves ⟹ a tree of size*O(n)*on*n*leaves__Page 144, proof of Lemma 8.17, serious flaw__: The statement (with above fix) is correct, but the proof has a flaw: The proof omits the case when "(" is deleted, and pointers to it become invalid. This can be solved adding a union-find data structure, but the running time is no longer linear. However, there exist also correct linear time solutions: see Gabow & Tarjan, STOC 1983, or a recent simplified and practical solution Alzamel, Charalampopoulos, Iliopoulos, and Pissis, IWOCA 2017. Proof of Theorem 8.18 uses Lemma 8.17, but there is also a direct proof using properties of the suffix tree. Also Theorem 8.21 (document counting) uses Lemma 8.17. Another way to convince on the correctness of the claim of Lemma 8.17 is to read The LCA Problem Revisited by Bender and Farach-Colton for the online version of the problem. (reported by the authors)

### 8.4 Applications of the suffix tree

__Page 147, line 10__: if*V*is the leaf ⟹ if*v*is the leaf (reported by Monika Balvočiūtė)

### 8.5 Literature

__Page 152, line -6__: ... whose presentation we partly followed to introduce the repeat finding concepts. ⟹__Add the following sentence__: Example applications of such concepts to genome analysis are*long terminal repeat retrotransposons*,*k-mismatch repeats*, and*transposable element modules*, which can be detected using maximal repeats as seeds (Ellinghaus*et al.*2008, Rho*et al.*2007, Volfovsky*et al.*2001, Kurtz*et al.*2001, Tempel*et al.*2010), and*ultraconserved elements*, which correspond to maximal unique matches or to high-scoring alignments across multiple genomes (see e.g. Bejerano*et al.*2004, Isenbarger*et al.*2008).__Page 153, line 4__: Crochemore & Vérin (1997b) ⟹ Blumer*et al.*(1987)__Page 153, line 5__: Apostolico & Lonardi (2002) and Crochemore & Vérin (1997a) ⟹ Gusfield (1997), Crochemore & Vérin (1997a), and Crochemore & Vérin (1997b)

## 9. Burrows-Wheeler indexes

### 9.7.1 Frequency-oblivious representation

__Page 192, line 4__: end in a vertex prefixed by*c*_{1}*W*⟹ end in vertex*c*_{1}*W*

### 9.7.3 Space-efficient construction

__Page 193, line -18__: Both the frequency-aware and the frequency-oblivious representations of DBG_{T#,k}for some string*T*⟹ Both the frequency-aware and the frequency-oblivious representations of the de Bruijn graph of a string*T*

### 9.8 Literature

__Page 195, line -17__: is due to Kulekci*et al.*(2012). ⟹ is described in Kulekci*et al.*(2012) and in Okanohara*et al.*(2009).

### Exercises

__Exercise 9.7__:`extendRight()`

and`enumerateRight()`

are not used by Algorithm 9.3, so there is no need to consider`enumerateRightExtended()`

(reported by Jarno Alanko).

## 11. Genome analysis and comparison

### 11.1 Space-efficient genome analysis

__Page 224, Algorithm 11.2, lines 12,13,14,18__: replace*I*with*P*(reported by Jarno Alanko).__Page 228, Algorithm 11.3, line 5__: the line could be shortened to: "if*j*then" (reported by Monika Balvočiūtė)_{1}< i_{1}

### 11.2 Comparing genomes without alignment

__Page 237, line 4__: the value of the*k*-mer kernel is ⟹ the value of the kernel is

### 11.3 Literature

__Page 255, line -8__: The space-efficient computation ... (2014). ⟹__Add the following sentence__: The computation of the same kernels on generalized suffix trees is described in (Rieck*et al.*2006, Rieck & Laskov 2008).__Page 256, line 4__: ... and a geometric treatment... is described in Benham et al. (2013). ⟹__Add the following sentence__: Generalizations to measure the similarity of all strings in a set, and applications to regulatory DNA, are described in Ren et al. (2013).__Page 256, line 15__: The connection between ... was established in Apostolico & Bejerano (2000). ⟹__Add the following sentence__: Probabilistic suffix tries were then implemented using lazy suffix trees and enhanced suffix arrays (Lin*et al.*2012; Schulz*et al.*2008).

### Exercises

__Page 257, Exercise 11.7 (b)__:*σ*⟹^{k+1}*σ*(reported by Jarno Alanko).^{k+1}- σ^{k}+ 1

## 13. Fragment assembly

### 13.5 Literature

__Page 299, last line__: ... of the Chinese postman problem to assembly. ⟹__Add the following sentence__: Limiting assembly to*k*-mers with high frequency in the de Bruijn graph has been used to build libraries of genomic repeats (Koch*et al.*2014).

### Exercises

__Page 304, Line 4:__conting ⟹ contig

## 14. Genomics

### Exercises

__Page 323, Exercise 14.3__: This greedy algorithm is in essence the same as for set cover, Exercise 14.4, so the claimed bound does not hold. One could restate this exercise as "give an input where the greedy algorithm produces a suboptimal solution".

## 15. Transcriptomics

### 15.4 Transcript alignment with co-linear chaining

__Page 343, Problem 15.7, first bullet__:*1 ≤ j ≤ p*⟹*2 ≤ j ≤ p*

## References

__Page 371__: add the following: Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W. J., Mattick, J. S., & Haussler, D. (2004). 'Ultraconserved elements in the human genome',*Science*,**304**(5675), 1321-1325.__Page 372__: add the following: Blumer, A., Blumer, J., Haussler, D., McConnell, R., & Ehrenfeucht, A. (1987), 'Complete inverted files for efficient text retrieval and analysis',*Journal of the Association for Computing Machinery***34**(3), 578-595.__Page 374__: add the following: Ellinghaus, D., Kurtz, S., & Willhoeft, U. (2008), 'LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons',*BMC Bioinformatics***9**(18).__Page 377__: add the following: Isenbarger, T.A., Carr, C.E., Stewart Johnson, S., Finney, M., Church, G.M., Gilbert, W., Zuber, M.T., & Ruvkun, G. (2008), 'The most conserved genome segments for life detection on Earth and other planets',*Origins of Life and Evolution of Biospheres***38**(6), 517-533.__Page 377__: add the following: Koch, P., Platzer, M., & Downie, B. R. (2014), 'RepARK - De novo creation of repeat libraries from whole-genome NGS reads',*Nucleic Acids Research***42**(9), e80-e80.__Page 378__: add the following: Kurtz, S., Choudhuri, J. V., Ohlebusch, E., Schleiermacher, C., Stoye, J., & Giegerich, R. (2001), 'REPuter: the manifold applications of repeat analysis on a genomic scale',*Nucleic Acids Research***29**(22), 4633-4642.__Page 378__: add the following: Lin, J., Adjeroh, D. & Jiang, B.H. (2012), 'Probabilistic suffix array: efficient modeling and prediction of protein families',*Bioinformatics***28**(10), 1314-1323.__Page 380__: add the following: Okanohara D. & Tsujii, J. (2009), Text categorization with all substring features, in*Proceedings of the 2009 SIAM International Conference on Data Mining*. SIAM, 838-846.__Page 381__: add the following: Ren, J., Song, K., Sun, F., Deng, M., & Reinert, G. (2013), 'Multiple alignment-free sequence comparison',*Bioinformatics***29**(21), 2690-2698.__Page 381__: add the following: Rho, M., Choi, J.-H., Kim, S., Lynch, M., & Tang, H. (2007), 'De novo identification of LTR retrotransposons in eukaryotic genomes',*BMC Genomics***8**(90).__Page 381__: add the following: Rieck, K., Laskov, P. & Sonnenburg, S. (2006), Computation of similarity measures for sequential data using generalized suffix trees, in*Proceedings of the 2006 conference on Advances in Neural Information Processing Systems*, Vol. 19. Cambridge: MIT Press, 1177-1184.__Page 381__: add the following: Rieck, K. & Laskov, P. (2008), 'Linear-time computation of similarity measures for sequential data',*Journal of Machine Learning Research***9**(2008), 23-48.__Page 382__: add the following: Schulz, M.H., Weese, D., Rausch, T., Döring, A., Reinert, K. & Vingron, M. (2008), 'Fast and adaptive variable order Markov chain construction', in*Algorithms in Bioinformatics*, Vol. 5251 of Lecture Notes in Computer Science. Berlin: Springer, 306-317.__Page 383__: add the following: Tempel, S., Rousseau, C., Tahi, F., & Nicolas, J. (2010), 'ModuleOrganizer: detecting modules in families of transposable elements',*BMC Bioinformatics***11**(474).__Page 384__: add the following: Volfovsky, N., Haas, B.J. & Salzberg, S.L. (2001), 'A clustering method for repeat analysis in DNA sequences',*Genome Biology***2**(8).