Selected Topics in Computational Complexity I
Michal Koucký
<koucky@iuuk.mff.cuni.cz>
ZS 2024/2025
NTIN085 - 2/1 Z, Zk
Time: Thursday 14:00-15:30
Place: 3rd floor corridor, #321 Malá Strana.
This course covers advanced topics in computational complexity. This semester it will focus on edit distance.
Edit distance is a measure of string similarity. It has various applications in bioinformatics, text processing, etc.
In this course we will focus on various aspects of edit distance: from algorithms and lower bounds to sketching and error-correction. This will showcase various algorithmic and complexity issues.
The course is primarily intended for senior students and phd students. The course assumes basic knowledge of computational complexity, probability, and discrete mathematics. The course can be taken repeatedly.
Tentative plan of lectures
- Dynamic programming, Landau-Vishkin algorithm. (Notes) [1-3]
- Fine-grained lower bounds (Backurs-Indyk, Lifshits). (Notes) [4]
- Approximation algorithms. (Notes) [5-6]
- Sublinear algorithms. (Notes) [7-10]
- Sketching edit distance, embedding of edit distance into Hamming distance and vice-versa. (Notes) [8-9]
- Error-correcting codes for edit distance. (Notes) [11-12]
- Compressed strings, LZ, measures of string complexity. (Notes on compression measures [13-16], notes on recompression and edit distance computation [17])
- Planar reachability algorithms (Klein, Das-Kipouridis-Gutenberg-Wulff-Nilsen).
- Seaweed technique, Monge matrices (Tiskin, Russo, Gawrychowski-Gorbachev-Kociumaka).
- Pattern-matching (Porat-Porat).
Literature:
-
Ukkonen: Algorithms for approximate string matching. Information Control 1985.
-
Landau and Vishkin: Fast Parallel and Serial Approximate String Matching. J. Algorithms 1989.
-
Landau, Myers, and Schmidt. Incremental string comparison. SIAM J. Computing 1998.
-
Backurs and Indyk: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). STOC 2015.
-
Chakraborty, Das, Goldenberg, Koucký, and Saks: Approximating edit distance within constant factor in truly sub-quadratic time. FOCS 2018.
-
Andoni and Nosatzki: Edit distance in near-linear time: it's a constant factor. FOCS 2020.
-
Barna Saha: The dyck language edit distance problem in near-linear time. FOCS 2014.
-
Chakraborty, Goldenberg, and Koucký: Streaming algorithms for embedding and computing edit distance in the low distance regime. STOC 2016.
-
Kociumaka, and Saha: Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance. FOCS 2020.
-
Koucký and Saks: Simple, deterministic, fast (but weak) approximations to edit distance and Dyck edit distance. SODA 2023.
-
Haeupler and Shahrasbi: Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound. STOC 2017.
-
Haeupler and Shahrasbi: Synchronization Strings and Codes for Insertions and Deletions - a Survey. IEEE Transactions on Information Theory, 2021.
-
Kociumaka, Navarro, and Prezza: Toward a Definitive Compressibility Measure for Repetitive Sequences. IEEE Trans. Inf. Theory, 2023.
-
Rytter: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science, 2003.
-
Raskhodnikova, Ron, Rubinfeld, Smith: Sublinear Algorithms for Approximating String Compressibility. Algorithmica, 2013.
-
Jez: A really simple approximation of smallest grammar. Theor. Comput. Sci., 2016.
-
Tomohiro I: Longest Common Extensions with Recompression, CPM 2017.
Homeworks
Read the NP-hardness result in "Yury Lifshits: Processing Compressed Texts: A Tractability Border. CPM 2007" and prepare notes for it.
Alternatively, read the paper "Benny Porat and Ely Porat: Exact and Approximate Pattern Matching in the Streaming Model. FOCS 2009" and prepare notes for that one.