PhD Thesis
14 Sketching super‑k‑mers
Note
This chapter is adapted from (Rouzé et al., 2023) and (Rouzé et al., 2025).
Rouzé, T., Martayan, I., Marchet, C., & Limasset, A. (2023). Fractional Hitting Sets for Efficient and Lightweight Genomic Data Sketching. 23rd international workshop on algorithms in bioinformatics (WABI 2023), vol. 273. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
Rouzé, T., Martayan, I., Marchet, C., & Limasset, A. (2025). Fractional hitting sets for efficient multiset sketching. Algorithms for Molecular Biology, 20, 1.