Search in the genome collection
This page lets you run a very fast k-mer search against all the genomes in the OMD using MetaGraph. If you want to know more about the approach, here is how the authors describe it:
The graph-based full-text search of all assembled genomes and retrieval of annotated sequence elements was implemented using the MetaGraph framework [1]. Briefly, this framework combines a succinct de Bruijn graph [2] as a k-mer index with a compressed binary relation matrix [3,4] to jointly represent all sequences in a searchable, colored graph. This setup allows for very fast search of any given sequence of length >= 31 against the genome database. Overlaps with individual genes (and consequently BGCs) are resolved via annotated genome coordinates [5].
1. Karasikov M, Mustafa H, Danciu D, Zimmermann M, Barber C, Rätsch G, et al. Metagraph: Indexing and analysing nucleotide archives at petabase-scale. bioRxiv. 2020. doi:10.1101/2020.10.01.322164
2. Bowe A, Onodera T, Sadakane K, Shibuya T. Succinct de Bruijn graphs. Algorithms in Bioinformatics. Springer; 2012. pp. 225–235.
3. Karasikov M, Mustafa H, Joudaki A, Javadzadeh-No S, Rätsch G, Kahles A. Sparse Binary Relation Representations for Genome Graph Annotation. J Comput Biol. 2020;27: 626–639.
4. Danciu D, Karasikov M, Mustafa H, Kahles A, Rätsch G. Topology-based Sparsification of Graph Annotations. Cold Spring Harbor Laboratory. 2021. p. 2020.11.17.386649. doi:10.1101/2020.11.17.386649
5. Karasikov M, Mustafa H, Rätsch G, Kahles A. Lossless Indexing with Counting de Bruijn Graphs. bioRxiv. 2021. doi:10.1101/2021.11.09.467907
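To give a feel for what a k-mer search against a colored index means, here is a minimal Python sketch. It is not MetaGraph's implementation: a plain dictionary stands in for the succinct de Bruijn graph and the compressed annotation matrix, reverse complements are ignored, and the function names (`kmers`, `build_index`, `search`) are made up for illustration. The value k = 31 is an assumption inferred from the minimum query length mentioned above.

```python
from collections import defaultdict

K = 31  # assumed k-mer length, inferred from the minimum query length of 31


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq, in uppercase."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genomes, k=K):
    """Map each k-mer to the set of genome labels ("colors") that contain it."""
    index = defaultdict(set)
    for label, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def search(index, query, k=K):
    """Return, for each genome label, the fraction of query k-mers found in it."""
    if len(query) < k:
        raise ValueError(f"query must be at least {k} characters long")
    query_kmers = list(kmers(query, k))
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids adding empty entries to the index for unknown k-mers
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label: n / len(query_kmers) for label, n in hits.items()}
```

For example, with a toy k of 3, `search(build_index({"A": "ACGTAC", "B": "GTACGG"}, k=3), "CGTAC", k=3)` reports that all query k-mers occur in genome A and two out of three occur in genome B. Queries shorter than k contain no complete k-mer, which is why a minimum query length applies.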
Currently only DNA search is supported, but we're looking into expanding that and adding a BLAST option, so stay tuned!