Release the beast

Merged: schereme requested to merge release_the_beast into master
3 files changed: +28 -40
File shown below: +13 -22
cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
-title: SHARK
-abstract: Similarity/Homology Assessment by Relating K-mers
+title: SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences
+abstract: Intrinsically disordered regions (IDRs) are structurally flexible protein segments with regulatory functions in multiple contexts, such as in the assembly of biomolecular condensates. Since IDRs undergo more rapid evolution than ordered regions, identifying homology of such poorly conserved regions remains challenging for state-of-the-art alignment-based methods that rely on position-specific conservation of residues. Thus, systematic functional annotation and evolutionary analysis of IDRs have been limited, despite them comprising ~21% of proteins. To accurately assess homology between unalignable sequences, we developed an alignment-free sequence comparison algorithm, SHARK (Similarity/Homology Assessment by Relating K-mers). We trained SHARK-dive, a machine learning homology classifier, which achieved superior performance to standard alignment-based approaches in assessing evolutionary homology in unalignable sequences. Furthermore, it correctly identified dissimilar but functionally analogous IDRs in IDR-replacement experiments reported in the literature, whereas alignment-based tools were incapable of detecting such functional relationships. SHARK-dive not only predicts functionally similar IDRs at a proteome-wide scale but also identifies cryptic sequence properties and motifs that drive remote homology and analogy, thereby providing interpretable and experimentally verifiable hypotheses of the sequence determinants that underlie such relationships. SHARK-dive acts as an alternative to alignment to facilitate systematic analysis and functional annotation of the unalignable protein universe.
authors:
- family-names: Chow
given-names: Chi Fung Willis
orcid: "https://orcid.org/0000-0001-9889-9664"
- family-names: Lenz
given-names: Swantje
orcid: "https://orcid.org/0000-0002-8839-5371"
- family-names: Scheremetjew
given-names: Maxim
orcid: "https://orcid.org/0000-0002-7458-3072"
- family-names: Ghosh
given-names: Soumyadeep
orcid: "https://orcid.org/0000-0002-4691-3636"
- family-names: Richter
given-names: Doris
- family-names: Alberti
given-names: Simon
orcid: "https://orcid.org/0000-0003-4017-6505"
- family-names: Hadarovich
given-names: Anna
orcid: "https://orcid.org/0000-0002-5139-4308"
- family-names: Toth-Petroczy
given-names: Agnes
orcid: "https://orcid.org/0000-0002-0333-604X"
version: 1.2.1
-date-released: "2024-09-02"
+date-released: "2024-10-09"
identifiers:
- description: This is the archived snapshot of version 1.2.1 of SHARK-Dive
type: url
-value: "https://git.mpi-cbg.de/tothpetroczylab/shark/-/tags/1.2.1rc1"
-- description: This is the archived snapshot of version 2.0.0 of SHARK-capture
-type: doi
-value: 10.5281/zenodo.11085684
+value: "https://git.mpi-cbg.de/tothpetroczylab/shark/-/tags/1.2.1"
- type: doi
value: "10.1073/pnas.2401622121"
keywords:
- motif detection
- IDRs
- intrinsically disordered protein regions
- homology detection
- sequence-to-function
- machine learning
- alignment-free
- research
license: BSD-3-Clause
-license-url: "https://git.mpi-cbg.de/tothpetroczylab/shark/-/raw/69fc869fd2e96add4066748e8b8b9ee3bd9839b8/LICENSE"
+license-url: "https://git.mpi-cbg.de/tothpetroczylab/shark/-/blob/e7cfd9d0a9528ccb31390790c9e8b561ab96a049/LICENSE"
repository-code: "https://git.mpi-cbg.de/tothpetroczylab/shark"
\ No newline at end of file