To show that DeMAG generalizes, we ran the model on an additional set of 334 genes, each with at least 5 benign and 5 pathogenic high-quality variants (at least 2 review stars) in ClinVar (version 20220812).
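The gene selection criterion can be expressed as a simple filter over ClinVar records. The sketch below is illustrative and rests on assumptions not stated in this README: the `Variant` record and `select_genes` function are hypothetical, the review-star count is assumed to be already derived from ClinVar's review status, and only variants labelled exactly "Benign" or "Pathogenic" are counted (whether "Likely benign" or "Likely pathogenic" variants were included is not specified here).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical, simplified ClinVar record."""
    gene: str
    significance: str  # assumed normalized label, e.g. "Benign" or "Pathogenic"
    stars: int         # ClinVar review stars (0-4), assumed precomputed


def select_genes(variants: Iterable[Variant],
                 min_per_class: int = 5,
                 min_stars: int = 2) -> list[str]:
    """Return genes with at least `min_per_class` benign AND pathogenic
    variants whose review status has at least `min_stars` stars."""
    counts = defaultdict(lambda: {"benign": 0, "pathogenic": 0})
    for v in variants:
        if v.stars < min_stars:
            continue  # drop low-confidence submissions
        label = v.significance.lower()
        if label in ("benign", "pathogenic"):
            counts[v.gene][label] += 1
    return sorted(
        gene for gene, c in counts.items()
        if c["benign"] >= min_per_class and c["pathogenic"] >= min_per_class
    )
```

A gene that has enough pathogenic variants but whose benign variants are all one-star submissions is excluded, which is the behaviour the "at least 2 review stars" requirement implies.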
The features were obtained as follows:
- precomputed EVmutation scores downloaded from the [webserver](<https://marks.hms.harvard.edu/evmutation/>)
- [IUPred2A](<https://iupred2a.elte.hu/>) disorder scores, which we computed with the command-line application. Note that for UniProt ID Q8NFD5 we explicitly specified Q8NFD5-1, because IUPred2A treats Q8NFD5-5 as the canonical isoform (as correctly indicated in UniProt).
- pLDDT scores extracted from the AlphaFold PDB files. For proteins longer than 1800 residues, we assembled multiple AlphaFold models into an ensemble structure; note, however, that these assembled structures are highly unreliable. No AlphaFold model is available for UniProt ID Q9NZV5.
- our Partners score, which we computed from 3D contacts between C-alpha atoms only. Evolutionarily coupled residues were left out, because including them would have required running an alignment pipeline that is not optimized for hundreds of genes.
- [PolyPhen-2](<http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads>) was used to annotate the remaining 9 features (Nsubs, Score2, Nres, Score1, phylop, dScore, NormASA, BaRE, DistQmin). See [Appendix A of the PolyPhen-2 documentation](<http://genetics.bwh.harvard.edu/pph2/dokuwiki/appendix_a>) for a description of these features.
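Per-residue IUPred2A disorder scores can be read back from the command-line output with a small parser. This is a sketch assuming the plain-text output of the standalone script in `long` or `short` mode: comment lines start with `#`, and every other line holds the position, the residue and the IUPred2 score (an optional fourth ANCHOR2 column is ignored). The `parse_iupred2a` name is hypothetical, and the prediction mode actually used for DeMAG is not stated here.

```python
def parse_iupred2a(text: str) -> dict[int, float]:
    """Map 1-based residue position to IUPred2 disorder score."""
    scores: dict[int, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        position, _residue, score = fields[0], fields[1], fields[2]
        scores[int(position)] = float(score)
    return scores
```

Keying the result by position makes it straightforward to look up the disorder score at each variant's residue, provided the sequence given to IUPred2A is the isoform the variants are mapped to (hence the Q8NFD5-1 note above).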
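AlphaFold PDB files store the per-residue pLDDT in the B-factor column, so it can be read from the fixed-width `ATOM` records. The sketch below takes the value from each residue's C-alpha atom (all atoms of a residue carry the same pLDDT). The function name is hypothetical, and it does not reproduce the ensemble assembly used for proteins longer than 1800 residues.

```python
from typing import Iterable


def plddt_from_pdb_lines(lines: Iterable[str]) -> dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor of CA atoms."""
    scores: dict[int, float] = {}
    for line in lines:
        # PDB fixed-width columns: atom name 13-16, resSeq 23-26, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores
```

Usage would be along the lines of `with open(path) as fh: plddt = plddt_from_pdb_lines(fh)`, where `path` points to a downloaded AlphaFold model.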
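The Partners score starts from C-alpha contacts in the 3D structure. The sketch below only identifies those contacts; how the contacts are turned into the final score is not described here and is not reproduced. The 8 Å cutoff and the optional minimum sequence separation are hypothetical parameters chosen for illustration, not values taken from DeMAG.

```python
import math
from itertools import combinations


def ca_contacts(ca_coords: dict[int, tuple[float, float, float]],
                cutoff: float = 8.0,       # hypothetical distance threshold (Å)
                min_seq_sep: int = 1) -> dict[int, set[int]]:
    """Return, for each residue, the residues whose CA atoms lie within
    `cutoff` of its own CA. Raise `min_seq_sep` to ignore sequence neighbours."""
    contacts: dict[int, set[int]] = {res: set() for res in ca_coords}
    for (i, xi), (j, xj) in combinations(ca_coords.items(), 2):
        if abs(i - j) < min_seq_sep:
            continue  # too close in sequence to count as a contact
        if math.dist(xi, xj) <= cutoff:
            contacts[i].add(j)
            contacts[j].add(i)
    return contacts
```

The all-pairs loop is quadratic in protein length, which is acceptable for a sketch; a spatial index such as a k-d tree would be the usual choice for very long proteins.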
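Selected columns can then be pulled out of a PolyPhen-2 report. This sketch assumes a tab-separated report whose header line starts with `#` and whose fields may be padded with spaces. It also assumes that the feature names listed above appear verbatim as column names; adjust the names if your PolyPhen-2 version labels them differently. The function name is hypothetical, and values are kept as strings because missing entries may not be numeric.

```python
def read_pph2_features(text: str, columns: list[str]) -> list[dict[str, str]]:
    """Return one dict per variant row, restricted to `columns`."""
    header: list[str] | None = None
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split("\t")]  # drop padding
        if header is None:
            if line.startswith("#"):
                header = [h.lstrip("#") for h in fields]  # "#o_acc" -> "o_acc"
            continue
        if line.startswith("#"):
            continue  # skip trailing comment lines
        record = dict(zip(header, fields))
        rows.append({c: record[c] for c in columns})  # KeyError if a column is absent
    return rows
```

Failing loudly on a missing column is intentional, so that a renamed feature does not silently become an empty value.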