DeMAG

We present DeMAG (Deciphering Mutations in Actionable Genes), a supervised and specialized VEP (Variant Effect Predictor) that interprets any missense mutation in the 59 actionable genes (ACMG SF v2.0 genes[1]).

This repository will contain the main scripts (all in R language) and data resources:

  • scripts: this directory will contain the R scripts to reproduce the analysis (e.g., feature selection, model training and testing),
  • data: this directory will contain the training set, the validation sets used to benchmark DeMAG, and the new features of the model, ready to use.

The DeMAG web server allows scientists, health professionals, or anyone curious to:

  • access and download predictions for all ~1.3 million missense mutations for the ACMG SF v2.0 genes (ACMG59 genes),
  • investigate the features of the model for any mutation to understand DeMAG's pathogenicity score,
  • download the high-quality training set to fully reproduce our results,
  • download the validation sets we used to benchmark DeMAG against popular VEPs (REVEL, EVE, ...).

DeMAG is a collaboration between the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) and Harvard Medical School (HMS). You can read our paper on bioRxiv:

DeMAG predicts the effect of variants in clinically actionable genes by integrating structural and evolutionary epistatic features

Federica Luppino1,2, Ivan A. Adzhubei4,5, Christopher A. Cassa4, Agnes Toth-Petroczy*1,2,3

1. Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany.
2. Center for Systems Biology, Dresden 01307, Germany.
3. Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany.
4. Brigham and Womenʼs Hospital Division of Genetics, Harvard Medical School, Boston, MA 02115, USA.
5. Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.

References

[1] Kalia, S. S. et al. (2017). Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255

Training data

The training data will be made available upon publication and upon request, because it contains data from the HGMD Professional version. For all other sources, please refer to the methods section of the manuscript and to this summary table:

| Training data | Source |
|---|---|
| ClinVar | https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/archive_2.0/2021/clinvar_20210501.vcf.gz |
| HumVar, HumOrtho | PolyPhen-2 training set |
| HGMD professional version 2020.03 | http://www.hgmd.cf.ac.uk/ac/index.php |
| gnomAD | https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz |
| primateAI | https://basespace.illumina.com/s/yYGFdGih1rXL |
| KRGDB | http://152.99.75.168:9090/KRGDB/ |
| 3.5KJPNv2 | https://humandbs.biosciencedbc.jp/files/hum0015/tommo-3.5kjpnv2-20181105-af_snvall-autosome.zip |
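
As an illustration of how a source such as the ClinVar VCF could be narrowed down to missense variants in the genes of interest, here is a minimal base R sketch. It is not the repository's filtering script: the records are made up, the two-gene list is a hypothetical stand-in for the 59 ACMG SF v2.0 genes, and the helper `info_value` is introduced only for this example. It assumes the INFO column carries the `GENEINFO`, `CLNSIG` and `MC` keys, as ClinVar VCF files usually do.

```r
# Illustrative VCF lines (made-up records, not real ClinVar entries).
vcf_lines <- c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "17\t100\t1\tG\tA\t.\t.\tCLNSIG=Pathogenic;GENEINFO=BRCA1:672;MC=SO:0001583|missense_variant",
  "17\t200\t2\tC\tT\t.\t.\tCLNSIG=Benign;GENEINFO=BRCA1:672;MC=SO:0001819|synonymous_variant",
  "1\t300\t3\tA\tG\t.\t.\tCLNSIG=Likely_benign;GENEINFO=GENEX:1;MC=SO:0001583|missense_variant"
)

# Hypothetical subset of the gene list (the full ACMG SF v2.0 list has 59 genes).
genes_of_interest <- c("BRCA1", "BRCA2")

# Extract one KEY=value entry from each INFO string (NA when the key is absent).
info_value <- function(info, key) {
  pattern <- paste0("(^|;)", key, "=([^;]*)")
  matches <- regmatches(info, regexec(pattern, info))
  vapply(matches,
         function(x) if (length(x) == 3) x[3] else NA_character_,
         character(1))
}

# Drop header lines and read the tab-separated body; keep everything as text.
body <- vcf_lines[!startsWith(vcf_lines, "#")]
records <- read.table(
  text = body, sep = "\t", quote = "", comment.char = "",
  colClasses = "character",
  col.names = c("chrom", "pos", "id", "ref", "alt", "qual", "filter", "info")
)

# GENEINFO looks like "SYMBOL:ID" (possibly several, "|"-separated);
# this sketch keeps only the first symbol.
records$gene   <- sub(":.*$", "", info_value(records$info, "GENEINFO"))
records$clnsig <- info_value(records$info, "CLNSIG")
records$mc     <- info_value(records$info, "MC")

# Keep missense variants that fall in the genes of interest.
is_missense <- grepl("missense_variant", records$mc, fixed = TRUE)
selected <- records[is_missense & records$gene %in% genes_of_interest,
                    c("chrom", "pos", "ref", "alt", "gene", "clnsig")]
print(selected)
```

For a real file, `vcf_lines` could be replaced by `readLines(gzfile("clinvar_20210501.vcf.gz"))` after downloading the archive listed in the table; the labelling and quality filters actually applied to each source are described in the methods section of the manuscript.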

Testing data

The testing data is available in the directory testing_sets, and the variants analysed are described in the methods section of the manuscript. The directory contains the clinical and functional validation sets. We did not upload the testing set of putatively benign variants from the Estonian Biobank (population validation set), as this data is not yet public.
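
One common way to compare predictors on a labelled validation set is the area under the ROC curve (AUC). The base R sketch below computes it from ranks; it is only an illustration under stated assumptions, not the benchmarking code of the manuscript. The labels, the scores and the two predictor names are hypothetical, and the metrics actually reported are those given in the methods section.

```r
# Rank-based ROC AUC (equivalent to the Mann-Whitney U statistic).
# labels: 1 = pathogenic, 0 = benign; scores: higher = more likely pathogenic.
roc_auc <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  ranks <- rank(scores)  # tied scores receive average ranks
  (sum(ranks[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy validation set: hypothetical labels and scores from two hypothetical predictors.
labels      <- c(1, 1, 1, 0, 0, 0)
predictor_a <- c(0.9, 0.8, 0.4, 0.5, 0.2, 0.1)
predictor_b <- c(0.9, 0.3, 0.2, 0.8, 0.6, 0.1)

auc_a <- roc_auc(predictor_a, labels)
auc_b <- roc_auc(predictor_b, labels)
cat(sprintf("predictor_a AUC: %.3f\npredictor_b AUC: %.3f\n", auc_a, auc_b))
```

An AUC of 1 means every pathogenic variant scores above every benign one, while 0.5 corresponds to random ranking, so the same function can be applied to each predictor's scores on a shared set of variants.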


This work is licensed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0.