Rxivist logo

Differentiating risk-conferring from benign missense variants, and therefore optimal calculation of gene-variant burden, represent a major challenge in particular for rare and genetic heterogeneous disorders. While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes are paralogs and belong to gene families. It has not been thoroughly investigated how gene family information can be utilized for disease gene discovery and variant interpretation. We developed a paralog conservation score to empirically evaluate whether paralog conserved or non-conserved sites of in-human paralogs are important for protein function. Using this score, we demonstrate that disease-associated missense variants are significantly enriched at paralog conserved sites across all disease groups and disease inheritance models tested. Next, we assessed whether gene family information could assist in discovering novel disease-associated genes. We subsequently developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families including 98 de novo variant carrying genes in more than 10k neurodevelopmental disorder patients. 33 gene family enriched genes represent novel candidate genes which are brain expressed and variant constrained in neurodevelopmental disorders.

Download data

  • Downloaded 1,635 times
  • Download rankings, all-time:
    • Site-wide: 9,447
    • In genetics: 466
  • Year to date:
    • Site-wide: 59,694
  • Since beginning of last month:
    • Site-wide: 49,736

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)