Molecular evolution of SARS-CoV-2 structural genes: evidence of positive selection in spike glycoprotein

By Xiao-Yong Zhan, Ying Zhang, Xuefu Zhou, Ke Huang, Yichao Qian, Yang Leng, Leping Yan, Bihui Huang, Yulong He

Posted 25 Jun 2020
bioRxiv DOI: 10.1101/2020.06.25.170688

SARS-CoV-2 caused a global pandemic in early 2020 and has so far resulted in more than 8,000,000 infections and 430,000 deaths worldwide. Four structural proteins, the envelope (E), membrane (M), nucleocapsid (N) and spike (S) glycoproteins, play key roles in controlling entry into human cells and virion assembly of SARS-CoV-2. However, how the genes encoding them evolve during human-to-human transmission is largely unknown. In this study, we screened and analyzed roughly 3090 SARS-CoV-2 isolates from the GenBank database. The distribution of alleles of the four genes was determined: 16 for E, 40 for M, 131 for N and 173 for S. Phylogenetic analysis shows that global SARS-CoV-2 isolates can be clustered into three to four major clades based on the protein sequences of these genes. No intragenic recombination event was detected among the different alleles. However, purifying selection has acted on the evolution of these genes. Analysis of the full genomic sequences of these alleles using codon-substitution models (M8, M3 and M2a) and likelihood ratio tests (LRTs) in the codeML package reveals that codon 614 of the S glycoprotein has been subjected to strong positive selection pressure, and a persistent D614G mutation is identified. The definitive positive selection of the D614G mutation is further confirmed by the internal fixed effects likelihood (IFEL) and Evolutionary Fingerprinting methods implemented in the HyPhy package. In addition, another potential positive selection site is identified at codon 5, in the signal sequence of the S protein. The allele containing the D614G mutation has undergone significant expansion during the SARS-CoV-2 global pandemic, implying better adaptability of isolates carrying the mutation. However, expansion of the L5F allele is relatively restricted. The D614G mutation is located in subdomain 2 (SD2) of the C-terminal portion (CTP) of the S1 subunit. Protein structural modeling shows that the D614G mutation may disrupt a salt bridge among S protein monomers and increase their flexibility, which in turn may promote receptor binding domain (RBD) opening, virus attachment and entry into host cells. Located in the signal sequence of the S protein, the L5F mutation may facilitate protein folding, assembly, and secretion of the virus. This is the first evidence of positive Darwinian selection in the spike gene of SARS-CoV-2; it contributes to a better understanding of the adaptive mechanism of this virus and helps provide insights for developing novel therapeutic approaches and effective vaccines that target the mutation sites.

Competing Interest Statement: The authors have declared no competing interest.
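
The likelihood ratio tests mentioned in the abstract compare a codon model that allows positive selection against a nested null model that does not. The following is a minimal sketch of that arithmetic only, not the authors' pipeline. The pairing of M8 with a null model at 2 degrees of freedom is the conventional choice and is an assumption here, since the abstract names only the alternative models. The log-likelihood values are hypothetical placeholders, not results from the paper, and the function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square distribution with even df.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p_value) for two nested models.

    statistic = 2 * (lnL_alt - lnL_null), compared against chi-square(df).
    A negative statistic (possible from optimizer noise) is clamped to 0.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods for a null model and M8 (NOT from the paper).
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4g}")
```

A small p-value would favor the model with a positively selected site class; identifying which codon is under selection (such as codon 614) is a separate step that this sketch does not cover.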

Download data

  • Downloaded 463 times
  • Download rankings, all-time:
    • Site-wide: 47,276 out of 118,150
    • In evolutionary biology: 2,741 out of 6,187
  • Year to date:
    • Site-wide: 15,356 out of 118,150
  • Since beginning of last month:
    • Site-wide: 35,576 out of 118,150
