Comparison of Whole-Genome Sequences of COVID-19 Strains in Wuhan-China, USA and Spain to Determine Source of Epidemic

Background: The World Health Organization declared the SARS-CoV-2 outbreak, which spread from Wuhan, China to other countries, a global public health emergency. Methodology: Genetic analyses of sixty eight complete genomes of SARS-CoV-2 (38 reference strains from China, 24 strains from USA, 7 strains from Spain and others from Japan and Korea) were performed. Using BioEdit software, multiple alignments of the SARS-CoV-2 sequences were performed separately for America (24 strains) and China (38 strains); a further multiple alignment was made with the 68 sequences of virus strains from America, China and Spain together. Mutations in the complete genome of the virus were detected. A phylogenetic tree was constructed from representative strains from China, USA, Spain and Japan, with two MERS strains, one from KSA and one from Korea, for comparison. Results: Point mutations were found in 31.6% of the Chinese strains, 73.9% of the USA strains and 71.4% of the Spanish strains. Most of the mutations occurred in ORF1ab, ORF7a, the S gene and ORF6a in the China, USA and Spain strains. Conclusion: There is a high genetic identity between the selected strains of the virus in Wuhan, China and those that spread in America and Spain, which indicates that the epidemic originated in China. These observations provide evidence of the genetic diversity and rapid evolution of this novel coronavirus.


Introduction
In December 2019, a new coronavirus strain was discovered in Wuhan, China; the disease it causes is officially named COVID-19. Within two months of the discovery of the first patient, the outbreak had spread across China and to all regions of the world [1]. More than two million people have been infected to date, with the highest numbers of infections in the USA, Europe and Asia, especially China [2] [3].
Because the city of Wuhan, the epidemic center, was not locked down, many people (students, workers, officials and visitors) from countries in America, Europe and Asia traveled for business or for New Year holidays without caution, and many events accelerated the outbreak of the SARS-CoV-2 epidemic. Contributing factors included the failure to activate quarantine provisions and the fact that the biology of the virus was unknown, which led to high levels of contamination with the virus [4] [5].
Regularly updated information on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) outbreak is available on ECDC's website [6], the European Commission website [7] and the World Health Organization's website [3].
Nucleotide substitution has been proposed to be one of the most important mechanisms of viral evolution in nature [8]. The rapid spread of SARS-CoV-2 raises intriguing questions, for example, whether its evolution is driven by mutations. Since December 2019, SARS-CoV-2 has infected more than 2 million people, caused deaths among them (with a high case-fatality rate) and spread to most countries of the world according to WHO reports 90-91. Therefore, it is critical to track and characterize the SARS-CoV-2 genomic variants in the different geographic locations of the epidemic. This study aims to determine the genetic variation in coding and non-coding regions of the viral genome and to compare the Chinese, USA and Spanish strains in order to explain the sources of the epidemic.

Data Analysis Methods
In this study, the sequencing data of 68 SARS-CoV-2 strains were retrieved from GenBank. The genome sequences (about 30,000 bases each) of the 38 SARS-CoV-2 strains from China were read separately from those of the 24 SARS-CoV-2 strains from USA and the 7 SARS-CoV-2 strains from Spain. Multiple alignment was performed for each group of SARS-CoV-2 strains with BioEdit software.
The multiple nucleotide sequence alignment was performed with BioEdit software. The sequence of the strain China/Wuhan. Mu-1/2020/ was used as the leading sequence for aligning the China strains, while MT258379.1 USA/CZB-RR057-007/2020 was used for aligning the USA strains.
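The comparison of aligned strains against a leading sequence can be illustrated with a minimal Python sketch. It is not the BioEdit procedure itself: the function names `find_variants` and `fraction_mutated` and the toy sequences are hypothetical, and the sketch assumes the sequences have already been aligned to equal length, with `-` marking gaps.

```python
from typing import Dict, List, Tuple


def find_variants(reference: str, aligned: str) -> List[Tuple[int, str, str]]:
    """Return (1-based alignment column, reference base, strain base) per mismatch."""
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    pairs = zip(reference.upper(), aligned.upper())
    for column, (ref_base, strain_base) in enumerate(pairs, start=1):
        # Ambiguous bases (N) carry no information, so they are skipped.
        if "N" in (ref_base, strain_base):
            continue
        if ref_base != strain_base:
            variants.append((column, ref_base, strain_base))
    return variants


def fraction_mutated(reference: str, strains: Dict[str, str]) -> float:
    """Fraction of strains showing at least one difference from the reference."""
    mutated = sum(1 for seq in strains.values() if find_variants(reference, seq))
    return mutated / len(strains)


# Toy aligned sequences (hypothetical, not real SARS-CoV-2 data).
leading = "ATGGCTAGCT"
toy_strains = {
    "strain_a": "ATGGCTAGCT",  # identical to the leading sequence
    "strain_b": "ATGACTAGCT",  # G>A substitution at column 4
    "strain_c": "ATGGCT-GCT",  # deletion at column 7
    "strain_d": "ATGGCTAGCT",  # identical to the leading sequence
}

for name, seq in toy_strains.items():
    print(name, find_variants(leading, seq))
print("fraction of strains with a change:", fraction_mutated(leading, toy_strains))
```

Applied to each country's alignment in turn, a tally of this kind would give the share of strains that carry at least one change relative to the leading sequence.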

Mutation Detection
We aligned the clean data to the SARS-CoV-2 complete genome (MN908947.3, used as the reference and leading sequence). Based on the triplet codon reading frame of ORF1ab, S, ORF6a and other genes, mutations were classified as missense mutations, silent mutations or insertions/deletions.
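The codon-based classification can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function name `classify_mutation` and the toy sequences are hypothetical, the input is assumed to be an in-frame pair of aligned coding sequences written with DNA letters, and the standard genetic code is used. The extra "nonsense" label for a premature stop codon is an addition beyond the three categories named in the text.

```python
from itertools import product

# Standard genetic code, built with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mutation(ref_cds: str, alt_cds: str, pos: int) -> str:
    """Classify the change at 0-based position `pos` of two aligned, in-frame CDSs."""
    ref_base, alt_base = ref_cds[pos], alt_cds[pos]
    if ref_base == alt_base:
        return "none"
    # Locate the codon (triplet) that contains the changed position.
    start = pos - pos % 3
    ref_codon = ref_cds[start:start + 3]
    alt_codon = alt_cds[start:start + 3]
    # A gap anywhere in the codon is treated as an insertion/deletion.
    if "-" in ref_codon or "-" in alt_codon:
        return "insertion/deletion"
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"  # assumption: category not listed in the text
    return "missense"


# Toy coding sequence (hypothetical): ATG GCT AGC encodes Met-Ala-Ser.
reference = "ATGGCTAGC"
print(classify_mutation(reference, "ATGGCCAGC", 5))  # GCT>GCC, Ala stays Ala
print(classify_mutation(reference, "ATGACTAGC", 3))  # GCT>ACT, Ala becomes Thr
print(classify_mutation(reference, "ATGGCT-GC", 6))  # gap inside the third codon
```

The sketch looks at one position at a time; two changes within the same codon would need to be evaluated together, which is left out here for brevity.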

Phylogenetic Analysis
Phylogenetic tree construction by the UPGMA method was performed using MEGA X software [9].
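For readers unfamiliar with UPGMA, the clustering idea can be sketched in a few lines of Python. This is a minimal illustration and not MEGA X's implementation: the function name `upgma`, the strain labels and the pairwise distances are all hypothetical, and branch lengths are omitted so that only the tree topology is returned.

```python
from typing import List


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Build a Newick-style topology (no branch lengths) with UPGMA."""
    # Each cluster id maps to (label, number of leaves in the cluster).
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(d, key=d.get)
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        merged = {}
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            merged[k] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        # Drop distances involving the old clusters, then add the new ones.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for k, value in merged.items():
            d[(k, next_id)] = value  # k is always smaller than next_id
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical pairwise distances: three closely related strains and an outgroup.
labels = ["China", "USA", "Spain", "MERS"]
distances = [
    [0.0, 0.002, 0.004, 0.5],
    [0.002, 0.0, 0.004, 0.5],
    [0.004, 0.004, 0.0, 0.5],
    [0.5, 0.5, 0.5, 0.0],
]
print(upgma(labels, distances))
```

With these toy distances the closely related strains cluster first and the distant outgroup joins last, which is the pattern an outgroup such as MERS is meant to produce.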

Epidemic Outbreak
The recent global epidemic map of SARS-CoV-2 shows that the virus has infected more than 5,000,000 people and caused more than 300,000 deaths. The epidemic has spread to 19 countries in the Western Pacific Region; 60 countries in the European Region; South-East Asia spread of the virus in Wuhan for many days, and this silence was associated with the Lunar New Year and the continued flow of workers, students and citizens from China to various countries of the world, especially America, Italy and Spain. A low level of vigilance in the initial stages of the pandemic was common throughout the world. The response from some cities in the USA, Spain and elsewhere was grim, while others appeared to be holding steady in these early days of the outbreak, without taking precautionary measures. For example, by March 11, 2020, ridership on New York City's subway system, the nation's largest, was down 18.65 percent compared with the same day of the previous year, whereas China announced a large outbreak of the epidemic in January and February and declared Quarantine in

Molecular Characteristics of Coronaviruses
SARS-CoV-2 is an enveloped virus with a non-segmented, positive-sense, single-stranded RNA genome of about 30,000 bases, which is considered among the largest known genomes of RNA viruses [10]. The genome is composed of ORF1ab and the structural genes S (Spike), E (Envelope), M (Membrane) and N (Nucleocapsid). Accessory genes are interspersed among the structural genes at the 3' end of the genome [11]. Some point mutations have been shown to play potentially important roles in viral pathogenesis [12]. The S protein is responsible for receptor binding and subsequent viral entry into host cells, the M and E proteins play important roles in viral assembly, and the N protein is necessary for RNA synthesis [13].

Distribution of Variants in SARS-CoV-2 Genome
After aligning the sequencing data to the SARS-CoV-2 genome, we detected the

Mutation Types Along the SARS-CoV-2 Genome
The mutation types most often observed in this genetic analysis were transition mutations (pyrimidine to pyrimidine, C > T), transversion mutations (Purine

Phylogeny Tree
The UPGMA phylogenetic tree (Figure 6) strains of SARS-CoV-2 in the evolutionary tree, which indicates, without doubt, that all strains of SARS-CoV-2 belong to one evolutionary origin [20] and points to the source of the virus and its spread through genetic tracking of the virus strains; this result is consistent with Rambaut [21]. The phylogenetic tree (Figure 6) of SARS-CoV-2 confirms that the source of the epidemic is the Chinese city of Wuhan, and that the differences found in the multiple alignment of sequences from the sites of spread in China, America, Japan and Spain may reflect the readiness with which mutations occur among the strains of interest.

Conclusion
The results show that there is a high genetic identity between the selected strains of the virus in Wuhan, China and those that spread in America and Spain, which indicates that the epidemic originated in China. These observations provide evidence of the genetic diversity and rapid evolution of this novel coronavirus.