The number of clean reads was 21,157,543 for the RNA sample and 26,789,502 for the DNA sample. For RNA, the data were assembled to a total sequence length of 2,337,534, with 60.92% GC content. The largest contig was 11,556 nt long; it was identified as APPV (Table 1) and named APPV-SDHY-2022 for further analysis in this study. For DNA, the data were assembled to a total sequence length of 38,447,346, with 41.71% GC content. Other viruses, including Getah virus, porcine picobirnavirus, porcine kobuvirus, porcine sapovirus, Po-Circo-like virus, porcine serum-associated circular virus, porcine bocavirus 1, porcine parvovirus 1, porcine parvovirus 5 and porcine circovirus 3, were also identified by sequence alignment (Table 1); however, most contigs of these viruses were shorter than 500 bp (see Additional file 2: Table s2 & Table s3). No other known abortion-related pathogens (PRRSV, PPV2-4/6–8, CSFV, PCV2 and Japanese encephalitis virus) were detected.
Table 1 Blast results of assembled sequences from viral metagenomic sequencing*
APPV confirmation by NS3 gene RT–PCR and sequencing
APPV presence was confirmed in the pooled sample by RT–PCR amplification targeting the NS3 gene (see Additional file 3: Fig. s1A). The assembled sequence of the PCR products was identical to that of APPV-SDHY-2022 (see Additional file 3: Fig. s1B). This provided additional evidence of APPV presence in the abortion cases.
Genome sequence and homology analysis of APPV
The genome of strain APPV-SDHY-2022 (GenBank accession no. OP381297) contains 11,556 nucleotides (nt) and consists of a 5’UTR (370 nt, positions 1 to 370), CDS (10,909 nt, 371 to 11,279), and 3’UTR (277 nt, 11,280 to 11,556). The nucleotide and amino acid sequences of the individual proteins of the strains were aligned separately, and the homology between APPV-SDHY-2022 and the reference strains was determined (Table 2). Sequence alignment based on APPV polyprotein CDS showed that the nucleotide identities of APPV-SDHY-2022 with Clade I, Clade II, and Clade III strains were 82.6-84.2%, 93.2-93.6%, and 80.7-85%, respectively, while the amino acid identities were 91.4-92.4%, 96.4-97.7%, and 90.6-92.2%, respectively. APPV-SDHY-2022 shared the highest nucleotide identity (93.6%) with APPV-China/GD-SHM/2016, and the highest amino acid identity (97.7%) with GD-YJHSEY2N. Among the 12 mature proteins, NS5A showed the lowest homology (77.6-93.3% at the nt level) with the reference strains.
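The reported genome layout and the notion of pairwise identity can be checked with a short Python sketch. The coordinates are those given in the text; the helper names (`segment_lengths`, `percent_identity`) and the convention that a gap opposite a residue counts as a mismatch are assumptions made for illustration, not the alignment software's definition of identity.

```python
# Segment coordinates (1-based, inclusive) as reported for APPV-SDHY-2022.
GENOME_LENGTH = 11_556
SEGMENTS = {
    "5'UTR": (1, 370),
    "CDS": (371, 11_279),
    "3'UTR": (11_280, 11_556),
}


def segment_lengths(segments):
    """Length of each segment, computed from 1-based inclusive coordinates."""
    return {name: end - start + 1 for name, (start, end) in segments.items()}


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of an alignment.

    Assumption: columns where both rows are gaps are ignored, and a gap
    opposite a residue counts as a mismatch. Other tools may differ.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


lengths = segment_lengths(SEGMENTS)
total = sum(lengths.values())
```

Here `lengths` reproduces the 370, 10,909 and 277 nt figures, and `total` equals the 11,556 nt genome length, so the three segments tile the genome without gaps or overlaps.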
Table 2 Homology analysis of APPV-SDHY-2022 with Chinese reference strains (%)
Phylogenetic analysis
Phylogenetic analysis was performed based on complete polyprotein CDS and NS5A nucleotide sequences. The results showed that APPV-SDHY-2022 belongs to a separate branch of Clade II (Fig. 2A). Moreover, the homology of NS5A nucleotide sequences was above 94.6% within the same subclade, 84.7-94.5% between different subclades of the same clade, and 76.8-81.1% between different clades (Table 3). Therefore, we propose that Clade II strains can be further divided into three subclades and that APPV-SDHY-2022 belongs to subclade 2.3. APPV-China/GD-SD/2016 and APPV-China/GZ01/2016 belong to subclade 2.2, and the other Chinese strains in the Clade II cluster belong to subclade 2.1 (Fig. 2B). Since Clade II strains have been found only in China, this typing method can help to better analyze the evolution of Clade II strains.
Fig. 2 Phylogenetic analysis of Chinese APPV strains. Phylogenetic trees based on the nucleotide sequences of the complete polyprotein CDS (A) and the NS5A gene (B) were constructed by the neighbor-joining (NJ) method with 1,000 bootstrap replicates in MEGA11 software. The APPV-SDHY-2022 strain reported in this study is indicated with a red dot
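The NS5A identity ranges reported for Table 3 can be read as a rough decision rule. The sketch below encodes them in Python, taking the finest grouping in the text to mean a subclade (an assumption). The function name is hypothetical, the cut-offs are simply the observed ranges rather than validated thresholds, and values falling between the observed ranges are deliberately left unclassified.

```python
def relationship_from_ns5a_identity(identity_pct):
    """Map an NS5A nucleotide identity (%) onto the ranges reported in the text.

    Assumption: the observed ranges are used as-is; identities that fall
    between them (for example 81.2-84.6%) are reported as unclassified.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 94.6:
        return "same subclade"
    if 84.7 <= identity_pct <= 94.5:
        return "same clade, different subclade"
    if 76.8 <= identity_pct <= 81.1:
        return "different clade"
    return "outside observed ranges"
```

For example, `relationship_from_ns5a_identity(90.0)` falls in the 84.7-94.5% range and is labeled as the same clade but a different subclade. Tree topology, not identity alone, remains the basis for the clade assignment described above.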
Table 3 Homology analysis of NS5A nucleotide sequence within clades or subclades (%)
Recombination analysis
To further explore the genetic evolution of APPV, potential recombination events were identified using Recombination Detection Program version 4 (RDP4) and then examined using SimPlot version 3.5.1. Among all available APPV strains, 8 strains (GD-DH01-2018, GD-BZ01-2018, JX-JM01-2018A01, GD2, GD-HJ-2017.04, GD-LN-2017.04, GD-CT4, and GD-MH01-2018) had potential genetic recombination events. Although RDP4 flagged a potential recombination event in APPV-SDHY-2022, with JX-JM01-2018A01 and GD-HJ-2017.04 as the putative parents (see Additional file 4: Table s4), no obvious genetic recombination in the APPV-SDHY-2022 strain was observed with SimPlot in this study (Fig. 3).
Fig. 3 Recombination analysis of the complete genome of the APPV-SDHY-2022 strain from Shandong Province. Potential recombination events were identified using Recombination Detection Program 4 (RDP4) and then examined using similarity plots and bootstrap analysis in SimPlot 3.5.1. The major and minor parents were JX-JM01-2018A01 and GD-HJ-2017.04, respectively
Amino acid sequence analysis
Amino acid sequences of individual viral proteins of all the Chinese APPV strains were analyzed. No amino acid insertions or deletions were found in the APPV-SDHY-2022 strain. The amino acid sequences of the individual proteins were compared to identify residues that differentiate Clade II from Clade I and Clade III, and 20 unique amino acids were found in Clade II strains (Fig. 4). Most of these sites were located in NS5A (7H, 16A, 69Q, 131Q, 152M, 189I, 280A, 397F, 437A) and NS5B (77V, 139P, 193P, 231K, 274A), and the remaining sites were in Npro (85D, 120E), C (90K), Erns (91K, 139Y) and NS3 (30T). Interestingly, the amino acids at these sites were identical between Clade I and Clade III strains, suggesting that Clade II strains can be distinguished from the other clades by examining these specific amino acids alone.
Fig. 4 The unique amino acids found in Clade II APPV strains. Amino acid sequences of viral proteins were aligned with reference strains using MEGA11 and BioEdit software
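One way to screen an alignment for clade-specific residues of this kind is sketched below in Python. It is a minimal illustration, not the procedure used in this study: the function name and the toy sequences are invented, and a site is called clade-specific only if every target-clade sequence carries the same residue and no sequence from any other clade carries it.

```python
def clade_specific_sites(alignment_by_clade, target):
    """Return (1-based position, residue) pairs unique to the target clade.

    Assumptions: all sequences come from one protein alignment and have
    equal length; gap-only target columns are not reported.
    """
    target_seqs = alignment_by_clade[target]
    other_seqs = [
        seq
        for clade, seqs in alignment_by_clade.items()
        if clade != target
        for seq in seqs
    ]
    length = len(target_seqs[0])
    if any(len(seq) != length for seq in target_seqs + other_seqs):
        raise ValueError("all aligned sequences must have equal length")

    sites = []
    for i in range(length):
        target_residues = {seq[i] for seq in target_seqs}
        other_residues = {seq[i] for seq in other_seqs}
        # Conserved within the target clade and absent everywhere else.
        if len(target_residues) == 1 and not (target_residues & other_residues):
            residue = next(iter(target_residues))
            if residue != "-":
                sites.append((i + 1, residue))
    return sites


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = {
    "Clade I": ["MKTAY", "MKTAY"],
    "Clade II": ["MHTAF", "MHTSF"],
    "Clade III": ["MKTAY"],
}
unique_sites = clade_specific_sites(toy_alignment, "Clade II")
```

In the toy alignment, positions 2 (H) and 5 (F) are reported; position 4 is skipped because the two Clade II sequences disagree there.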
Glycosylation analysis
Putative N-glycosylation sites in the three important glycoproteins of Chinese APPV strains, Erns, E1, and E2, were also predicted in this study. APPV-SDHY-2022, along with most of the strains in Clade II, is predicted to be heavily glycosylated, with a total of ten N-glycosylation sites (N104 in the E1 protein; N12, N26, N43, N64, and N99 in the Erns protein; N51, N64, N103, and N127 in the E2 protein) (Fig. 5). All the Chinese APPV strains had a conserved putative N-glycosylation site at N104 with a consensus N-I-T motif in the E1 protein. The putative N-glycosylation sites in the Erns and E2 proteins differed greatly among strains in different subclades, and 9 patterns of putative N-glycosylation sites were observed in E2 proteins: N51 + N64 + N103, N64 + N103, N51 + N64 + N103 + N141, N51 + N64 + N127 + N103 + N141, N51 + N64 + N103 + N127, N64 + N103 + N127, N51 + N127, N51 + N64, and N64 (Fig. 5). Among the N-glycosylation sites of E2 proteins, the putative site at N64 was highly conserved.
Fig. 5 Putative N-glycosylation sites of Erns, E1 and E2 proteins. The putative N-glycosylation sites within the Erns, E1 and E2 sequences of Chinese APPV strains were predicted according to a glycosylation analysis algorithm, and are shown as blue shaded boxes
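The simplest screen for putative N-glycosylation sites is a scan for the N-X-S/T sequon (X being any residue except proline), which the N-I-T motif at N104 of E1 satisfies. The Python sketch below shows such a scan. It is an assumption that this approximates the prediction step; the algorithm actually used here may apply additional scoring, and the example sequence is invented.

```python
import re

# N-X-S/T sequon with X != P. The lookahead lets overlapping sequons
# (for example "NNSS") be reported individually.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of N, motif) for every sequon in the sequence.

    Assumption: the input is an ungapped amino acid sequence of a mature
    protein, so positions are relative to that protein's first residue.
    """
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Made-up sequence for illustration: one valid sequon (NIT) and one
# proline-blocked motif (NPS) that must not be reported.
example_sites = find_sequons("AANITPNPSG")
```

A sequon is necessary but not sufficient for glycosylation, which is why the sites in the text are described as putative.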
Antigen prediction
To analyze the effect of glycosylation sites on the antigenicity of the E2 protein, the antigenic index was determined by the Jameson-Wolf method. The results showed that aa positions 1-9, 15-28, 34-44, 49-55, 62-82, 118-130, 136-158, 174-184, 188-196 and 200-205 of the E2 protein were the potential immunodominant regions. A comparison of the antigenic index between Chinese strains with and without a specific putative site showed that the putative N-glycosylation site at N51 had a negative effect on the predicted antigenicity of the corresponding region (Fig. 6).
Fig. 6 Antigenicity prediction for the E2 protein. The Jameson-Wolf algorithm, which combines secondary structure information with backbone flexibility to predict surface accessibility, was used to determine the predicted antigenic index, with a threshold value of 1.7. The putative N-glycosylation sites within the E2 sequences of Chinese APPV strains are indicated by blue arrows. Representative strains from different clades/subclades or patterns of putative N-glycosylation sites were included, and the strains in each subclade with different patterns of putative N-glycosylation sites are underlined
To further analyze the effect of glycosylation sites on conformational epitopes of the E2 protein, BepiPred-3.0 was used to predict B-cell conformational epitopes. The results showed that the 15 most likely B-cell conformational epitope residues varied among different clades/subclades or patterns of N-glycosylation sites, and that 39E, 70R, 173R, 190K, and 191N were conserved among all Chinese strains (Table 4; see also the graphical representation of the predicted epitopes in Fig. 7).
Table 4 Prediction of potential B-cell conformational epitopes from E2 protein sequence
Fig. 7 Conformational B-cell epitope prediction for the E2 protein. The potential B-cell conformational epitopes of the E2 protein in Chinese APPV strains were predicted by BepiPred-3.0. Residues with scores above the threshold (default value 0.1512) are predicted to be part of an epitope and are colored yellow on the graph (the Y-axis shows BepiPred-3.0 epitope scores and the X-axis shows protein sequence positions). The graphical output of the B-cell discontinuous epitope prediction for the E2 protein is shown with APPV-SDHY-2022 as an example