Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites

Article information

Plant Pathol J. 2018;34(2):150-156
Publication date (electronic) : 2018 April 01
doi : https://doi.org/10.5423/PPJ.NT.11.2017.0243
Department of Life Science, Chung-Ang University, Seoul 06974, Korea
*Corresponding author. Phone) +82-2-820-5812, FAX) +82-2-825-5206, E-mail) hahny@cau.ac.kr
Handling Associate Editor : Lim, Hyoun-Sub
Received 2017 November 20; Revised 2018 January 18; Accepted 2018 January 18.

Abstract

The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). The genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes an RNA-dependent RNA polymerase (RdRp) domain. The ORF1+2 fusion protein, which mediates RNA replication, is predicted to be produced through the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred at only three different positions within a 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF site may be constrained by the structure of the fusion proteins. The genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying the evolution and diversity of amalgaviruses.

Common eelgrass (Zostera marina) is a marine monocotyledonous angiosperm that belongs to the family Zosteraceae (Lee et al., 2016). It is one of the most common seagrasses and is predominantly found in temperate coastal waters of the northern hemisphere. Common eelgrass plays important roles in the coastal ecosystem: it forms a physical habitat for many marine organisms and acts as a carbon sink for long-term carbon storage (Dahl et al., 2016; Reynolds et al., 2016). Seagrasses are believed to have returned to the sea at least three times from a terrestrial ancestor (Les et al., 1997). Therefore, they are valuable resources for understanding the mechanisms involved in adapting from a terrestrial to a marine habitat. Several transcriptomic studies using next-generation sequencing have investigated the genetic basis of this adaptation, including responses to abiotic stresses such as variable or limited light conditions and high salinity (Kong et al., 2013, 2014; Wissler et al., 2009, 2011).

When total RNA is extracted from plant samples for transcriptomic analysis, viral genomic RNAs are isolated together with the host RNAs. As a result, many plant RNA-seq datasets contain associated RNA virus sequences, which can be identified by comprehensive bioinformatic analysis (Kim et al., 2014; Liu et al., 2012; Park and Hahn, 2017a, 2017b). In this study, two novel amalgaviruses were identified in a transcriptome dataset obtained for the common eelgrass (Kong et al., 2014).

A transcriptome dataset of young leaves of common eelgrass (Kong et al., 2014) was downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). The transcriptome data are available under the accession number SRA128272 and contain 11.1 gigabases of paired-end RNA-seq reads. Raw sequence data were filtered using the sickle program (version 1.33; https://github.com/najoshi/sickle; parameters, -q 30 -l 50) to collect high-quality reads. De novo sequence assembly was performed using the SPAdes Genome Assembler (version 3.10.0; http://spades.bioinf.spbau.ru) (Bankevich et al., 2012).

To identify virus-associated contigs among the assembled transcriptome contigs, a local BLASTX search was performed against a custom-built viral RNA-dependent RNA polymerase (RdRp) sequence database, using the following parameters: -evalue 1e-5 -max_target_seqs 1. The representative RdRp database of known RNA viruses was constructed using sequences obtained from the Pfam database (release 30.0; http://pfam.xfam.org). The core viral RdRp motif sequences were collected from 19 Pfam families having the following accession numbers: PF00602, PF00603, PF00604, PF00680, PF00946, PF00972, PF00978, PF00998, PF02123, PF03431, PF04196, PF04197, PF05788, PF05919, PF07925, PF08467, PF08716, PF08717, and PF12426. Finally, a total of 345 non-redundant viral RdRp motif sequences were converted to a BLAST-searchable database.
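The screening step described above can be sketched in a few lines. This is only an illustration, not the authors' script: it assumes the BLASTX results were saved in the standard 12-column tabular format (`-outfmt 6`), which the text does not state, and the function name `rdrp_candidate_contigs` and the two sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def rdrp_candidate_contigs(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float]]:
    """Map each contig to its best RdRp hit (subject id, e-value).

    Assumes BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular record
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # weaker than the cutoff used in the text
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical rows: one strong hit to the STV RdRp entry, one weak hit.
SAMPLE_ROWS = [
    "contig_1\tA8R3Y5\t48.2\t310\t150\t5\t1200\t2129\t1\t305\t3e-80\t260",
    "contig_2\tQ00000\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30",
]
CANDIDATES = rdrp_candidate_contigs(SAMPLE_ROWS)
```

Filtering again in the script is redundant when the same cutoff was already passed to BLASTX, but it keeps the threshold explicit when result files are reused.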

Two common eelgrass transcriptome contigs were identified that contained RdRp motifs similar to that present in Southern tomato virus (STV) (UniProt accession number, A8R3Y5; Pfam accession number, PF02123). The two contigs showed about 68% identity in their nucleotide sequences, thereby indicating that they were derived from related but distinct viruses. STV is a member of the genus Amalgavirus of the family Amalgaviridae (Sabanadzovic et al., 2009), which suggested that the two contigs were genomes of amalgaviruses or related viruses.

BLAST searches in the NCBI protein database confirmed that both contigs are closely related to plant amalgaviruses, including STV, Blueberry latent virus (BLV), and Rhododendron virus A (RHV-A) (Martin et al., 2011; Sabanadzovic et al., 2009, 2010). Therefore, the two contigs were tentatively named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). The viral genome sequences of ZmAV1 and ZmAV2 were 3383 and 3316 nt in length, respectively, and their annotation information is available in the NCBI nucleotide database under the accession numbers KY783316 (ZmAV1) and KY783317 (ZmAV2) (Table 1).

The common eelgrass RNA-seq reads were mapped to the ZmAV1 and ZmAV2 genome contigs using BWA, and variants were identified using SAMtools/BCFtools (Li and Durbin, 2009). There were 13 and 34 polymorphic sites in the ZmAV1 and ZmAV2 genome sequences, respectively (Supplementary Tables 1 and 2). Hence, each of the genome contigs is a composite sequence derived from at least two closely related variants.

Amalgaviruses are double-stranded RNA viruses with a single genomic RNA segment and infect various plants (Liu and Chen, 2009; Martin et al., 2011; Sabanadzovic et al., 2009, 2010). Their genomes contain two partially overlapping open reading frames (ORFs): a 5′-proximal ORF (ORF1) that encodes a putative replication factory matrix-like protein and a second ORF (ORF2) that encodes a protein containing an RdRp motif. ORF1+2, formed by the fusion of the ORF1 and ORF2 proteins, is expressed using a +1 programmed ribosomal frameshifting (PRF) mechanism (Depierreux et al., 2016; Nibert et al., 2016).

Amalgaviruses share similarities in their genomic organization with the members of family Totiviridae, which infect fungi and protozoa, and are also phylogenetically related to viruses of the family Partitiviridae having several hosts, including plants, fungi, and apicomplexans (Martin et al., 2011). Therefore, it has been proposed that amalgaviruses possibly represent a transitional intermediate between totiviruses and partitiviruses (Krupovic et al., 2015).

Similar to other amalgaviruses, two overlapping ORFs were predicted in the genome sequences of ZmAV1 and ZmAV2. The 5′-proximal ORFs (ORF1) of ZmAV1 and ZmAV2 encoded proteins of 382 and 395 amino acids (aa), respectively. The ORF1 proteins showed sequence and structural similarities to the ORF1 proteins of other amalgaviruses. The ZmAV1 and ZmAV2 ORF1 proteins were predicted to be composed exclusively of α-helices and random coils, as observed in the other amalgaviruses (Krupovic et al., 2015).

The second protein encoded by amalgaviruses is an ORF1+2 fusion protein that uses a +1 PRF mechanism for proper translation. A +1 PRF motif sequence, UUU_CGN (underline, codon boundary for ORF1; N, any nucleotide; CGN, a rare arginine codon), is prevalent in plant amalgaviruses (Nibert et al., 2016; Park and Hahn, 2017b). The same +1 PRF motif was observed in other RNA viruses such as Zygosaccharomyces bailii virus Z (ZbV-Z) and influenza A viruses (Depierreux et al., 2016; Firth et al., 2012). The ZmAV1 and ZmAV2 genome sequences were predicted to have a putative +1 PRF sequence, UUU_CGU, which matched the consensus motif UUU_CGN (Fig. 1). The predicted ORF2 of ZmAV1 and ZmAV2 started at the nucleotide positions 942 and 943, respectively, which were the first bases after the +1 PRF site.
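As an illustration of how a putative +1 PRF site can be located, the sketch below scans the ORF1 reading frame for an in-frame UUU_CGN motif and reports where ORF2 would begin. It is a simplified stand-in, not the procedure used in this study: the function name `find_prf_motifs` and the toy sequence are hypothetical, and the scan simply stops at the first ORF1 stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_prf_motifs(genome: str, orf1_start: int) -> list:
    """Find in-frame UUU_CGN motifs within ORF1.

    genome     : nucleotide sequence (DNA or RNA letters)
    orf1_start : 1-based position of the first base of ORF1

    Returns a list of (motif_start, orf2_start) pairs, both 1-based.
    motif_start is the first U of UUU; orf2_start is the first base
    after the motif's C, i.e. the G of the GNN codon read after the
    +1 shift (U_UUC_GNN).
    """
    rna = genome.upper().replace("T", "U")
    hits = []
    # Step through ORF1 codon by codon so only in-frame motifs match.
    for i in range(orf1_start - 1, len(rna) - 5, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break  # end of ORF1; nothing downstream is in ORF1
        if codon == "UUU" and rna[i + 3:i + 5] == "CG" and rna[i + 5] in "ACGU":
            hits.append((i + 1, i + 5))
    return hits


# Toy sequence (hypothetical): 2-nt leader, AUG GCU UUU CGU GAA UAA.
TOY = "GGAUGGCUUUUCGUGAAUAA"
TOY_HITS = find_prf_motifs(TOY, orf1_start=3)
```

With the ZmAV1 coordinates given in the text, a motif whose UUU occupies positions 938–940 yields an ORF2 start at 942 under this convention, matching the reported "first base after the +1 PRF site".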

Fig. 1

(A) The predicted +1 programmed ribosomal frameshifting (PRF) motifs of ZmAV1 and ZmAV2. Both UUU and UUC codons can interact with the anticodon 3′-AAG-5′ of the phenylalanyl-tRNA (tRNAPhe). The tRNAPhe positioned on UUU is thought to slip forward by one nucleotide, causing a +1 frameshift for continued translation. The codon-anticodon base pairs are marked using dots. (B) Sequence comparison of the +1 PRF motif of amalgaviruses. The ZmAV1 and ZmAV2 +1 PRF motif sequences are marked in boldface letters in the last two rows. Sequence logo representation at the bottom clearly shows the conserved UUU_CGN motif, with uracil (U) being slightly preferred over the other bases before the motif and at the position N of the motif.

The ORF2-encoded peptides in ZmAV1 and ZmAV2 were 785 aa in length and had a conserved viral RdRp motif (Pfam accession number, PF02123). The ZmAV1 and ZmAV2 ORF1+2 fusion proteins produced by the +1 PRF mechanism were 1049 and 1062 aa in length, respectively, and were presumed to mediate viral replication.
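The reported protein lengths can be checked against the genome coordinates listed in Table 1. The short calculation below is a sketch under one assumption that the text does not state explicitly: the end coordinate of the ORF2 part includes the stop codon. The function name `fusion_length_aa` is hypothetical.

```python
def fusion_length_aa(orf1_part, orf2_part):
    """Length (aa) of an ORF1+2 fusion protein from genome coordinates.

    Both arguments are (start, end) pairs, 1-based and inclusive.
    Assumption: the ORF2 part ends with a stop codon, which does not
    contribute an amino acid.
    """
    n1 = orf1_part[1] - orf1_part[0] + 1
    n2 = orf2_part[1] - orf2_part[0] + 1
    if n1 % 3 != 0 or n2 % 3 != 0:
        raise ValueError("each part must span a whole number of codons")
    return n1 // 3 + n2 // 3 - 1


# Coordinates of the fusion protein parts as listed in Table 1.
ZMAV1_LEN = fusion_length_aa((149, 940), (942, 3299))   # 264 + 786 - 1 = 1049
ZMAV2_LEN = fusion_length_aa((111, 941), (943, 3300))   # 277 + 786 - 1 = 1062

# ORF2-encoded portion alone: 2358 nt = 786 codons, minus the stop codon.
ORF2_LEN = (3299 - 942 + 1) // 3 - 1                     # 785
```

The single nucleotide missing between the two parts (941 in ZmAV1, 942 in ZmAV2) is the base bypassed by the +1 shift.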

The RdRp-motif-containing ORF2 protein sequences of ZmAV1 and ZmAV2 showed 44–51% aa sequence identity with those of the previously reported amalgaviruses (Table 2). The RdRp protein sequence identity threshold for assigning amalgaviruses to different species is 65–70% (Nibert et al., 2016), which indicated that ZmAV1 and ZmAV2 are novel species of amalgaviruses. Furthermore, protein sequence identity between the ZmAV1 and ZmAV2 RdRps was 65.4%, indicating that these two viruses could be considered different species. The ZmAV1 and ZmAV2 ORF2 proteins showed 25% and 23% aa sequence identity, respectively, with the RdRp protein of ZbV-Z (Table 2), the type species of the fungus-infecting genus Zybavirus, which is the genus most closely related to the plant-infecting genus Amalgavirus within the family Amalgaviridae (Depierreux et al., 2016).
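The identity values in Table 2 are expressed as identical residues over aligned length. A minimal sketch of that calculation is given below; the function name `percent_identity` and the toy alignment are hypothetical, and the convention of counting every column in which at least one sequence has a residue is an assumption about how the aligned length was defined.

```python
def percent_identity(aligned_a: str, aligned_b: str):
    """Return (identical, aligned_length, percent) for a pairwise alignment.

    Both inputs are gapped sequences of equal length ('-' marks a gap).
    Columns where both sequences have a gap are ignored; columns with a
    gap in only one sequence count toward the aligned length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    aligned_length = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        aligned_length += 1
        if a == b:
            identical += 1
    if aligned_length == 0:
        raise ValueError("alignment contains no residues")
    return identical, aligned_length, 100.0 * identical / aligned_length


# Toy alignment (hypothetical sequences).
TOY_RESULT = percent_identity("MKV-LA", "MRVQLA")

# Rounding a Table 2 entry: BLV versus ZmAV1, 369 identical of 744 aligned.
BLV_ZMAV1_PCT = round(100 * 369 / 744)
```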

The RdRp-motif-containing ORF2 protein sequences of ZmAV1, ZmAV2, and other amalgaviruses were aligned using MUSCLE (https://www.drive5.com/muscle) (Edgar, 2004). Phylogenetic analysis by the neighbor-joining method using the MEGA7 software (http://www.megasoftware.net) (Kumar et al., 2016) confirmed that ZmAV1 and ZmAV2 are novel, closely related species of amalgaviruses (Fig. 2). The observation that ZmAV1 and ZmAV2 formed a distinct clade among known amalgaviruses suggested that a single ancestral amalgavirus infected the common eelgrass in the past and subsequently diverged into two species during the course of evolution. However, it is also possible that two closely related amalgaviruses became independently associated with the common eelgrass.

Fig. 2

Phylogenetic tree of ZmAV1, ZmAV2, and related plant amalgaviruses. Multiple sequence alignment of the RdRp-motif containing ORF2 protein sequences was performed for inference of the phylogenetic tree. The fungus-infecting virus ZbV-Z was used as an outgroup. The bootstrap values calculated from 100 replicates are shown at the nodes. The +1 PRF site position labels used in Fig. 3A are marked in parentheses.

To analyze whether the +1 PRF site in the fusion protein was conserved, multiple sequence alignment of the ORF1+2 fusion protein sequences of ZmAV1, ZmAV2, and 22 other amalgaviruses was performed (Supplementary Fig. 1). Interestingly, the +1 PRF site of 24 amalgaviruses occurred only at three different positions, which were designated as positions #1, #2, and #3. The +1 PRF occurred at position #1 in 9 amalgaviruses, at position #2 in 2, and at position #3 in 13 (Fig. 3A). Two amalgaviruses, namely STV and Capsicum annuum amalgavirus 1 (CaAV1), had a +1 PRF motif different from those of the other amalgaviruses (Nibert et al., 2016). However, their +1 PRF motif occurred at the same position (position #3) as in the other 11 amalgaviruses.

Fig. 3

(A) The +1 PRF sites and surrounding regions. Multiple sequence alignment of regions encompassing the +1 PRF sites was performed using the 24 full-length amalgavirus ORF1+2 fusion protein sequences (Supplementary Fig. 1). The +1 PRF occurred at three positions, which were labelled as #1, #2, and #3 and marked using red, green, and blue arrows, respectively. The number of viruses is shown in parentheses. The first aa residues of ORF2 are shown in colored boxes. (B) Sequence logo representations generated from all viruses (top), and those generated from viruses with the +1 PRF sites at position #1 (middle) and position #3 (bottom). Notably, the +1 PRF sites are embedded in a segment with diverse aa residues and surrounded by conserved regions.

The consensus +1 PRF motif sequence is UUU_CGN, where the underline indicates the codon boundary for ORF1. Therefore, the last aa residue of the ORF1 part of the ORF1+2 fusion protein is always a phenylalanine (a UUU codon), except in STV and CaAV1, which have a leucine residue (a CUU codon) at this position because the +1 PRF sequence of these viruses is CUU_AGN (Nibert et al., 2016). When a +1 PRF occurs, the codon boundary changes from UUU_CGN (ORF1) to U_UUC_GNN (ORF2). The first aa residue of ORF2, encoded by a GNN codon, can be valine, alanine, aspartic acid, glutamic acid, or glycine. All of these aa residues were observed in the 24 fusion proteins (Fig. 3A), suggesting that any of them is acceptable at this position. The most frequent base at the second position of the GNN codon was uracil (U) (Fig. 1B). Because a GUN codon codes for valine, the most common first residue of ORF2 is valine.
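The change of reading frame and the set of residues that a GNN codon can encode follow directly from the standard genetic code, as the sketch below shows. The helper names (`translate`, `gnn_residues`) and the seven-nucleotide example are illustrative only and are not taken from the study.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna: str) -> str:
    """Translate complete codons of an RNA string ('*' marks a stop)."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def gnn_residues() -> set:
    """All amino acids encoded by codons that start with G."""
    return {aa for codon, aa in CODON_TABLE.items() if codon.startswith("G")}


# Example context: the UUU_CGU motif followed by one more base (here G).
CONTEXT = "UUUCGUG"
FRAME0 = translate(CONTEXT)       # UUU CGU   -> ORF1 frame
FRAME1 = translate(CONTEXT[1:])   # UUC GUG   -> frame after the +1 shift
```

In this example the ORF1 frame reads Phe-Arg, whereas the shifted frame reads Phe-Val, so the rare arginine codon is never decoded in the fusion protein.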

The three +1 PRF sites were located within a 13-aa-long segment, which contained diverse residues and was surrounded by many conserved residues (Fig. 3B). Interestingly, the three positions were evenly spaced: five residues separated positions #1 and #2, and five separated positions #2 and #3. The phylogenetic distribution of the +1 PRF sites revealed that the position shifted frequently during the evolution of amalgaviruses (Fig. 2). However, only these three closely located positions were repeatedly involved. This observation indicated that the +1 PRF site is highly constrained, probably by the folding of the fusion proteins. Secondary structure prediction of selected fusion proteins using PSIPRED (version 3.3; http://bioinf.cs.ucl.ac.uk/psipred) (McGuffin et al., 2000) showed that the +1 PRF sites were preferentially located within a random coil segment between two α-helices, one from ORF1 and the other from ORF2, or near the tip of an α-helix. The +1 PRF site thus appears to be selected so as not to disrupt the proper folding of the ORF1+2 fusion protein.

In conclusion, the full-length genome sequences of two novel amalgaviruses (ZmAV1 and ZmAV2) associated with common eelgrass were identified. Notably, no viruses had previously been reported in eelgrasses (genus Zostera) (http://www.genome.jp/virushostdb; as of July 28, 2017) (Mihara et al., 2016), which makes ZmAV1 and ZmAV2 the first viruses to be identified in eelgrasses. Comparison of the ORF1+2 fusion proteins showed that three +1 PRF sites were preferred, potentially owing to structural constraints in the fusion proteins.

Supplementary data

Acknowledgments

This research was supported by the National Research Foundation of Korea funded by the Korea Government (grant number 2017R1A1B400586).

References

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012;SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. 10.1089/cmb.2012.0021. 22506599. 3342519.
Dahl M, Deyanova D, Gutschow S, Asplund ME, Lyimo LD, Karamfilov V, Santos R, Bjork M, Gullstrom M. 2016;Sediment properties as important predictors of carbon storage in Zostera marina meadows: a comparison of four European areas. PLoS One 11:e0167493. 10.1371/journal.pone.0167493. 27936111. 5147920.
Depierreux D, Vong M, Nibert ML. 2016;Nucleotide sequence of Zygosaccharomyces bailii virus Z: Evidence for +1 programmed ribosomal frameshifting and for assignment to family Amalgaviridae. Virus Res 217:115–124. 10.1016/j.virusres.2016.02.008. 26951859. 5517306.
Edgar RC. 2004;MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. 10.1093/nar/gkh340. 15034147. 390337.
Firth AE, Jagger BW, Wise HM, Nelson CC, Parsawar K, Wills NM, Napthine S, Taubenberger JK, Digard P, Atkins JF. 2012;Ribosomal frameshifting used in influenza A virus expression occurs within the sequence UCC_UUU_CGU and is in the +1 direction. Open Biol 2:120109. 23155484. 3498833.
Kim DS, Jung JY, Wang Y, Oh HJ, Choi D, Jeon CO, Hahn Y. 2014;Plant RNA virus sequences identified in kimchi by microbial metatranscriptome analysis. J Microbiol Biotechnol 24:979–986. 10.4014/jmb.1404.04017. 24836186.
Kong F, Zhou Y, Sun P, Liu L, Mao Y. 2013;Generation and analysis of expressed sequence tags from the salt-tolerant eelgrass species, Zostera marina. Acta Oceanologica Sinica 32:68–78. 10.1007/s13131-013-0343-z.
Kong F, Li H, Sun P, Zhou Y, Mao Y. 2014;De novo assembly and characterization of the transcriptome of seagrass Zostera marina using Illumina paired-end sequencing. PLoS One 9:e112245. 10.1371/journal.pone.0112245. 25423588. 4244107.
Krupovic M, Dolja VV, Koonin EV. 2015;Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes. Biol Direct 10:12. 10.1186/s13062-015-0047-8. 25886840. 4377212.
Kumar S, Stecher G, Tamura K. 2016;MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874. 10.1093/molbev/msw054. 27004904.
Lee H, Golicz AA, Bayer PE, Jiao Y, Tang H, Paterson AH, Sablok G, Krishnaraj RR, Chan CK, Batley J, Kendrick GA, Larkum AW, Ralph PJ, Edwards D. 2016;The genome of a southern hemisphere seagrass species (Zostera muelleri). Plant Physiol 172:272–283. 10.1104/pp.16.00868. 27373688. 5074622.
Les DH, Cleland MA, Waycott M. 1997;Phylogenetic studies in Alismatidae, II: Evolution of marine angiosperms (seagrasses) and hydrophily. Syst Bot 22:443–463. 10.2307/2419820.
Li H, Durbin R. 2009;Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. 10.1093/bioinformatics/btp324. 19451168. 2705234.
Liu H, Fu Y, Xie J, Cheng J, Ghabrial SA, Li G, Yi X, Jiang D. 2012;Discovery of novel dsRNA viral sequences by in silico cloning and implications for viral diversity, host range and evolution. PLoS One 7:e42147. 10.1371/journal.pone.0042147. 22848734. 3407116.
Liu W, Chen J. 2009;A double-stranded RNA as the genome of a potential virus infecting Vicia faba. Virus Genes 39:126–131. 10.1007/s11262-009-0362-1. 19472044.
Martin RR, Zhou J, Tzanetakis IE. 2011;Blueberry latent virus: an amalgam of the Partitiviridae and Totiviridae. Virus Res 155:175–180. 10.1016/j.virusres.2010.09.020.
McGuffin LJ, Bryson K, Jones DT. 2000;The PSIPRED protein structure prediction server. Bioinformatics 16:404–405. 10.1093/bioinformatics/16.4.404. 10869041.
Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H, Hingamp P, Goto S, Ogata H. 2016;Linking virus genomes with host taxonomy. Viruses 8:66. 10.3390/v8030066. 26938550. 4810256.
Nibert ML, Pyle JD, Firth AE. 2016;A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses. Virology 498:201–208. 10.1016/j.virol.2016.07.002. 27596539. 5052127.
Park D, Hahn Y. 2017a;Genome sequence of Spinach cryptic virus 1, a new member of the genus Alphapartitivirus (family Partitiviridae), identified in spinach. J Microbiol Biotechnol 27:834–837. 10.4014/jmb.1611.11026.
Park D, Hahn Y. 2017b;Genome sequences of Spinach deltapartitivirus 1, Spinach amalgavirus 1, and Spinach latent virus identified in spinach transcriptome. J Microbiol Biotechnol 27:1324–1330. 10.4014/jmb.1703.03043.
Reynolds LK, DuBois K, Abbott JM, Williams SL, Stachowicz JJ. 2016;Response of a habitat-forming marine plant to a simulated warming event is delayed, genotype specific, and varies with phenology. PLoS One 11:e0154532. 10.1371/journal.pone.0154532. 27258011. 4892549.
Sabanadzovic S, Abou Ghanem-Sabanadzovic N, Valverde RA. 2010;A novel monopartite dsRNA virus from rhododendron. Arch Virol 155:1859–1863. 10.1007/s00705-010-0770-5. 20721591.
Sabanadzovic S, Valverde RA, Brown JK, Martin RR, Tzanetakis IE. 2009;Southern tomato virus: The link between the families Totiviridae and Partitiviridae. Virus Res 140:130–137. 10.1016/j.virusres.2008.11.018. 19118586.
Wissler L, Codoner FM, Gu J, Reusch TB, Olsen JL, Procaccini G, Bornberg-Bauer E. 2011;Back to the sea twice: identifying candidate plant genes for molecular evolution to marine life. BMC Evol Biol 11:8. 10.1186/1471-2148-11-8. 21226908. 3033329.
Wissler L, Dattolo E, Moore AD, Reusch TB, Olsen JL, Migliaccio M, Bornberg-Bauer E, Procaccini G. 2009;Dr. Zompo: an online data repository for Zostera marina and Posidonia oceanica ESTs. Database (Oxford) 2009:bap009. 10.1093/database/bap009.

Table 1

Summary of amalgaviruses identified in this study

| Acronym | Full name | Accession | Length (nt) | ORF | Position | Length (aa) |
|---|---|---|---|---|---|---|
| ZmAV1 | Zostera marina amalgavirus 1 | KY783316 | 3383 | Fusion protein (RNA-dependent RNA polymerase) | 149–940, 942–3299 | 1049 |
| | | | | Putative replication factory matrix-like protein | 149–1279 | 382 |
| ZmAV2 | Zostera marina amalgavirus 2 | KY783317 | 3316 | Fusion protein (RNA-dependent RNA polymerase) | 111–941, 943–3300 | 1062 |
| | | | | Putative replication factory matrix-like protein | 111–1298 | 395 |

Table 2

Identities among the RdRp motif-containing ORF2 protein sequences of ZmAV1, ZmAV2, and related viruses

| Acronym | Full name | Accession number (a) | Identity with ZmAV1 (b) | Identity with ZmAV2 (b) |
|---|---|---|---|---|
| BLV | Blueberry latent virus | NC_014593.1 | 369/744 (50%) | 368/720 (51%) |
| FpAV2 | Festuca pratensis amalgavirus 2 | GBXZ01002308.1 | 357/722 (49%) | 371/733 (51%) |
| FpAV3 | Festuca pratensis amalgavirus 3 | GBXZ01009138.1 | 381/743 (51%) | 375/729 (51%) |
| LpAV1 | Lolium perenne amalgavirus 1 | GAYX01076418.1 | 367/717 (51%) | 373/744 (50%) |
| AcAV1 | Allium cepa amalgavirus 1 | GAAO01011981.1 | 340/719 (47%) | 344/733 (47%) |
| AcAV2 | Allium cepa amalgavirus 2 | GAAN01008476.1 | 362/769 (47%) | 370/776 (48%) |
| PeAV1 | Phalaenopsis equestris amalgavirus 1 | GDHJ01028335.1 | 341/749 (46%) | 343/727 (47%) |
| CaAV1 | Capsicum annuum amalgavirus 1 | JW101175.1 | 334/766 (44%) | 334/759 (44%) |
| STV | Southern tomato virus | NC_011591.1 | 324/705 (46%) | 326/737 (44%) |
| AoAV1 | Anthoxanthum odoratum amalgavirus 1 | GBIE01024896.1 | 327/731 (45%) | 332/715 (46%) |
| FpAV1 | Festuca pratensis amalgavirus 1 | GBXZ01049574.1 | 315/716 (44%) | 343/720 (48%) |
| RHV-A | Rhododendron virus A | NC_014481.1 | 345/717 (48%) | 356/727 (49%) |
| MsAV1 | Medicago sativa amalgavirus 1 | GAFF01077243.1 | 318/723 (44%) | 335/725 (46%) |
| VCV-M | Vicia cryptic virus M | EU371896.1 | 329/715 (46%) | 330/725 (46%) |
| CdAV1 | Cleome droserifolia amalgavirus 1 | GDRJ01026949.1 | 338/727 (46%) | 347/729 (48%) |
| CoAV1 | Camellia oleifera amalgavirus 1 | GEFY01004381.1 | 358/759 (47%) | 372/777 (48%) |
| GaAV1 | Gevuina avellana amalgavirus 1 | GEAC01063629.1 | 350/712 (49%) | 372/769 (48%) |
| EbAV1 | Erigeron breviscapus amalgavirus 1 | GDQF01098448.1 | 358/732 (49%) | 352/729 (48%) |
| EbAV2 | Erigeron breviscapus amalgavirus 2 | GDQF01120453.1 | 338/721 (47%) | 360/741 (49%) |
| ScAV1 | Secale cereale amalgavirus 1 | GCJW01039808.1 | 331/716 (46%) | 339/728 (47%) |
| SpAV1 | Spinach amalgavirus 1 | KY695011.1 | 356/722 (49%) | 359/740 (49%) |
| PpAV1 | Pinus patula amalgavirus 1 | GECO01025317.1 | 329/719 (46%) | 335/746 (45%) |
| ZbV-Z | Zygosaccharomyces bailii virus Z | KU200450.1 | 127/512 (25%) | 118/516 (23%) |

(a) Accession numbers of viral genome sequences.

(b) Amino acid sequence identities are given in the following format: identical residues/aligned length (% identity).