

De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data
The Plant Pathology Journal 2017;33:478-487
Published online October 1, 2017
© 2017 The Korean Society of Plant Pathology.

Yeonhwa Jo1, Hoseong Choi1, Miah Bae1, Sang-Min Kim2, Sun-Lim Kim2, Bong Choon Lee2, Won Kyong Cho1,*, and Kook-Hyung Kim1,*

1Department of Agricultural Biotechnology, Research Institute of Agriculture and Life Sciences, and Plant Genomics and Breeding Institute, College of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Korea, 2Crop Foundation Division, National Institute of Crop Science, RDA, Wanju 55365, Korea
Correspondence to: *Co-corresponding authors. WK Cho Phone) +82-2-880-4687, FAX) +82-2-873-2317, E-mail) K-H Kim, Phone) +82-2-880-4677, Fax) +82-2-873-2317, E-mail)
Received March 20, 2017; Revised June 7, 2017; Accepted June 27, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, one generated for the analysis of soybean seed development contained several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV), infecting soybean by de novo transcriptome assembly followed by BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMV in the soybean seed transcriptome, revealing 780 SNVs that were evenly distributed along the SMV genome. Four types of SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrates the quasispecies variation of the SMV genome. Taken together, this study used bioinformatics analyses to identify viruses in soybean transcriptome data and demonstrated the application of such data to virus genome assembly and SNV analysis.

Keywords : de novo genome assembly, single nucleotide variation, soybean mosaic virus

Soybean (Glycine max (L.) Merr.) is the most important legume crop, representing 50% of the global legume crop area and 68% of global legume production (Herridge et al., 2008). Soybean is consumed as a health food that provides a rich source of protein and is also used for vegetable oil production (Messina, 1999; Pimentel and Patzek, 2005). Moreover, soybean plays an important role in dinitrogen (N2) fixation, which is an important natural process (Herridge et al., 2008).

Several diseases in soybean, such as cyst nematode, brown spot, charcoal rot, and sclerotinia stem rot, lead to yield losses in major soybean-producing countries (Wrather et al., 2001). In addition, soybean can be infected by diverse viruses. Although only a small number of the viruses infecting soybean cause serious economic problems in soybean production, it is always important to control and manage viral diseases in soybeans (Hill and Whitham, 2014). The best known soybean virus is soybean mosaic virus (SMV), a member of the family Potyviridae, which causes soybean mosaic disease. In addition, bean pod mottle virus (BPMV), soybean vein necrosis virus, tobacco ringspot virus, soybean dwarf virus, peanut mottle virus, peanut stunt virus, and alfalfa mosaic virus are important viruses infecting soybeans (Hill and Whitham, 2014).

Many plant viruses have been identified based on viral disease symptoms and several detection methods. However, virus infection in plants does not always cause disease symptoms, and plants showing viral disease symptoms are very often co-infected by different viruses. Recent advances in next generation sequencing (NGS) technology have led to the identification of numerous known as well as novel viruses by means of metagenomics (Barba et al., 2014; Massart et al., 2014). Virus sequences are found not only in NGS data generated for virus detection but also in many plant transcriptome datasets, in which viral RNAs may be amplified together with host transcripts (Burger and Maree, 2015; Jo et al., 2016). The identification of virus sequences in plant transcriptomes is no longer surprising, because most plant viruses are RNA viruses and many of them carry a poly(A) tail, which is easily amplified by oligo d(T) primers during cDNA synthesis.

Recently, we carried out a large-scale screening to identify viruses infecting soybean worldwide using available soybean transcriptome data. Among the screened datasets, we found that a soybean transcriptome generated for the analysis of soybean seed development contained many virus sequences. In this study, we conducted bioinformatics analyses for virus identification, virus genome assembly, phylogenetic analysis, and single nucleotide variations of SMV.

Materials and Methods

Plant materials, library preparation, and next generation sequencing

The plant material used for RNA-Seq was soybean cultivar Heinong44. Plants were grown at the experimental station in Beijing from May to August as described previously (Song et al., 2013). Total RNA was extracted from seeds at six developmental stages, which were classified according to seed weight. The cDNA was synthesized from poly(A)-containing RNAs. A single RNA-Seq library was constructed and sequenced by single-end sequencing on the Illumina HiSeq 2000 system. The raw data are available in the SRA database (

Raw data processing and de novo transcriptome assembly

All bioinformatics analyses were performed on a workstation (four 16-core CPUs and 256 GB RAM) running Linux Mint version 17. We downloaded the raw data from the SRA database using the SRA toolkit (Leinonen et al., 2011). The raw SRA data were converted to FASTQ files using the SRA toolkit. For the de novo assembly of transcriptomes, we used Trinity version 2.0.6 (Haas et al., 2013). De novo transcriptome assembly was performed with default parameters according to the manual provided by the developers.

Identification of viruses and sequence alignment

To identify virus-associated contigs, we conducted a BLAST search using standalone BLAST version 2.1.19 installed on the Linux system (Madden, 2013). All assembled contigs were subjected to a MEGABLAST search, which is optimized for highly similar sequences, against complete reference sequences for viruses and viroids ( with E value 1e-5 as a cutoff. In addition, all raw data were converted to FASTA files using the SRA toolkit and subjected to a MEGABLAST search against the viral reference database with E value 1e-5 as a cutoff. We used the Burrows–Wheeler Aligner (BWA) software with default parameters to align sequences on the reference virus genome (Li and Durbin, 2009).
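The E value filtering step described above can be illustrated with a short Python sketch. It assumes that the BLAST hits were exported as tab-separated text with the twelve standard tabular columns (the same columns listed in Table 2, minus the virus name) and that hits with an E value equal to the cutoff are kept; neither detail is stated in the text. The function names and the fourth example row are hypothetical.

```python
from collections import defaultdict

# Column order of BLAST tabular output, matching the numeric columns of Table 2.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines, max_evalue=1e-5):
    """Return hits (as dicts) whose E value does not exceed max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= max_evalue:  # assumption: cutoff is inclusive
            hits.append(hit)
    return hits


def contigs_per_subject(hits):
    """Group query contig ids by the reference sequence they matched."""
    grouped = defaultdict(set)
    for hit in hits:
        grouped[hit["sseqid"]].add(hit["qseqid"])
    return {subject: sorted(queries) for subject, queries in grouped.items()}


# Three rows taken from Table 2, plus one invented weak hit (hypothetical ids)
# that the cutoff should reject.
example = [
    "TR2274|c0_g1_i1\tNC_002634.1\t93.13\t233\t16\t0\t2\t234\t8571\t8803\t3.00E-93\t342",
    "TR29303|c0_g1_i1\tNC_002034.1\t91.28\t298\t26\t0\t4\t301\t1334\t1631\t1.00E-112\t407",
    "TR19277|c0_g2_i1\tNC_003617.1\t75.34\t146\t29\t6\t467\t607\t6837\t6694\t1.00E-08\t63.9",
    "hypothetical_contig\thypothetical_ref\t80.00\t30\t6\t0\t1\t30\t100\t129\t0.5\t30.1",
]

for subject, contigs in contigs_per_subject(parse_hits(example)).items():
    print(subject, contigs)
```

Grouping by subject id makes it easy to count how many contigs support each virus, which is the kind of summary reported in the Results.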

De novo assembly of SMV genomes

The 79 SMV-associated contigs identified by BLAST were retrieved using the blastdbcmd program in the standalone BLAST system. To assemble the SMV genome, the identified viral contigs were aligned against the SMV reference genome (NC_002634.1) using ClustalW implemented in the MEGA6 program (Tamura et al., 2013). The nearly complete consensus genome of SMV was obtained manually. Raw data were then aligned on the assembled consensus SMV genome by BWA to confirm the sequence. The poly(A) tail at the 3′ end of the assembled SMV genome was removed. In this way, we obtained a nearly complete consensus genome for SMV China from the soybean transcriptome.
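As an illustration of what a consensus over aligned contigs means, the following Python sketch takes a column-wise majority vote. This is not the authors' procedure, since the consensus in this study was derived manually from the ClustalW alignment; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_consensus(aligned, gap="-"):
    """Majority-vote consensus over equal-length aligned sequences.

    Gap characters do not vote; a column made only of gaps is skipped.
    Ties between bases are resolved arbitrarily in this sketch.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*aligned):
        bases = [base.upper() for base in column if base != gap]
        if not bases:
            continue  # no contig covers this alignment column
        consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment of three overlapping contigs (invented sequences).
toy_alignment = ["ACGT--", "-CGTAA", "ACCTA-"]
print(column_consensus(toy_alignment))
```

A real pipeline would also need a rule for ties and for low-coverage columns, which is one reason manual inspection is often preferred for a single viral genome.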

Identification of SNVs in soybean transcriptome

To analyze SNVs of SMV China in the soybean transcriptome, the raw data were aligned on the consensus genome of SMV China using the BWA program with default parameters. The SAM files produced by BWA were converted into BAM files by SAMtools (Li et al., 2009). For SNV calling, we sorted the BAM files and then generated a VCF file using mpileup (Danecek et al., 2011). BCFtools implemented in SAMtools was finally used to call SNVs. The positions of the identified SNVs on the SMV genome were visualized with the Tablet program (Milne et al., 2010).
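The output of this step is a VCF file, and a minimal Python sketch shows how single-base substitutions can be separated from InDels by comparing the REF and ALT columns. The contig name `SMV_China`, the quality values, and the SNV at position 100 are invented for illustration; only the InDel (CAGG to CAGGAGG at nt 640) comes from the Results. Note that VCF files use the DNA alphabet, so U appears as T.

```python
def classify_variant(ref, alt):
    """Label a REF/ALT pair as 'SNV', 'InDel', or 'MNV' (multi-base substitution)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "InDel"
    return "MNV"


def read_variants(vcf_lines):
    """Yield (position, ref, alt, kind) for every ALT allele in VCF text lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield int(pos), ref, alt, classify_variant(ref, alt)


# Hypothetical VCF excerpt; only the InDel record reflects the paper.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "SMV_China\t100\t.\tC\tT\t50\t.\t.",
    "SMV_China\t640\t.\tCAGG\tCAGGAGG\t50\t.\t.",
]

for record in read_variants(example_vcf):
    print(record)
```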

Construction of phylogenetic trees

To reveal the phylogenetic relationships of the consensus genome of SMV China with known SMV isolates, we generated three phylogenetic trees. The complete genome sequence of SMV isolate China as well as two protein sequences were searched by BLAST against the NCBI nucleotide and non-redundant protein databases. The best-matched sequences were retrieved for the construction of phylogenetic trees. The obtained sequences were aligned by the ClustalW program with default parameters. After alignment, unnecessary sequences were removed. The manually edited alignments were used to construct phylogenetic trees in the MEGA6 program by the neighbor-joining method with 1,000 bootstrap replicates.
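Neighbor-joining trees are built from a matrix of pairwise distances, and the legend of Fig. 2 states that the Kimura 2-parameter model was used for nucleotide sequences. A minimal Python sketch of that distance for one pair of aligned sequences is given below, using the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. How MEGA6 treats gaps and ambiguous bases may differ from the simple skipping used here, and the toy sequences are invented.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    # Treat RNA 'U' as 'T' so that RNA and DNA alphabets are handled alike.
    pairs = zip(seq1.upper().replace("U", "T"), seq2.upper().replace("U", "T"))
    for a, b in pairs:
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

With one transition in ten sites, the corrected distance is slightly larger than the raw proportion of differences (0.1), which is the purpose of the correction.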


Results

De novo soybean transcriptome assembly and identification of viruses in the soybean seeds

We screened available soybean transcriptome data deposited in NCBI’s Sequence Read Archive (SRA) database in order to identify viruses infecting soybean. Of the screened soybean transcriptomes, one generated for gene expression profiling during soybean seed development (accession number SRR1777405) contained several virus-associated sequences (Song et al., 2013). To identify virus-associated contigs, we assembled the soybean transcriptome de novo using the Trinity program, resulting in 116,108 transcripts (contigs) with a contig N50 of 710 bp (Table 1). Next, we searched the 116,108 transcripts by BLAST against the viral reference database. After removing redundant sequences and endogenous viral sequences, we identified 83 virus-associated contigs (Table 2). Most of them (79 contigs) were associated with SMV. The lengths of the SMV-associated contigs ranged from 224 to 3,636 nt (Fig. 1A). The remaining four contigs were associated with bean common mosaic virus (BCMV), lettuce infectious yellows virus (LIYV), lettuce chlorosis virus (LCV), and cucumber mosaic virus (CMV), respectively. The lengths of the contigs associated with these four viruses ranged from 232 nt (LCV RNA2) to 1,015 nt (BCMV) (Fig. 1A). Other than the contig associated with LIYV (1E-08), the virus-associated contigs displayed reliable E values, indicating the significance of the BLAST results (Table 2).
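The contig N50 reported above is the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal Python sketch of this definition follows; the contig lengths in the example are invented, and the function is not taken from the Trinity code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Invented contig lengths for illustration.
print(n50([100, 200, 300, 400]))
```

Because N50 weights long contigs more heavily, it is larger than the median contig length for the assembly in Table 1 (710 bp versus 428 bp).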

De novo genome assembly of SMV from a soybean transcriptome

Of the identified viruses, SMV was by far the most abundant in the soybean seeds. Fortunately, the 79 SMV-associated contigs covered most of the SMV reference genome (Table 2). These contigs were mapped on the SMV reference genome (accession number NC_002634.1) (Eggenberger et al., 1989) (Fig. 1B). After sequence alignment followed by manual modification, we assembled a nearly complete consensus genome of SMV, referred to as SMV China (Fig. 1C). SMV China is composed of 9,507 nucleotides (nt) encoding two proteins, GP1 and GP2. GP1 encodes a polyprotein (nt 54 to 9,254), which is further cleaved into ten mature proteins: P1 (P1 proteinase), HC-Pro (helper component proteinase), P3 (P3 protein), 6K1 (6K1 protein), CI (cylindrical inclusion), 6K2, NIa-VPg (nuclear inclusion protein a-genome linked viral protein), NIa-Pro, NIb (nuclear inclusion protein b), and coat protein (CP). GP2 encodes the PIPO (Pretty Interesting Potyviridae ORF) protein (nt 2,804 to 3,031) (Fig. 1C).
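The reported coordinates can be sanity-checked with simple arithmetic: an open reading frame should span a multiple of three nucleotides, and the offset between the two start positions shows whether both are read in the same frame. The Python sketch below assumes that the coordinates are 1-based and inclusive, and it does not know whether the stop codon is included in them; the function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (length in nt, number of codons) for 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3


polyprotein = orf_summary(54, 9254)  # GP1 coordinates from the text
pipo = orf_summary(2804, 3031)       # GP2 (PIPO) coordinates from the text

# Offset of the PIPO start relative to the polyprotein reading frame.
frame_offset = (2804 - 54) % 3

print(polyprotein, pipo, frame_offset)
```

Both regions span a whole number of codons, and the non-zero offset indicates that PIPO is read in a different frame from the polyprotein in which it is embedded.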

Phylogenetic relationships of the SMV isolate China

To examine the genetic relationships of the assembled SMV China with known SMV isolates, we constructed phylogenetic trees. The phylogenetic tree based on complete SMV genome sequences showed two groups of SMV isolates (Fig. 2A). SMV China belongs to group B along with two SMV isolates from South Korea. In the tree based on polyprotein sequences, SMV China was placed in group C and was distantly related to other SMV isolates (Fig. 2B). The phylogenetic tree based on PIPO protein sequences confirmed that SMV China is a member of SMV belonging to group A, which contains seven viruses including BPMV (Fig. 2C). Based on the phylogenetic analyses, the consensus genome of SMV China appears to be genetically close to the SMV isolates from South Korea.

Single nucleotide variations of SMV in the soybean seeds

RNA viruses are well known to exist as quasispecies, with several variants present in an infected host. Therefore, we examined single nucleotide variations (SNVs) for SMV in the soybean seeds. The identified SMV China was used as a reference. After BWA alignment of raw data against SMV China, SNVs were identified using SAMtools (Fig. 3A). The SNVs in this study were derived from a population of different isolates. As a result, we identified 780 SNVs (Supplementary Table 1). SNVs were evenly distributed along the SMV genome (Fig. 3B). Most SNVs were single nucleotide polymorphisms (SNPs); the exception was one InDel (CAGG to CAGGAGG) at nt 640 of SMV China (Supplementary Table 1). Four types of SNVs, C-U (190 SNVs), U-C (180 SNVs), A-G (168 SNVs), and G-A (155 SNVs), were frequently identified (Fig. 3C). Based on the SNV results, the mutation rate for SMV in the soybean seeds was 8.2045% (780 SNVs over 9,507 nt), indicating a high level of mutations for the SMV RNA genome. In addition, we calculated the ratio of transitions to transversions (Ts/Tv). The Ts/Tv ratio for SMV China was 8.06 (693/86).
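The two summary figures in this paragraph follow directly from the counts given in the text, as the Python sketch below shows. It treats the reported "mutation rate" as the number of variant sites divided by the genome length, which is an interpretation that reproduces the published value; the variable and function names are hypothetical.

```python
# Counts reported in the text for the four transition types.
transition_counts = {"C>U": 190, "U>C": 180, "A>G": 168, "G>A": 155}
TRANSVERSIONS = 86   # denominator of the reported Ts/Tv ratio
INDELS = 1           # the single InDel at nt 640
GENOME_LENGTH = 9507  # length of SMV China in nt


def snv_summary(ts_counts, transversions, indels, genome_length):
    """Return (total variants, percent of variant sites, Ts/Tv ratio)."""
    transitions = sum(ts_counts.values())
    total = transitions + transversions + indels
    percent_variant = 100 * total / genome_length
    return total, percent_variant, transitions / transversions


total, percent_variant, ts_tv = snv_summary(
    transition_counts, TRANSVERSIONS, INDELS, GENOME_LENGTH)
print(total, round(percent_variant, 4), round(ts_tv, 2))
```

The counts are internally consistent: 693 transitions, 86 transversions, and one InDel add up to the 780 reported variants.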

The amount of viral RNA in the soybean transcriptome

It is also of interest to quantify viral RNAs in the analyzed soybean transcriptome. Of the 116,108 contigs, SMV-associated contigs accounted for 0.068% (79 contigs). The total length of the assembled contigs was 67,363,642 bp and the total length of the virus-associated contigs was 36,022 bp, accounting for 0.0535%. Virus-associated reads accounted for 0.0529% (39,403/74,431,152) of all reads. Moreover, we estimated the SMV copy number within the soybean transcriptome to be 414, which is consistent with the sequence coverage of the SMV genome. This result indicates high variability of the SMV genome.
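The percentages above can be reproduced from the stated counts, and the copy number is close to what the usual depth estimate (reads × read length / genome length) gives. The read length is not stated in the text; the value of 100 nt used below is purely an assumption, chosen because it yields a depth near the reported 414. The helper name is hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


contig_share = percent(79, 116_108)             # SMV contigs among all contigs
base_share = percent(36_022, 67_363_642)        # viral bases among assembled bases
read_share = percent(39_403, 74_431_152)        # viral reads among all reads

ASSUMED_READ_LENGTH = 100  # assumption: not given in the paper
GENOME_LENGTH = 9507       # SMV China, nt
mean_depth = 39_403 * ASSUMED_READ_LENGTH / GENOME_LENGTH

print(round(contig_share, 3), round(base_share, 4), round(read_share, 4))
print(round(mean_depth, 1))
```

If the true read length differs, the depth estimate scales proportionally, so the agreement with 414 should be read as a plausibility check rather than a reconstruction of the authors' calculation.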


Discussion

The development of NGS has provided a wide range of DNA and RNA sequencing data (Metzker, 2010). The main purpose of DNA and RNA sequencing is the elucidation of the genomes and transcriptomes of target eukaryotic and prokaryotic organisms (Morozova and Marra, 2008). In the case of bacteria, metagenomics using 16S rRNA sequences, which are highly conserved among bacterial species, is intensively performed to study bacterial communities under specific conditions (Wang and Qian, 2009). However, unlike bacteria, viruses do not share any universally conserved sequences, and viral genomes are mostly very small (Edwards and Rohwer, 2005). Therefore, virus-specific sequencing usually requires a purification step before NGS. For example, extraction of double-stranded RNAs from virus-infected organisms followed by NGS is one of the efficient approaches for identifying viruses (Yanagisawa et al., 2016). Moreover, sequencing of small RNAs is an alternative technique for virus identification and genome assembly (Vodovar et al., 2011). In addition, RNA-Seq is a good technique for identifying viruses that have a poly(A) tail. Furthermore, several recent studies have demonstrated that viruses and viroids without a poly(A) tail can also be detected by RNA-Seq (Burger and Maree, 2015; Jo et al., 2016).

In this study, we identified several viruses infecting soybean. The transcriptome was originally generated for expression profiling of soybean seed development. Thus, it is not derived from a single condition but from six developmental seed stages, and several seeds might have been pooled for total RNA extraction. Although we identified five viruses that might infect soybean, the four viruses other than SMV were each identified from only a single contig, and their presence should be validated by other methods. In many cases, a partial viral sequence or contig is homologous to a closely related virus rather than to the target virus. Thus, it is possible that the identified virus-associated contigs are derived not from the viruses named here but from other viruses that share similar sequences.

SMV is seed-borne and transmitted by aphids (Domier et al., 2011). Soybean seeds infected by SMV are often discolored and mottled. In addition, BCMV is known to be a seed-borne virus (Morales and Castano, 1987). Seed-borne viruses can infect the embryo, as BCMV does, or be carried on the seed coat (Jafarpour et al., 1979). In addition, seed transmission of CMV has been identified in several plants such as pepper, spinach, and lupin (Ali and Kobayashi, 2010; Wylie et al., 1993; Yang et al., 1997). Based on previous knowledge of seed-borne viruses, the identification of SMV, BCMV, and CMV in the soybean seed is not surprising. In addition, the infection of green bean (Phaseolus vulgaris L.) by LCV has recently been reported (Ruiz et al., 2014). However, the infection of soybean seeds by LIYV and LCV, which are members of the genus Crinivirus, should be validated.

The soybean transcriptome was derived not from a single soybean seed but from a mixture of seeds, which were divided into six developmental stages. The SMV-associated contigs assembled in this study might be shorter than virus-associated contigs from a single plant because the transcriptome contains several variants of SMV. Therefore, the assembled genome of SMV China is a consensus sequence of several SMV variants. Although SMV-associated sequences accounted for only about 0.05% of the total transcriptome, the coverage of the SMV genome in this study was about 414-fold, and this coverage was also visualized by the alignment of raw data on the genome of SMV China. As a result, the amount of SMV-associated sequence data was sufficient to assemble the SMV genome de novo.

Based on the assembled SMV genome, we could also identify SNVs for SMV. As expected, we found numerous SNVs, which probably resulted from a mixture of SMV variants infecting the diverse seed samples. However, we could not determine the exact number of variants. Furthermore, the identification of SNVs in SMV demonstrated that mutations were not confined to a specific region but occurred in several regions of the SMV genome. The presence of several SMV variants in the soybean seeds is a very interesting finding, suggesting that SMV replicates actively in the developing seeds; this might be correlated with some of the disease symptoms caused by SMV in soybean seeds. It would be of interest to examine the replication rates of SMV in different developmental stages and tissues, which could provide further evidence of the quasispecies nature of SMV.

Phylogenetic analyses suggested that the identified SMV isolate China was very different from other known SMV isolates based on polyprotein sequences. However, SMV isolate China appears to be closely related to two SMV isolates from South Korea, suggesting a phylogenetic correlation between geographical regions and SMV isolates.

Our SNV analysis in the soybean seeds indicates a pronounced quasispecies nature for SMV. Mutations were not confined to a specific region but were found in most regions of the SMV genome. Furthermore, we found that A-G and C-U conversions, and vice versa, were frequent.

Taken together, our bioinformatics analyses of a soybean seed transcriptome identified five viruses infecting the soybean seeds. Of these five viruses, we assembled the genome of SMV isolate China de novo and analyzed its SNVs, revealing the quasispecies nature of SMV in soybean seeds for the first time. The approaches and analyses in this study are valuable for virus-associated studies using NGS-based transcriptome data.


Acknowledgments

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government Ministry of Education (No. NRF-2016R1D1A1A02937216); the Agenda Program, the Rural Development Administration (RDA) (No. PJ01194803); and the Vegetable Breeding Research Center (No. 710001-05) through the Agriculture Research Center program from the Ministry for Food, Agriculture and Rural Affairs, Republic of Korea. WKC was supported by a research fellowship from the Brain Korea 21 Plus Project.

Supplementary Information
Fig. 1. De novo assembly of SMV isolate China using transcriptome data. (A) Size distribution of virus-associated contigs. Red bars indicate SMV-associated contigs. The four other viruses are indicated with their respective contig lengths. (B) Alignment of the 79 SMV-associated contigs on the assembled genome of SMV isolate China using the BWA program. The black bar indicates the reference SMV genome. The sequence alignment was visualized with the Tablet program. (C) Genome organization of SMV isolate China. The nucleotide positions of the two proteins, GP1 and GP2, are indicated.
Fig. 2. Phylogenetic relationships of the assembled SMV isolate China with known SMV isolates. Phylogenetic trees of SMV isolates based on complete genomes (A), polyproteins (B), and PIPO sequences (C). The respective genome and protein sequences were searched by BLAST against the NCBI database, and highly matched sequences were used to construct phylogenetic trees in the MEGA6 program by the neighbor-joining method with 1,000 bootstrap replicates. The Kimura 2-parameter and Poisson substitution models were used for nucleotide and protein sequences, respectively.
Fig. 3. SNVs of SMV in the soybean seed transcriptome. (A) Raw data were mapped on the genome sequence of SMV isolate China using BWA and visualized by Tablet program. (B) The positions of identified single nucleotide variations on the SMV were visualized by Tablet program. Detailed information for SNVs can be found in . (C) The numbers of identified SNVs of SMV in the soybean seed transcriptome.

Table 1. Summary of de novo soybean transcriptome assembly using Trinity

Accession number	SRR1777405(a)
Total Trinity transcripts	116,108
Percent GC	43.97
Contig N50	710 bp
Median contig length	428 bp
Average contig length	580.18 bp
Total assembled bases	67,363,642 bp

(a) We assembled raw data from two different libraries using the Trinity program.

The statistics of the assembled contigs were calculated using the Trinity program.

Table 2. Summary of BLAST results to identify virus-associated contigs

Query id	Subject id	Name of virus	Identity (%)	Alignment length	Mismatches	Gap opens	Query start	Query end	Subject start	Subject end	E value	Bit score
TR2274|c0_g1_i1	NC_002634.1	Soybean mosaic virus	93.13	233	16	0	2	234	8571	8803	3.00E-93	342
TR3618|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.02	256	23	0	1	256	1342	1597	2.00E-94	346
TR3618|c0_g1_i2	NC_002634.1	Soybean mosaic virus	90.58	276	26	0	1	276	1342	1617	2.00E-100	366
TR3858|c0_g1_i1	NC_002634.1	Soybean mosaic virus	97.35	264	7	0	1	264	910	1173	2.00E-125	449
TR3858|c0_g1_i2	NC_002634.1	Soybean mosaic virus	96.6	235	8	0	1	235	939	1173	1.00E-107	390
TR4672|c0_g1_i1	NC_002634.1	Soybean mosaic virus	96.55	261	9	0	1	261	9036	9296	2.00E-120	433
TR4672|c0_g1_i2	NC_002634.1	Soybean mosaic virus	97.7	261	6	0	1	261	9036	9296	2.00E-125	449
TR5077|c1_g1_i1	NC_002634.1	Soybean mosaic virus	94.19	258	15	0	3	260	4680	4937	9.00E-109	394
TR5077|c1_g1_i2	NC_002634.1	Soybean mosaic virus	91.47	258	22	0	3	260	4680	4937	4.00E-97	355
TR5102|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.96	224	18	0	1	224	7552	7329	6.00E-85	315
TR5869|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.98	212	17	0	5	216	7243	7032	6.00E-80	298
TR5869|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.45	212	16	0	5	216	7243	7032	1.00E-81	303
TR5869|c0_g3_i1	NC_002634.1	Soybean mosaic virus	92.92	212	15	0	5	216	7243	7032	3.00E-83	309
TR5869|c0_g4_i1	NC_002634.1	Soybean mosaic virus	92.92	212	15	0	5	216	7243	7032	3.00E-83	309
TR7406|c0_g1_i1	NC_002634.1	Soybean mosaic virus	94.64	280	15	0	1	280	2677	2956	6.00E-121	435
TR7406|c0_g1_i2	NC_002634.1	Soybean mosaic virus	92.12	241	19	0	1	241	2677	2917	1.00E-92	340
TR7406|c0_g1_i3	NC_002634.1	Soybean mosaic virus	94.16	274	16	0	1	274	2677	2950	5.00E-116	418
TR7406|c0_g1_i4	NC_002634.1	Soybean mosaic virus	93.36	241	16	0	1	241	2677	2917	1.00E-97	357
TR8100|c0_g1_i1	NC_002634.1	Soybean mosaic virus	97.86	234	5	0	12	245	6060	6293	4.00E-112	405
TR9520|c0_g1_i1	NC_002634.1	Soybean mosaic virus	95.06	385	19	0	1	385	8268	7884	2.00E-172	606
TR9520|c0_g1_i2	NC_002634.1	Soybean mosaic virus	96.65	239	8	0	4	242	8122	7884	6.00E-110	398
TR9520|c0_g1_i3	NC_002634.1	Soybean mosaic virus	94.66	356	19	0	1	356	8268	7913	2.00E-156	553
TR9520|c0_g1_i4	NC_002634.1	Soybean mosaic virus	94.38	356	20	0	1	356	8268	7913	9.00E-155	547
TR9520|c0_g1_i5	NC_002634.1	Soybean mosaic virus	95.06	385	19	0	1	385	8268	7884	2.00E-172	606
TR9520|c0_g1_i6	NC_002634.1	Soybean mosaic virus	96.19	210	8	0	4	213	8122	7913	8.00E-94	344
TR9520|c0_g1_i7	NC_002634.1	Soybean mosaic virus	96.88	385	12	0	1	385	8268	7884	0	645
TR13605|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.25	400	31	0	10	409	8665	9064	8.00E-161	568
TR13605|c0_g1_i2	NC_002634.1	Soybean mosaic virus	94.75	400	21	0	10	409	8665	9064	2.00E-177	623
TR15892|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.64	231	17	0	2	232	5845	5615	2.00E-90	333
TR20496|c0_g1_i1	NC_002634.1	Soybean mosaic virus	96.88	224	7	0	1	224	2087	1864	3.00E-103	375
TR22770|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.67	240	20	0	1	240	6413	6652	2.00E-90	333
TR22770|c0_g1_i2	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	2	282	6372	6652	2.00E-111	403
TR25078|c0_g1_i1	NC_002634.1	Soybean mosaic virus	88.54	253	29	0	1	253	8730	8478	1.00E-82	307
TR25078|c0_g2_i1	NC_002634.1	Soybean mosaic virus	94.72	246	13	0	16	261	8627	8382	2.00E-105	383
TR25078|c0_g2_i2	NC_002634.1	Soybean mosaic virus	93.7	349	22	0	1	349	8730	8382	2.00E-147	523
TR25078|c0_g2_i3	NC_002634.1	Soybean mosaic virus	95.72	187	8	0	43	229	8568	8382	5.00E-81	302
TR25078|c0_g2_i4	NC_002634.1	Soybean mosaic virus	90.91	253	23	0	1	253	8730	8478	1.00E-92	340
TR32819|c0_g1_i1	NC_002634.1	Soybean mosaic virus	91.7	265	22	0	2	266	2515	2251	6.00E-101	368
TR32819|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.08	265	21	0	2	266	2515	2251	1.00E-102	374
TR34507|c0_g1_i1	NC_002634.1	Soybean mosaic virus	87.27	377	44	4	4	378	3523	3149	1.00E-118	427
TR37651|c0_g1_i1	NC_002634.1	Soybean mosaic virus	87.61	218	24	3	2	218	410	625	2.00E-65	250
TR37651|c0_g3_i1	NC_002634.1	Soybean mosaic virus	87.27	487	57	4	2	487	410	892	1.00E-155	551
TR37706|c0_g2_i1	NC_002634.1	Soybean mosaic virus	90.51	274	24	2	1	273	1128	1400	9.00E-99	361
TR41793|c1_g1_i1	NC_002634.1	Soybean mosaic virus	92.89	394	28	0	1	394	7483	7876	2.00E-162	573
TR41793|c1_g1_i2	NC_002634.1	Soybean mosaic virus	93.15	438	29	1	1	437	7483	7920	0	641
TR41793|c1_g1_i3	NC_002634.1	Soybean mosaic virus	91.55	213	18	0	23	235	7486	7698	8.00E-79	294
TR41793|c1_g1_i4	NC_002634.1	Soybean mosaic virus	93.93	445	27	0	1	445	7483	7927	0	673
TR41793|c1_g1_i5	NC_002634.1	Soybean mosaic virus	91.59	226	19	0	1	226	7473	7698	2.00E-84	313
TR41793|c1_g1_i6	NC_002634.1	Soybean mosaic virus	93.03	445	31	0	1	445	7483	7927	0	651
TR41793|c1_g1_i7	NC_002634.1	Soybean mosaic virus	91.17	419	37	0	1	419	7473	7891	2.00E-161	569
TR44246|c0_g1_i2	NC_002634.1	Soybean mosaic virus	87.9	157	18	1	87	242	477	633	2.00E-45	183
TR44822|c4_g1_i1	NC_002634.1	Soybean mosaic virus	97.83	460	10	0	2	461	843	384	0	795
TR44822|c4_g1_i2	NC_002634.1	Soybean mosaic virus	97.65	765	18	0	2	766	843	79	0	1314
TR44822|c4_g1_i3	NC_002634.1	Soybean mosaic virus	97.27	622	14	1	2	623	843	225	0	1051
TR44822|c4_g2_i1	NC_002634.1	Soybean mosaic virus	90.13	1256	122	2	1	1255	1991	737	0	1631
TR44822|c4_g2_i2	NC_002634.1	Soybean mosaic virus	91.46	820	70	0	1	820	1918	1099	0	1127
TR44822|c4_g2_i3	NC_002634.1	Soybean mosaic virus	92.75	483	35	0	1	483	1991	1509	0	699
TR44822|c4_g2_i4	NC_002634.1	Soybean mosaic virus	88.67	256	29	0	1	256	1617	1362	2.00E-84	313
TR44822|c4_g2_i5	NC_002634.1	Soybean mosaic virus	93.9	246	15	0	19	264	1853	1608	4.00E-102	372
TR44822|c4_g2_i6	NC_002634.1	Soybean mosaic virus	94.81	231	12	0	19	249	1853	1623	9.00E-99	361
TR44822|c4_g2_i7	NC_002634.1	Soybean mosaic virus	94.15	410	24	0	1	410	1918	1509	5.00E-178	625
TR44822|c5_g1_i1	NC_002634.1	Soybean mosaic virus	95.98	994	40	0	2	995	5991	6984	0	1615
TR44822|c5_g1_i2	NC_002634.1	Soybean mosaic virus	94.11	3599	207	4	2	3596	5991	9588	0	5467
TR44822|c5_g2_i1	NC_002634.1	Soybean mosaic virus	93.3	224	15	0	4	227	8124	8347	6.00E-90	331
TR44822|c5_g1_i3	NC_002634.1	Soybean mosaic virus	96.21	501	19	0	2	502	5991	6491	0	821
TR44822|c5_g1_i4	NC_002634.1	Soybean mosaic virus	92.81	292	21	0	2	293	5991	6282	1.00E-117	424
TR44822|c6_g1_i1	NC_002634.1	Soybean mosaic virus	95.07	1015	50	0	1	1015	6049	5035	0	1598
TR44822|c6_g2_i1	NC_002634.1	Soybean mosaic virus	97.64	212	5	0	10	221	4930	4719	6.00E-100	364
TR44822|c6_g2_i2	NC_002634.1	Soybean mosaic virus	95.83	240	10	0	1	240	5051	4812	4.00E-107	388
TR44822|c6_g2_i3	NC_002634.1	Soybean mosaic virus	97.52	1372	34	0	1	1372	5146	3775	0	2346
TR44822|c6_g2_i4	NC_002634.1	Soybean mosaic virus	96.59	293	10	0	1	293	5146	4854	2.00E-136	486
TR44822|c6_g3_i1	NC_002634.1	Soybean mosaic virus	95.8	691	27	2	5	694	2746	2057	0	1114
TR44822|c6_g3_i2	NC_002634.1	Soybean mosaic virus	96.69	877	29	0	2	878	2822	1946	0	1459
TR44822|c6_g4_i1	NC_002634.1	Soybean mosaic virus	95.39	1149	53	0	1	1149	3889	2741	0	1829
TR44822|c6_g4_i2	NC_002634.1	Soybean mosaic virus	94.89	333	17	0	1	333	3598	3266	5.00E-147	521
TR45256|c0_g1_i1	NC_002634.1	Soybean mosaic virus	93.49	261	17	0	4	264	6897	6637	4.00E-107	388
TR45256|c0_g1_i2	NC_002634.1	Soybean mosaic virus	94.32	229	13	0	2	230	6865	6637	5.00E-96	351
TR47685|c0_g1_i1	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	1	281	5082	5362	2.00E-111	403
TR47685|c0_g2_i1	NC_002634.1	Soybean mosaic virus	92.53	281	21	0	1	281	5082	5362	2.00E-111	403
TR44246|c0_g1_i1	NC_003397.1	Bean common mosaic virus	81.86	408	68	6	490	894	458	862	2.00E-91	339
TR19277|c0_g2_i1	NC_003617.1	Lettuce infectious yellows virus RNA1	75.34	146	29	6	467	607	6837	6694	1.00E-08	63.9
TR45572|c0_g2_i1	NC_012910.1	Lettuce chlorosis virus RNA2	87.96	191	22	1	15	205	8555	8366	1.00E-57	224
TR29303|c0_g1_i1	NC_002034.1	Cucumber mosaic virus RNA1	91.28	298	26	0	4	301	1334	1631	1.00E-112	407

References

  1. Ali, A, and Kobayashi, M (2010). Seed transmission of Cucumber mosaic virus in pepper. J Virol Methods. 163, 234-237.
  2. Barba, M, Czosnek, H, and Hadidi, A (2014). Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses. 6, 106-136.
  3. Burger, JT, and Maree, HJ (2015). Metagenomic next-generation sequencing of viruses infecting grapevines. Methods Mol Biol. 1302, 315-330.
  4. Danecek, P, Auton, A, Abecasis, G, Albers, CA, Banks, E, DePristo, MA, Handsaker, RE, Lunter, G, Marth, GT, Sherry, ST, McVean, G, and Durbin, R (2011). The variant call format and VCFtools. Bioinformatics. 27, 2156-2158.
  5. Domier, LL, Hobbs, HA, McCoppin, NK, Bowen, CR, Steinlage, TA, Chang, S, Wang, Y, and Hartman, GL (2011). Multiple loci condition seed transmission of Soybean mosaic virus (SMV) and SMV-induced seed coat mottling in soybean. Phytopathology. 101, 750-756.
  6. Edwards, RA, and Rohwer, F (2005). Viral metagenomics. Nat Rev Microbiol. 3, 504-510.
  7. Eggenberger, AL, Stark, DM, and Beachy, RN (1989). The nucleotide sequence of a soybean mosaic virus coat protein-coding region and its expression in Escherichia coli, Agrobacterium tumefaciens and tobacco callus. J Gen Virol. 70, 1853-1860.
  8. Haas, BJ, Papanicolaou, A, Yassour, M, Grabherr, M, Blood, PD, Bowden, J, Couger, MB, Eccles, D, Li, B, Lieber, M, MacManes, MD, Ott, M, Orvis, J, Pochet, N, Strozzi, F, Weeks, N, Westerman, R, William, T, Dewey, CN, Henschel, R, LeDuc, RD, Friedman, N, and Regev, A (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 8, 1494-1512.
  9. Herridge, DF, Peoples, MB, and Boddey, RM (2008). Global inputs of biological nitrogen fixation in agricultural systems. Plant Soil. 311, 1-18.
  10. Hill, JH, and Whitham, SA (2014). Control of virus diseases in soybeans. Adv Virus Res. 90, 355-390.
  11. Jafarpour, B, Shepherd, R, and Grogan, R (1979). Serologic detection of bean common mosaic and lettuce mosaic viruses in seed. Phytopathology. 69, 1125-1129.
  12. Jo, Y, Choi, H, Yoon, J-Y, Choi, S-K, and Cho, WK (2016). In silico identification of Bell pepper endornavirus from pepper transcriptomes and their phylogenetic and recombination analyses. Gene. 575, 712-717.
  13. Leinonen, R, Sugawara, H, and Shumway, M (2011). The sequence read archive. Nucleic Acids Res. 39, D19-D21.
  14. Li, H, and Durbin, R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754-1760.
  15. Li, H, Handsaker, B, Wysoker, A, Fennell, T, Ruan, J, Homer, N, Marth, G, Abecasis, G, and Durbin, R (2009). The sequence alignment/map format and SAMtools. Bioinformatics. 25, 2078-2079.
  16. Madden, T (2013). The BLAST sequence analysis tool. The NCBI handbook. Bethesda, MD, USA: National Center for Biotechnology Information.
  17. Massart, S, Olmos, A, Jijakli, H, and Candresse, T (2014). Current impact and future directions of high throughput sequencing in plant virus diagnostics. Virus Res. 188, 90-96.
  18. Messina, MJ (1999). Legumes and soybeans: overview of their nutritional profiles and health effects. Am J Clin Nutr. 70, 439S-450S.
  19. Metzker, ML (2010). Sequencing technologies - the next generation. Nat Rev Genet. 11, 31-46.
  20. Milne, I, Bayer, M, Cardle, L, Shaw, P, Stephen, G, Wright, F, and Marshall, D (2010). Tablet-next generation sequence assembly visualization. Bioinformatics. 26, 401-402.
  21. Morales, FJ, and Castano, M (1987). Seed transmission characteristics of selected bean common mosaic virus strains in differential bean cultivars. Plant Dis. 71, 51-53.
  22. Morozova, O, and Marra, MA (2008). Applications of next-generation sequencing technologies in functional genomics. Genomics. 92, 255-264.
  23. Pimentel, D, and Patzek, TW (2005). Ethanol production using corn, switchgrass, and wood; biodiesel production using soybean and sunflower. Nat Resour Res. 14, 65-76.
  24. Ruiz, M, Simón, A, García, M, and Janssen, D (2014). First report of Lettuce chlorosis virus infecting bean in Spain. Plant Dis. 98, 1.
  25. Song, Q-X, Li, Q-T, Liu, Y-F, Zhang, F-X, Ma, B, Zhang, W-K, Man, W-Q, Du, W-G, Wang, G-D, Chen, S-Y, and Zhang, JS (2013). Soybean GmbZIP123 gene enhances lipid content in the seeds of transgenic Arabidopsis plants. J Exp Bot. 64, 4329-4341.
  26. Tamura, K, Stecher, G, Peterson, D, Filipski, A, and Kumar, S (2013). MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 30, 2725-2729.
  27. Vodovar, N, Goic, B, Blanc, H, and Saleh, M-C (2011). In silico reconstruction of viral genomes from small RNAs improves virus-derived small interfering RNA profiling. J Virol. 85, 11016-11021.
  28. Wang, Y, and Qian, P-Y (2009). Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS One. 4, e7401.
  29. Wrather, J, Anderson, T, Arsyad, D, Tan, Y, Ploper, L, Porta-Puglia, A, Ram, H, and Yorinori, J (2001). Soybean disease loss estimates for the top ten soybean-producing countries in 1998. Can J Plant Pathol. 23, 115-121.
  30. Wylie, S, Wilson, C, Jones, R, and Jones, M (1993). A polymerase chain reaction assay for cucumber mosaic virus in lupin seeds. Aust J Agr Res. 44, 41-51.
  31. Yanagisawa, H, Tomita, R, Katsu, K, Uehara, T, Atsumi, G, Tateda, C, Kobayashi, K, and Sekine, K-T (2016). Combined DECS analysis and next-generation sequencing enable efficient detection of novel plant RNA viruses. Viruses. 8, 70.
  32. Yang, Y, Kim, KS, and Anderson, EJ (1997). Seed transmission of cucumber mosaic virus in spinach. Phytopathology. 87, 924-931.