
Gene 475 (2011) 104–112

Contents lists available at ScienceDirect

Gene
journal homepage: www.elsevier.com/locate/gene

Characterization of the complete chloroplast genome of Hevea brasiliensis reveals genome rearrangement, RNA editing sites and phylogenetic relationships
Sithichoke Tangphatsornruang, Pichahpuk Uthaipaisanwong, Duangjai Sangsrakru, Juntima Chanprasert, Thippawan Yoocha, Nukoon Jomchai, Somvong Tragoonrung
National Center for Genetic Engineering and Biotechnology, 113 Phaholyothin Rd., Klong 1, Klong Luang, Pathumthani, 12120, Thailand

Abstract
Rubber tree (Hevea brasiliensis) is an economically important plant, widely grown for natural rubber production. However, genomic research on rubber tree has lagged behind that on other species in the Euphorbiaceae family. We report the complete chloroplast genome sequence of rubber tree, which is 161,191 bp in length and includes a pair of inverted repeats of 26,810 bp each, separated by a small single copy region of 18,362 bp and a large single copy region of 89,209 bp. The chloroplast genome contains 112 unique genes, 16 of which are duplicated in the inverted repeat. Of the 112 unique genes, 78 are predicted protein-coding genes, 4 are ribosomal RNA genes and 30 are tRNA genes. Relative to other plant chloroplast genomes, we observed a unique rearrangement in the rubber tree chloroplast genome: a 30-kb inversion between trnE(UUC)-trnS(GCU) and trnT(GGU)-trnR(UCU). A comparison between the rubber tree chloroplast genes and cDNA sequences revealed 51 RNA editing sites, most of which (48 sites) were located in 26 protein-coding genes, while the other 3 sites were in introns. Phylogenetic analysis based on chloroplast genes demonstrated a close relationship between Hevea and Manihot in Euphorbiaceae and provided strong support for a monophyletic group of the eurosid I.

© 2011 Elsevier B.V. All rights reserved.

Article history: Accepted 5 January 2011. Available online 15 January 2011. Received by Jean-Marc Deragon.

Keywords: Hevea brasiliensis; Chloroplast/plastid genome; RNA editing

Abbreviations: bp, base pair; cp, chloroplast; IDP, Isopentenyl diphosphate; MVA, Mevalonate; MEP, 1-Deoxy-D-xylulose 5-phosphate/2-C-methyl-D-erythritol 4-phosphate; H. brasiliensis, Hevea brasiliensis; PCR, Polymerase chain reaction; RCA, Rolling circle amplification; ML, Maximum likelihood; MP, Maximum parsimony; TBR, Tree bisection and reconnection; A, Adenosine; C, Cytidine; I, Inosine; U, Uridine; G, Guanosine; LSC, Large single copy; SSC, Small single copy; IR, Inverted repeat; ycf, Hypothetical chloroplast open reading frame; EST, Expressed sequence tag.
Corresponding author. Tel.: +66 2 564 6700x3259; fax: +66 2 564 6584. E-mail address: sithichoke.tan@biotec.or.th (S. Tangphatsornruang).
0378-1119/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2011.01.002

1. Introduction

Chloroplasts are plant organelles with their own genome, which contains genes coding for the transcription and translation machinery and for components of the photosynthetic complex. Since the first complete chloroplast (cp) genome sequence, that of liverwort (Marchantia polymorpha), was reported in 1986 (Ohyama et al., 1986), more than 150 chloroplast genomes have been sequenced and characterized, disclosing an enormous amount of evolutionary and functional information on chloroplasts. Chloroplast genomes are sufficiently large and complex to include structural and point mutations that are useful for evolutionary studies from intraspecific to interspecific levels (Neale et al., 1988; McCauley, 1992; Graham and Olmstead, 2000; Provan et al., 2001). Structural mutations such as duplications of tRNA genes (Hipkins et al., 1995), rpl19, rpl2, rpl23 (Bowman et al., 1988) and psbA (Lidholm et al., 1991); losses of ndh genes (Wakasugi et al., 1994), hypothetical chloroplast open reading frame (ycf) genes, infA and accD (Hiratsuka et al., 1989; Maier et al., 1995; Millen et al., 2001); as well as rearrangements of cp genomes (Palmer et al., 1987; Wolfe et al., 1991; Wojciechowski et al., 2004; Guo et al., 2007; Tangphatsornruang et al., 2010b) have been reported in plants and algae. Therefore, chloroplast genome sequences have been used to study phylogenetic relationships (Provan et al., 2001; Lee et al., 2006; Tangphatsornruang et al., 2010b) and to test hypotheses of seed dispersal, intraspecific differentiation and interspecific introgression (Petit et al., 2003, 2005).

In chloroplasts, transcripts undergo a series of RNA processing steps such as intron splicing, polycistronic cleavage, and RNA editing. RNA editing is a mechanism that changes genetic information at the transcript level by nucleotide insertion, deletion or conversion (Bock, 2000; Knoop, 2010). Since the first report of RNA editing in chloroplasts, in the maize rpl2 gene (Hoch et al., 1991), several editing sites have been reported in Arabidopsis thaliana (Tillich et al., 2005), Atropa belladonna (Schmitz-Linneweber et al., 2002), Lotus japonicus (Kato et al., 2000), black pine (Wakasugi et al., 1996), cassava (Daniell et al., 2008), pea (Miyamoto et al., 2002), tobacco (Sasaki et al., 2003), maize (Maier et al., 1995; Halter et al., 2004) and rice (Corneille et al., 2000). Comparison of sequences surrounding the editing sites revealed no consensus sequence or secondary structure (Hirose et al., 1999). This raised the question of how RNA editing sites are recognized. Previous studies suggested the involvement of distinct cis-acting elements and trans-acting factors in recognition of an individual editing site (Chaudhuri et al., 1995; Bock et al., 1996; Chaudhuri and Maliga, 1996; Hirose and Sugiura, 2001; Miyamoto et al.,

S. Tangphatsornruang et al. / Gene 475 (2011) 104–112


2002; Lurin et al., 2004; Kotera et al., 2005; Hayes and Hanson, 2007). The RNA-binding pentatricopeptide repeat (PPR) proteins were identified as trans-acting factors responsible for targeting specific editing events (Lurin et al., 2004; Kotera et al., 2005; Hammani et al., 2009).

Hevea brasiliensis is a perennial plant in the Euphorbiaceae family and is the most widely cultivated species for commercial production of natural rubber. Natural rubber is chemically cis-polyisoprene, a high-molecular-weight polymer formed by sequential condensation of isopentenyl diphosphate (IDP) units catalysed by rubber transferase (Cornish, 2001a). IDP is also an important intermediate in the biosynthesis of essential oils, abscisic acid, cytokinin, phytoalexin, sterols, chlorophyll, carotenoids and gibberellins (Chappell, 1995a; McGarvey and Croteau, 1995; Lichtenthaler et al., 1997; Cornish, 2001b). There are two IDP biosynthesis pathways: the mevalonate (MVA) pathway, which occurs in the cytosol (Chappell, 1995b), and the 1-deoxy-D-xylulose 5-phosphate/2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, which occurs in plastids (Lichtenthaler, 1999; Ko et al., 2003). One approach to improving rubber production in H. brasiliensis would be to engineer chloroplasts and modify metabolic flux to produce more biosynthetic intermediates. The availability of the complete chloroplast genome sequence should also facilitate chloroplast transformation. Improved transformation efficiency and foreign gene expression can be achieved through the use of endogenous flanking sequences and regulatory elements (Birch-Machin et al., 2004; Maliga, 2004; Tangphatsornruang et al., 2010a).
Transformation of the chloroplast genome offers a number of advantages over nuclear transformation, including a high level of transgene expression, polycistronic transcription, lack of gene silencing or positional effects, and transgene containment (Daniell et al., 2002; Maliga, 2002, 2004; Bock, 2007).

We sequenced the chloroplast genome of H. brasiliensis to gain information for genome annotation and comparative genomic studies, and to lay the groundwork for chloroplast engineering. We employed the massively parallel pyrosequencing technology developed by 454 Life Sciences Technology (Margulies et al., 2005). This technology has been applied to the sequencing of genomes, transcriptome profiling and methylation studies. Previous work demonstrated the success of high-throughput sequencing technology in obtaining chloroplast genome sequences (Cai et al., 2006; Moore et al., 2006; Cronn et al., 2008; Tangphatsornruang et al., 2010b). This overcomes the traditional labor-intensive methods involving isolation of chloroplast DNA followed by random shearing and cloning into vectors, long PCR amplification with conserved primers (Goremykin et al., 2003, 2004; Dhingra and Folta, 2005; Heinze, 2007), or rolling circle amplification (RCA) (Jansen et al., 2005; Bausher et al., 2006). In this study, we determined the complete nucleotide sequence of the H. brasiliensis chloroplast genome, annotated it, compared its structure with those of other plant species, identified RNA editing sites, and used the rubber tree chloroplast genome to determine phylogenetic relationships among angiosperms.

2. Materials and methods

2.1. DNA sequencing, assembly and annotation

DNA was isolated from 1 g of leaves of H. brasiliensis, clone RRIM600, using the DNeasy Plant Mini Kit (Qiagen).
The DNA (10 μg) was sheared by nebulization, subjected to 454 library preparation and shotgun sequenced on the GS FLX Titanium platform (Margulies et al., 2005) at the in-house facility (National Center for Genetic Engineering and Biotechnology, Thailand). The nucleotide sequence reads obtained were assembled using the Newbler de novo sequence assembly software (Roche). The chloroplast genome sequence was compared with the reference sequence of the complete chloroplast genome of Manihot esculenta (Daniell et al., 2008) using Sequencher software (Gene Codes Corporation). Remaining gaps were closed by PCR and Sanger sequencing using the BigDye Terminator v3.1 Cycle Sequencing Kit. The 3 primer pairs used for closing the gaps were 1) gap_LSCF: 5′-GGG CTC TAA AAA GAC ATC TCC A-3′, gap_LSCR: 5′-CTT TCT GTC TTT CAC GAT TCC A-3′, 2) gap_SSC1F: 5′-TGT ATG ACC ATC GAG GAA CTT G-3′, gap_SSC1R: 5′-GTC GGA GTG ATG GAA AAG AAA G-3′ and 3) gap_SSC2F: 5′-GCT GAA TAG ACA AAT CGA TTG AA-3′, gap_SSC2R: 5′-TGA TCC ATT TTC TAG CCC AAG-3′. PCR products were purified by agarose gel electrophoresis using the QIAquick Gel Extraction Kit (Qiagen).

2.2. Genome analysis

The genome was annotated using the program DOGMA (Dual Organellar GenoMe Annotator; Wyman et al., 2004). The predicted annotations were verified using BLAST similarity searches (Altschul et al., 1990). All genes, rRNAs, and tRNAs were identified using the plastid/bacterial genetic code. The chloroplast genome of H. brasiliensis was compared with the chloroplast genomes of Arabidopsis (Sato et al., 1999), Populus, Jatropha and Manihot (Daniell et al., 2008) using Mauve software (Darling et al., 2004). REPuter (Kurtz and Schleiermacher, 1999) was used to identify and locate direct and inverted repeat sequences in the rubber tree chloroplast genome, with cutoff criteria of n ≥ 30 bp and a sequence identity ≥ 90%.

2.3. RNA editing

To reveal RNA editing sites, more than two million rubber tree cDNA sequences were downloaded from the DDBJ read archive (ID = DRA000170) and aligned with the protein-coding genes extracted from the rubber tree chloroplast genome using GS Reference Mapper version 2.3 (Roche). Some RNA editing sites (rps2eU134TI, rps14eU149PL, ndhKeU65SL, petBi178, ndhBeC1290YY, ndhBeU467PL, ndhDeU887PL, ndhDeU878SL and ndhDeU599SL) were confirmed by Sanger sequencing of cDNA products. In brief, total RNA was extracted from 0.5 g of young leaf using Concert Plant RNA Reagent (Invitrogen), treated with DNA-free DNaseI (Ambion) and converted to a pool of cDNA using the RevertAid H Minus First Strand cDNA Synthesis Kit (Fermentas). Primer sequences are given in Supplementary Fig. 2.

2.4. Phylogenetic analysis

A set of 33 protein-coding genes (atpA, atpB, atpE, atpF, atpH, atpI, ccsA, cemA, matK, petA, petG, petN, psaA, psaB, psaC, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbN, psbZ, rbcL, rpl2, rpl20, rpoB, rpoC2, rps4, rps14, rps15 and ycf3) from 39 chloroplast genomes representing all lineages of angiosperms was analyzed. These 33 genes are present in all 39 chloroplast genomes and are publicly available in the GenBank database. Sequences were aligned using MUSCLE (version 3.6) (Edgar, 2004) and edited manually. For maximum likelihood (ML) analysis, RAxML version 7.0 (Stamatakis, 2006) was used with the GTR + I + G matrix. The local bootstrap probability of each branch was calculated from 100 replications. Phylogenetic analyses using maximum parsimony (MP) were performed using PAUP version 4.0b10 (Swofford, 2002). MP searches included 1000 random addition replicates and a heuristic search using tree bisection and reconnection (TBR) branch swapping with the Multrees option. Bootstrap analysis was performed with 100 replicates with TBR branch swapping. TreeView (Page, 1996) was used for displaying and printing phylogenetic trees.

3. Results and discussion

3.1. Sequencing and assembly of the H. brasiliensis chloroplast genome

A total of 995,092 quality-filtered sequence reads were generated, with an average read length of 332 bases, covering 330 Mb. From the assembly analysis, 3 contigs, assembled from 60,855 reads (5.49%),


were shown to be parts of the chloroplast genome by alignment with the M. esculenta chloroplast genome. The proportion of sequences from the chloroplast genome in rubber tree (5.49%) is similar to that in a previous study in mungbean (5.22%) (Tangphatsornruang et al., 2010b). The gaps between contigs were located in the large single copy region (LSC; between trnS-GCU and trnE-UUC; 84 bp), in the small single copy region (SSC; between ndhF and trnL-UAG; 1074 bp) and at the SSC-IRa junction (299 bp). A common characteristic of these gaps is the presence of multiple copies of high-AT repeats, as also found by Tangphatsornruang et al. (2010b). Closing the gaps with Sanger sequencing resulted in a complete chloroplast genome sequence. Since 454 sequencing technology has a limitation in reading long homopolymer regions (Moore et al., 2006; Huse et al., 2007; Tangphatsornruang et al., 2010b), we performed Sanger sequencing of all homopolymers (>7 bp) present in the chloroplast genome (Supplementary Table 1). Throughout the rubber tree chloroplast genome, there are 229 homopolymers (>7 bp); 45 are present in 18 coding genes and 184 in non-coding regions. Among the protein-coding sequences, ycf1 contains the highest number of homopolymers (21), followed by ycf2 (4). The longest homopolymer stretch is 19 bp, located in the intergenic region between atpF and atpA. Of the 229 homopolymers, 221 were polyA/T and only 8 were polyG/C. The number of homopolymeric bases from GS FLX Titanium that had to be corrected in this study was 258 out of 2227 (11.58%), 4 times higher than the homopolymer error rate previously reported for the earlier version of the GS FLX platform (Tangphatsornruang et al., 2010b). The complete chloroplast genome sequence was deposited in the NCBI database (HQ285842). The chloroplast genome contains a pair of identical inverted repeat regions (IRa and IRb) of 26,810 bp each. The inverted repeats are separated by a large single-copy (LSC) region of 89,209 bp and a small single-copy (SSC) region of 18,362 bp.
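The region sizes and homopolymer tallies quoted in this section can be cross-checked with simple arithmetic. The following is only a consistency sketch over the published figures; the variable names are illustrative and are not part of the original analysis.

```python
# Consistency check on figures quoted in the text (names are illustrative).
LSC = 89_209  # large single-copy region, bp
SSC = 18_362  # small single-copy region, bp
IR = 26_810   # one inverted repeat, bp (the genome carries two copies)

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
genome_length = LSC + SSC + 2 * IR
print(genome_length)  # 161191, the reported genome size

# Homopolymer tallies: coding vs non-coding, and polyA/T vs polyG/C.
homopolymers_total = 229
assert 45 + 184 == homopolymers_total
assert 221 + 8 == homopolymers_total

# Share of homopolymeric bases that needed correction (about 11.6%,
# in line with the 11.58% quoted in the text).
corrected_pct = 100 * 258 / 2227
print(f"{corrected_pct:.3f}%")
```

The two partitions of the 229 homopolymers and the genome length agree exactly with the text; the corrected-base percentage agrees up to rounding.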

Fig. 1. Map of the H. brasiliensis chloroplast genome. The thick lines indicate the extent of the inverted repeats (IRa and IRb), which separate the genome into small and large single copy regions. Genes on the outside of the map are transcribed clockwise and those on the inside of the map are transcribed counterclockwise. Genes containing introns and pseudogenes are marked with * and #, respectively. Arrows indicate the positions of a unique 30-kb rearrangement relative to the cassava chloroplast genome.


3.2. Genome content and organization

The positions of all the genes identified in the H. brasiliensis chloroplast genome and the functional categorization of these genes are presented in Fig. 1. The genome contains 112 unique genes, including 30 tRNA genes, 4 rRNA genes and 78 predicted protein-coding genes (Table 1). In addition, 16 genes are duplicated in the inverted repeat (IR), making a total of 128 genes in the rubber tree chloroplast genome. Coding regions (90,532 bp; 56.16%) account for over half of the chloroplast genome, with the peptide-coding regions forming the largest group (78,681 bp; 48.81%), followed by ribosomal RNA genes (9050 bp; 5.61%) and transfer RNA genes (2801 bp; 1.74%). The remaining 43.84% is covered by intergenic regions (29.61%) and a total of 23 introns (13.27%) present within 22 genes (or 17 unique genes). The trnK-UUU gene has the largest intron (2535 bp), in which the matK gene is located. There are 30 unique tRNA genes (7 tRNA genes are duplicated in the IR), which recognize all RNA codons for the 20 amino acids according to the wobble rules. Based on the sequences of protein-coding genes and tRNA genes within the chloroplast genome, we deduced the frequency of codon usage, as summarized in Supplementary Table 3. Codon usage was biased towards a high representation of A and U at the third codon position, as in other land plants (Shimada and Sugiura, 1991; Cai et al., 2006; Gao et al., 2009). The rubber tree psbC and rps19 genes contain GUG as a start codon. Sequence alignment between the chloroplast genome and the rubber tree ESTs also confirmed that both psbC and rps19 transcripts have GUG as the start codon. Studies of psbC and rps19 translation have also shown that GUG is the initiation codon in several plants and algae (Rochaix et al., 1989; Carpenter et al., 1990; Yukawa et al., 2005; Kuroda et al., 2007).
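The gene counts and coding-region percentages given at the start of this section follow directly from the reported base-pair totals. A minimal sketch of that arithmetic is shown here; the dictionary keys and the helper `percent_of_genome` are hypothetical names introduced only for this illustration.

```python
# Recompute the coding-region shares from the reported base-pair totals.
GENOME_BP = 161_191  # complete chloroplast genome length, bp

coding_bp = {
    "protein-coding": 78_681,
    "rRNA": 9_050,
    "tRNA": 2_801,
}

def percent_of_genome(bp: int) -> float:
    """Percentage of the genome covered by `bp` bases, to two decimals."""
    return round(100 * bp / GENOME_BP, 2)

total_coding = sum(coding_bp.values())
shares = {name: percent_of_genome(bp) for name, bp in coding_bp.items()}

# Gene counts: unique genes by class, plus copies duplicated in the IR.
unique_genes = 78 + 4 + 30
total_genes = unique_genes + 16

print(total_coding, percent_of_genome(total_coding))
print(shares)
print(unique_genes, total_genes)
```

Under these inputs the three classes sum to the 90,532 bp of coding sequence stated in the text, and the gene counts give 112 unique and 128 total genes.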
The previously sequenced chloroplast genomes of Malpighiales (Manihot, Jatropha and Populus) and that of Hevea reported here were compared using the Arabidopsis chloroplast genome as the reference sequence (Darling et al., 2004) (Supplementary Fig. 1). We observed a unique rearrangement of a 30-kb fragment in the LSC, between trnS(GCU)-trnE(UUC) and trnR(UCU)-trnT(GGU), in the rubber tree chloroplast genome compared with the others. Although we were unable to identify any significant repeats in the spacers between the rubber tree trnT(GGU)-trnR(UCU) and trnE(UUC)-trnS(GCU), these regions are biased towards high AT content (85.54% and 84.92%, respectively).
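AT content figures such as those quoted for the two spacer regions are simple base-composition ratios. The sketch below shows one way to compute such a value; the function `at_content` and the toy sequence are assumptions made for illustration, and the sequence is invented rather than taken from the deposited genome.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A/T (or U) among unambiguous bases in seq."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    # Ambiguity codes (e.g. N) are ignored in both numerator and denominator.
    return 100 * at / (at + gc)

# Made-up AT-rich spacer, NOT a real sequence from accession HQ285842.
toy_spacer = "ATTTAATATAGCATTAAATT"
print(at_content(toy_spacer))
```

Applied to the real spacer sequences extracted from the annotated genome, the same ratio would be expected to reproduce the percentages reported above.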

Analysis of the repeat sequences in the rubber tree chloroplast genome identified twenty-five direct repeats and seventeen inverted repeats of 30 bp or longer with a sequence identity of ≥90% (Supplementary Table 4). Eighteen repeats are 30 to 40 bp long, eleven repeats are 41–50 bp long, seven repeats are 51–80 bp long, and six repeats are longer than 80 bp. The longest repeat in rubber tree chloroplast DNA is a 151-bp direct repeat between trnG-GCC and trnT-GGU. Most of the direct repeats are distributed within the intergenic spacer regions, the intron sequences, and the tRNA and ycf2 genes.

Two ycf genes (ycf15 and ycf68) are probably not functional in the rubber tree chloroplast genome owing to the presence of premature stop codons. In several chloroplast genomes, ycf15 and ycf68 have also been reported as non-functional genes (Sato et al., 1999; Schmitz-Linneweber et al., 2001; Steane, 2005; Raubeson et al., 2007; Daniell et al., 2008). The infA gene is present but probably non-functional in the rubber tree chloroplast genome due to the presence of a
Table 2
RNA editing events in the rubber tree chloroplast genes. The annotation nomenclature of RNA editing events follows Lenz et al. (2009).

Number  RNA editing site
1       matKeU1168RW
2       matKeU634HY
3       matKeU149SF
4       rps16i493
5       rpoBeU551SL
6       rpoC1eU41SL
7       rpoC2eU3746SL
8       rps2eU134TI
9       rps2eU248SL
10      atpIeU635SL
11      psbDeU435II
12      rps14eU149PL
13      rps14eU80SL
14      ndhKeU65SL
15      ndhCeU323SL
16      psbEeU214PS
17      petLeU5PL
18      rps18eU221SL
19      clpPeU556HY
20      psbBeU414II
21      petBi178
22      petBeU611SL
23      petDeU481SL
24      rpoAeU836SL
25      rpoAeU200SF
26      rpl23eU89SL
27      ycf2eU467PL
28      ycf2eC1608VV
29      ycf2eA1645VI
30      ndhBeU1481PL
31      ndhBeC1290YY
32      ndhBeU1255HY
33      ndhBeU59SL
34      ndhBeU830SL
35      ndhBeU746SF
36      ndhBeU611SL
37      ndhBeU586HY
38      ndhBeU542TM
39      ndhBeU467PL
40      ndhBeU149SL
41      rps12-3endi186
42      ndhDeU887PL
43      ndhDeU878SL
44      ndhDeU674SL
45      ndhDeU599SL
46      ndhDeU313RW
47      ndhEeU233PL
48      ndhGeU347PL
49      ndhAeU961PS
50      ndhAeU566SL
51      ndhHeU505HY
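The site labels in Table 2 pack several fields into one string. As an illustration, the sketch below splits each label into gene, region, nucleotide, position and amino acid pair, and tallies the entries. It assumes, based on how the labels in Table 2 read, that `e` marks an exonic site followed by the nucleotide produced by editing, the position and the amino acids before and after editing, and that `i` marks an intronic site followed by a position; the regular expression and function names are hypothetical and not part of the published pipeline.

```python
import re
from collections import Counter

# The 51 site labels exactly as listed in Table 2.
SITES = """
matKeU1168RW matKeU634HY matKeU149SF rps16i493 rpoBeU551SL rpoC1eU41SL
rpoC2eU3746SL rps2eU134TI rps2eU248SL atpIeU635SL psbDeU435II rps14eU149PL
rps14eU80SL ndhKeU65SL ndhCeU323SL psbEeU214PS petLeU5PL rps18eU221SL
clpPeU556HY psbBeU414II petBi178 petBeU611SL petDeU481SL rpoAeU836SL
rpoAeU200SF rpl23eU89SL ycf2eU467PL ycf2eC1608VV ycf2eA1645VI ndhBeU1481PL
ndhBeC1290YY ndhBeU1255HY ndhBeU59SL ndhBeU830SL ndhBeU746SF ndhBeU611SL
ndhBeU586HY ndhBeU542TM ndhBeU467PL ndhBeU149SL rps12-3endi186 ndhDeU887PL
ndhDeU878SL ndhDeU674SL ndhDeU599SL ndhDeU313RW ndhEeU233PL ndhGeU347PL
ndhAeU961PS ndhAeU566SL ndhHeU505HY
""".split()

# Assumed label layout: <gene>e<nt><pos><aa before><aa after> or <gene>i<pos>.
LABEL = re.compile(
    r"(?P<gene>.+?)"
    r"(?:e(?P<nt>[ACGU])(?P<pos>\d+)(?P<aa>[A-Z]{2})|i(?P<ipos>\d+))"
)

def parse_site(label: str) -> dict:
    """Split one Table 2 label into its fields."""
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label}")
    if m["nt"] is None:  # intronic site: only gene and position are encoded
        return {"gene": m["gene"], "region": "intron", "position": int(m["ipos"])}
    before, after = m["aa"]
    return {"gene": m["gene"], "region": "exon", "edited_to": m["nt"],
            "position": int(m["pos"]), "aa_before": before, "aa_after": after}

parsed = [parse_site(s) for s in SITES]
exonic = [p for p in parsed if p["region"] == "exon"]
edited_nt = Counter(p["edited_to"] for p in exonic)
genes = {p["gene"] for p in exonic}
print(len(parsed), len(exonic), dict(edited_nt), len(genes))
```

Under this reading of the labels, the tallies reproduce the counts stated in Section 3.3: 51 sites in total, 48 exonic sites in 26 genes, with 45 edits producing U, 2 producing C and 1 producing A.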

Table 1
Genes encoded by the Hevea brasiliensis chloroplast genome.

1. Photosystem I: psaA, psaB, psaC, psaI, psaJ, ycf3^a, ycf4
2. Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
3. Cytochrome b6/f: petA, petB^b, petD^b, petG, petL, petN
4. ATP synthase: atpA, atpB, atpE, atpF, atpH, atpI
5. Rubisco: rbcL
6. NADH oxidoreductase: ndhA^b, ndhB^b,c, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
7. Large subunit ribosomal proteins: rpl2^b,c, rpl14, rpl16^b, rpl20, rpl22, rpl23^c, rpl32, rpl33, rpl36
8. Small subunit ribosomal proteins: rps2, rps3, rps4, rps7^c, rps8, rps11, rps12^b,c,d, rps14, rps15, rps16^b, rps18, rps19
9. RNAP: rpoA, rpoB, rpoC1^b, rpoC2
10. Other proteins: accD, ccsA, cemA, clpP^a, matK
11. Proteins of unknown function: ycf1, ycf2^c
12. Ribosomal RNAs: rrn16^c, rrn23^c, rrn4.5^c, rrn5^c
13. Transfer RNAs: A(UGC)^b,c, C(GCA), D(GUC), E(UUC), F(GAA), G(GCC)^b, G(UCC), H(GUG), I(CAU)^c, I(GAU)^b,c, K(UUU)^b, L(CAA)^c, L(UAA)^b, L(UAG), fM(CAU), M(CAU), N(GUU)^c, P(UGG), Q(UUG), R(ACG)^c, R(UCU), S(GCU), S(GGA), S(UGA), T(GGU), T(UGU), V(GAC)^c, V(UAC)^b, W(CCA), Y(GUA)

a Gene containing two introns.
b Gene containing a single intron.
c Two gene copies in the IRs.
d Gene divided into two independent transcription units.


premature stop codon in both chloroplast DNA and cDNA sequences. The loss of infA from the chloroplast genome has been reported to have occurred multiple times during angiosperm evolution (Millen et al., 2001). The infA gene has been lost from the M. esculenta chloroplast genome, the closest fully sequenced relative of H. brasiliensis, but it is present in Populus, another species in the order Malpighiales (Millen et al., 2001; Daniell et al., 2008).

3.3. RNA editing search by comparison between coding sequences and cDNAs

To determine the RNA editing sites, we compared the protein-coding sequences extracted from the rubber tree chloroplast genome with 2,265,782 rubber tree cDNA sequences downloaded from the DDBJ read archive (ID = DRA000170). The chloroplast gene sequences matched 52,971 of the 2.2 million rubber tree ESTs (2.23%). There were 6765 EST reads (2,059,201 bp) mapped to chloroplast protein-coding genes, which is equivalent to 23× coverage of the chloroplast coding region. Table 2 presents the 51 RNA editing sites identified, named according to the universal nomenclature proposed by Lenz et al. (2009). Forty-eight were in the coding regions of 26 protein-coding genes and 3 were in introns of rps16, petB and rps12. Of the 48 RNA editing sites in mRNA, a C-to-U change was the most common (45), followed by a U-to-C change (2) and a G-to-A change (1). In chloroplasts and mitochondria of seed plants, conversion from C to U is the predominant form (Bock, 2000). The reverse U-to-C editing is rarely observed in seed plants (Gualberto et al., 1990;

Schuster et al., 1990), but it is common in hornworts and ferns (Yoshinaga et al., 1996; Steinhauser et al., 1999; Vangerow et al., 1999). The two U-to-C editing events were found only in the ndhB and ycf2 transcripts, at sites that are very close to each other (7266 bp apart). It is also possible that this fragment of the chloroplast genome has been transferred to the mitochondrial genome, where extensive RNA editing occurs. Several lines of evidence have suggested translocation of chloroplast DNA fragments to mitochondrial genomes in many plant species (Stern and Lonsdale, 1982; Stern and Palmer, 1984; Moon et al., 1987). Further experiments on cDNA sequencing of transcripts extracted from isolated chloroplasts will be required to test this hypothesis. The uncommon G-to-A change at ycf2eA1645VI observed here has not previously been reported in chloroplasts of higher land plants, although A-to-I/G editing has been commonly observed in tRNAs, where it expands the ability to read additional codons (Pfitzinger et al., 1990; Dao et al., 1994; Agris et al., 2007). Recently, the adenosine deaminase gene acting on tRNAs (ADAT), responsible for editing of the adenosine at the wobble position of cp-tRNAArg(ACG), has been identified in Arabidopsis chloroplasts (Delannoy et al., 2009; Karcher and Bock, 2009). However, it should be noted that only two edited ycf2 transcripts (G to A) were found, compared with six unedited ycf2 transcripts, and this G-to-A conversion may be due to sequencing error, which may lead to overestimation of the number of RNA editing events in this study.

There are 45 non-synonymous substitutions, which occur most frequently in ndhB (11), followed by ndhD (5). The ndhB

[Fig. 2 alignment panel: ndhB protein sequences translated from genomic DNA and from cDNA for At, Sl, Nt and Hb; asterisks mark edited positions.]

Fig. 2. Sequence alignment of ndhB proteins translated from chloroplast genomes (before RNA editing) and from cDNAs (after RNA editing) of Arabidopsis thaliana (At), Solanum lycopersicum (Sl), Nicotiana tabacum (Nt) and Hevea brasiliensis (Hb), produced using CLUSTAL 2.0.12. Stars represent RNA editing sites and hyphens represent unedited sites.
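The star-and-hyphen marker rows described in this caption can be reproduced from any aligned genomic/cDNA protein pair. The following is a minimal sketch under stated assumptions, not the authors' procedure (the alignment itself was produced with CLUSTAL): the function names `editing_marker_row` and `editing_changes` are hypothetical, and the two 32-residue fragments are copied from one genomic/cDNA pair of the Fig. 2 alignment.

```python
def editing_marker_row(genomic, cdna):
    """Return a marker string: '*' where the aligned residues differ, '-' where they match."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("*" if g != c else "-" for g, c in zip(genomic, cdna))


def editing_changes(genomic, cdna):
    """List (1-based position, genomic residue, cDNA residue) for every difference."""
    return [
        (pos, g, c)
        for pos, (g, c) in enumerate(zip(genomic, cdna), start=1)
        if g != c
    ]


# Fragments taken from one genomic/cDNA pair in the Fig. 2 alignment.
genomic_fragment = "YEGSPTPVVAFLSVTSKVAASASATRIFDIPF"
cdna_fragment = "YEGSPTPVVAFLSVTSKVAALALATRIFDIPF"

print(editing_marker_row(genomic_fragment, cdna_fragment))
print(editing_changes(genomic_fragment, cdna_fragment))
```

In this fragment both differences are serine-to-leucine changes, the conversion reported in the text as the most frequent.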

S. Tangphatsornruang et al. / Gene 475 (2011) 104–112


transcripts were also found to be highly edited in other plants such as maize, sugarcane, rice, barley, tomato, tobacco and Arabidopsis (Freyer et al., 1995; Kahlau et al., 2006; Chateigner-Boutin and Small, 2007). Fig. 2 shows an amino acid sequence alignment of the ndhB proteins from Arabidopsis, tobacco, tomato and rubber tree with the RNA editing positions marked. All RNA editing events in the highly edited ndhB transcripts maintained the conserved ndhB amino acid sequences in all four plant species. We also observed that 40 RNA editing events in rubber tree chloroplasts caused amino acid changes to highly hydrophobic residues (such as L, F, I, M, V and W), with serine-to-leucine conversions being the most frequent. The majority of RNA editing sites in messenger RNAs occurred at the second codon position (36), followed by the first codon position (10) and the third codon position (2). For editing events at the second codon position, there was a bias toward a pyrimidine nucleotide at the 5′ upstream position and a purine nucleotide at the 3′ downstream position. However, it is unclear whether these biases reflect evolutionary constraints or mechanistic limitations of the editing process.

3.4. Phylogenetic analysis

Our phylogenetic data set included 33 protein-coding genes for 39 plant taxa (Supplementary Table 5), including 37 angiosperms and two outgroup gymnosperms (Ginkgo and Pinus). These 33 genes are present in the chloroplast genome of each of the 39 species, which minimized missing data in the sequence alignment. The sequence alignment used for the phylogenetic analyses comprised 26,585 characters. ML analysis resulted in a single
[Figure 3: maximum-likelihood tree with bootstrap support values; scale bar 0.1 substitutions/site. Ordinal labels: Fabales, Cucurbitales, Malpighiales, Malvales, Brassicales, Sapindales, Myrtales, Solanales, Apiales, Caryophyllales, Ranunculales, Poales, Acorales, Laurales, Nymphaeales, Amborellales, Ginkgoales, Pinales. Higher-level labels: Eurosids I, Eurosids II, Rosids, Euasterids I, Euasterids II, Asterids, Eudicots, Monocots, Magnoliids, Basal angiosperms, Gymnosperms.]

Fig. 3. The phylogenetic relationships based on 33 protein-coding genes from 39 plant taxa, with an ML value of −ln L = 230655.55. Numbers above nodes are bootstrap support values. Ordinal and higher-level group names are also indicated.
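The data set behind this tree was restricted to genes present in every sampled chloroplast genome, which keeps missing data in the alignment to a minimum. A minimal sketch of that filtering step follows, assuming each taxon's gene content is available as a set of gene names. The function `shared_genes` and the toy gene sets are hypothetical illustrations and do not reproduce the contents of Supplementary Table 5.

```python
def shared_genes(gene_content):
    """Return the sorted list of genes present in every taxon.

    gene_content maps a taxon name to the set of gene names annotated
    in its chloroplast genome.
    """
    if not gene_content:
        return []
    common = set.intersection(*gene_content.values())
    return sorted(common)


# Hypothetical toy input: three taxa with partially overlapping gene sets.
toy_gene_content = {
    "taxon_A": {"atpA", "rbcL", "psbA", "ndhB", "infA"},
    "taxon_B": {"atpA", "rbcL", "psbA", "ndhB"},
    "taxon_C": {"atpA", "rbcL", "psbA"},
}

print(shared_genes(toy_gene_content))
```

Only the genes returned by such a filter would then be aligned and concatenated, so every taxon contributes sequence to every gene partition.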


Bausher, M.G., Singh, N.D., Lee, S.B., Jansen, R.K., Daniell, H., 2006. The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var Ridge Pineapple: organization and phylogenetic relationships to other angiosperms. BMC Plant Biol. 6, 21.
Birch-Machin, I., Newell, C.A., Hibberd, J.M., Gray, J.C., 2004. Accumulation of rotavirus VP6 protein in chloroplasts of transplastomic tobacco is limited by protein stability. Plant Biotechnol. J. 2, 261–270.
Bock, R., 2000. Sense from nonsense: how the genetic information of chloroplasts is altered by RNA editing. Biochimie 82, 549–557.
Bock, R., 2007. Plastid biotechnology: prospects for herbicide and insect resistance, metabolic engineering and molecular farming. Curr. Opin. Biotechnol. 18, 100–106.
Bock, R., Hermann, M., Kossel, H., 1996. In vivo dissection of cis-acting determinants for plastid RNA editing. EMBO J. 15, 5052–5059.
Bowman, C.M., Barker, R.F., Dyer, T.A., 1988. In wheat ctDNA, segments of ribosomal protein genes are dispersed repeats, probably conserved by nonreciprocal recombination. Curr. Genet. 14, 127–136.
Cai, Z., Penaflor, C., Kuehl, J.V., Leebens-Mack, J., Carlson, J.E., dePamphilis, C.W., Boore, J.L., Jansen, R.K., 2006. Complete plastid genome sequences of Drimys, Liriodendron, and Piper: implications for the phylogenetic relationships of magnoliids. BMC Evol. Biol. 6, 77.
Carpenter, S.D., Charite, J., Eggers, B., Vermaas, W.F., 1990. The psbC start codon in Synechocystis sp. PCC 6803. FEBS Lett. 260, 135–137.
Chappell, J., 1995a. The biochemistry and molecular biology of isoprenoid metabolism. Plant Physiol. 107, 1–6.
Chappell, J., 1995b. Biochemistry and molecular biology of the isoprenoid biosynthetic pathway in plants. Annu. Rev. Plant Physiol. Plant Mol. Biol. 46, 521–547.
Chateigner-Boutin, A.L., Small, I., 2007. A rapid high-throughput method for the detection and quantification of RNA editing based on high-resolution melting of amplicons. Nucleic Acids Res. 35, e114.
Chaudhuri, S., Maliga, P., 1996. Sequences directing C to U editing of the plastid psbL mRNA are located within a 22 nucleotide segment spanning the editing site. EMBO J. 15, 5958–5964.
Chaudhuri, S., Carrer, H., Maliga, P., 1995. Site-specific factor involved in the editing of the psbL mRNA in tobacco plastids. EMBO J. 14, 2951–2957.
Corneille, S., Lutz, K., Maliga, P., 2000. Conservation of RNA editing between rice and maize plastids: are most editing events dispensable? Mol. Gen. Genet. 264, 419–424.
Cornish, K., 2001a. Similarities and differences in rubber biochemistry among plant species. Phytochemistry 57, 1123–1134.
Cornish, K., 2001b. Similarities and differences in rubber biochemistry among plant species. Phytochemistry 57, 1123–1134.
Cronn, R., Liston, A., Parks, M., Gernandt, D.S., Shen, R., Mockler, T., 2008. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic Acids Res. 36.
Daniell, H., Khan, M.S., Allison, L., 2002. Milestones in chloroplast genetic engineering: an environmentally friendly era in biotechnology. Trends Plant Sci. 7, 84–91.
Daniell, H., Wurdack, K.J., Kanagaraj, A., Lee, S.B., Saski, C., Jansen, R.K., 2008. The complete nucleotide sequence of the cassava (Manihot esculenta) chloroplast genome and the evolution of atpF in Malpighiales: RNA editing and multiple losses of a group II intron. Theor. Appl. Genet. 116, 723–737.
Dao, V., Guenther, R., Malkiewicz, A., Nawrot, B., Sochacka, E., Kraszewski, A., Jankowska, J., Everett, K., Agris, P.F., 1994. Ribosome binding of DNA analogs of tRNA requires base modifications and supports the extended anticodon. Proc. Natl Acad. Sci. USA 91, 2125–2129.
Darling, A.C., Mau, B., Blattner, F.R., Perna, N.T., 2004. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403.
Delannoy, E., Le Ret, M., Faivre-Nitschke, E., Estavillo, G.M., Bergdoll, M., Taylor, N.L., Pogson, B.J., Small, I., Imbault, P., Gualberto, J.M., 2009. Arabidopsis tRNA adenosine deaminase arginine edits the wobble nucleotide of chloroplast tRNAArg(ACG) and is essential for efficient chloroplast translation. Plant Cell 21, 2058–2071.
Dhingra, A., Folta, K.M., 2005. ASAP: amplification, sequencing & annotation of plastomes. BMC Genomics 6.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
Freyer, R., Lopez, C., Maier, R.M., Martin, M., Sabater, B., Kossel, H., 1995. Editing of the chloroplast ndhB encoded transcript shows divergence between closely related members of the grass family (Poaceae). Plant Mol. Biol. 29, 679–684.
Gao, L., Yi, X., Yang, Y.X., Su, Y.J., Wang, T., 2009. Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes. BMC Evol. Biol. 9, 130.
Goremykin, V.V., Hirsch-Ernst, K.I., Wölfl, S., Hellwig, F.H., 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that amborella is not a basal angiosperm. Mol. Biol. Evol. 20, 1499–1505.
Goremykin, V.V., Hirsch-Ernst, K.I., Wölfl, S., Hellwig, F.H., 2004. The chloroplast genome of Nymphaea alba: whole-genome analyses and the problem of identifying the most basal angiosperm. Mol. Biol. Evol. 21, 1445–1454.
Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87, 1712–1730.
Gualberto, J.M., Weil, J.H., Grienenberger, J.M., 1990. Editing of the wheat coxIII transcript: evidence for twelve C to U and one U to C conversions and for sequence similarities around editing sites. Nucleic Acids Res. 18, 3771–3776.
Guo, X., Castillo-Ramirez, S., Gonzalez, V., Bustos, P., Fernandez-Vazquez, J.L., Santamaria, R.I., Arellano, J., Cevallos, M.A., Davila, G., 2007. Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome, and the genomic diversification of legume chloroplasts. BMC Genomics 8, 228.
Halter, C.P., Peeters, N.M., Hanson, M.R., 2004. RNA editing in ribosome-less plastids of iojap maize. Curr. Genet. 45, 331–337.

tree with −ln L = 230655.55 (Fig. 3). ML bootstrap values were also high, with values of ≥95% for 36 of the 37 nodes and 100% for 34 nodes (Fig. 3). MP analysis resulted in a single resolved tree with a length of 40,109, a consistency index of 0.48 and a retention index of 0.657 (not shown). Bootstrap analyses indicated that 32 of the 36 nodes had values of 100%. The MP and ML trees had similar topologies with two major clades, Monocots and Eudicots, and with Amborella as the earliest diverging angiosperm lineage. The only incongruence between the MP and ML trees is the position of Calycanthus. In the MP tree, Calycanthus was placed sister to Eudicots, whereas in the ML tree it was positioned close to both Monocots and Eudicots. This incongruence was also observed in previous phylogenetic studies (Leebens-Mack et al., 2005; Bausher et al., 2006; Jansen et al., 2006; Ruhlman et al., 2006). Some studies supported Monocots as the sister clade to Magnoliids + Eudicots (Nickrent et al., 2002; Zanis et al., 2002). However, phylogenies based on phytochromes (Mathews and Donoghue, 1999), 17 cp genes (Graham and Olmstead, 2000), 21 cp genes (Tangphatsornruang et al., 2010b) and 61 cp genes (Cai et al., 2006; Lee et al., 2006; Hansen et al., 2007) supported Magnoliids as sister to Monocots and Eudicots. By sequencing three chloroplast genomes of Magnoliids, Cai et al. (2006) provided strong support for Monocots and Eudicots as sister clades, with Magnoliids diverging before the Monocots–Eudicots split. Our MP and ML trees revealed a monophyly of the Monocots and Eudicots in which Ranunculales was placed sister to the remaining Eudicots. The overall structure of the trees is similar to that of previously reported trees (Lee et al., 2006; Daniell et al., 2008; Logacheva et al., 2008; Tangphatsornruang et al., 2010b). Addition of the H. brasiliensis chloroplast genes placed Hevea sister to Manihot, grouped together with Jatropha and Populus in the order Malpighiales, and provided strong support for a monophyletic eurosid I group. The relationships within Malpighiales were also supported by a study based on the atpF gene (Daniell et al., 2008).

4. Conclusion

We performed shotgun genome sequencing of H. brasiliensis using 454 pyrosequencing technology and obtained the complete chloroplast genome sequence. The approach has been demonstrated here to be a fast and efficient way of obtaining organellar genomes. The gene content and structural organization of the rubber tree chloroplast genome are similar to those of M. esculenta, with the exception of the 30-kb fragment rearrangement in the LSC. By comparing the rubber tree chloroplast genes with the cDNA sequences, we determined the distribution and location of RNA editing sites in the chloroplast genome. The proposed phylogenetic relationships among angiosperms, based on chloroplast DNA sequences including the rubber tree chloroplast DNA reported here, provided strong support for a monophyletic eurosid I group and demonstrated a close relationship between Hevea, Manihot, Jatropha and Populus in Malpighiales.

Supplementary materials related to this article can be found online at doi:10.1016/j.gene.2011.01.002.

Acknowledgements

We acknowledge funding support from the National Center for Genetic Engineering and Biotechnology, Thailand.

References
Agris, P.F., Vendeix, F.A., Graham, W.D., 2007. tRNA's wobble decoding of the genome: 40 years of modification. J. Mol. Biol. 366, 1–13.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

Hammani, K., Okuda, K., Tanz, S.K., Chateigner-Boutin, A.L., Shikanai, T., Small, I., 2009. A study of new Arabidopsis chloroplast RNA editing mutants reveals general features of editing factors and their target sites. Plant Cell 21, 3686–3699.
Hansen, D.R., Dastidar, S.G., Cai, Z., Penaflor, C., Kuehl, J.V., Boore, J.L., Jansen, R.K., 2007. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol. Phylogenet. Evol. 45, 547–563.
Hayes, M.L., Hanson, M.R., 2007. Identification of a sequence motif critical for editing of a tobacco chloroplast transcript. RNA 13, 281–288.
Heinze, B., 2007. A database of PCR primers for the chloroplast genomes of higher plants. Plant Meth. 3, 4.
Hipkins, V.D., Marshall, K.A., Neale, D.B., Rottmann, W.H., Strauss, S.H., 1995. A mutation hotspot in the chloroplast genome of a conifer (Douglas-fir: Pseudotsuga) is caused by variability in the number of direct repeats derived from a partially duplicated tRNA gene. Curr. Genet. 27, 572–579.
Hiratsuka, J., Shimada, H., Whittier, R., Ishibashi, T., Sakamoto, M., Mori, M., Kondo, C., Honji, Y., Sun, C.R., Meng, B.Y., et al., 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. 217, 185–194.
Hirose, T., Sugiura, M., 2001. Involvement of a site-specific trans-acting factor and a common RNA-binding protein in the editing of chloroplast mRNAs: development of a chloroplast in vitro RNA editing system. EMBO J. 20, 1144–1152.
Hirose, T., Kusumegi, T., Tsudzuki, T., Sugiura, M., 1999. RNA editing sites in tobacco chloroplast transcripts: editing as a possible regulator of chloroplast RNA polymerase activity. Mol. Gen. Genet. 262, 462–467.
Hoch, B., Maier, R.M., Appel, K., Igloi, G.L., Kossel, H., 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature 353, 178–180.
Huse, S.M., Huber, J.A., Morrison, H.G., Sogin, M.L., Welch, D.M., 2007. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 8, R143.
Jansen, R.K., Raubeson, L.A., Boore, J.L., dePamphilis, C.W., Chumley, T.W., Haberle, R.C., Wyman, S.K., Alverson, A.J., Peery, R., Herman, S.J., Fourcade, H.M., Kuehl, J.V., McNeal, J.R., Leebens-Mack, J., Cui, L., 2005. Methods for obtaining and analyzing whole chloroplast genome sequences. Meth. Enzymol. 395, 348–384.
Jansen, R.K., Kaittanis, C., Saski, C., Lee, S.B., Tomkins, J., Alverson, A.J., Daniell, H., 2006. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol. Biol. 6, 32.
Kahlau, S., Aspinall, S., Gray, J.C., Bock, R., 2006. Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes. J. Mol. Evol. 63, 194–207.
Karcher, D., Bock, R., 2009. Identification of the chloroplast adenosine-to-inosine tRNA editing enzyme. RNA 15, 1251–1257.
Kato, T., Kaneko, T., Sato, S., Nakamura, Y., Tabata, S., 2000. Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res. 7, 323–330.
Knoop, V., 2010. When you can't trust the DNA: RNA editing changes transcript sequences. Cell. Mol. Life Sci. 68, 567–586.
Ko, J.H., Chow, K.S., Han, K.H., 2003. Transcriptome analysis reveals novel features of the molecular events occurring in the laticifers of Hevea brasiliensis (para rubber tree). Plant Mol. Biol. 53, 479–492.
Kotera, E., Tasaka, M., Shikanai, T., 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature 433, 326–330.
Kuroda, H., Suzuki, H., Kusumegi, T., Hirose, T., Yukawa, Y., Sugiura, M., 2007. Translation of psbC mRNAs starts from the downstream GUG, not the upstream AUG, and requires the extended Shine–Dalgarno sequence in tobacco chloroplasts. Plant Cell Physiol. 48, 1374–1378.
Kurtz, S., Schleiermacher, C., 1999. REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 15, 426–427.
Lee, S.B., Kaittanis, C., Jansen, R.K., Hostetler, J.B., Tallon, L.J., Town, C.D., Daniell, H., 2006. The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms. BMC Genomics 7, 61.
Leebens-Mack, J., Raubeson, L.A., Cui, L., Kuehl, J.V., Fourcade, M.H., Chumley, T.W., Boore, J.L., Jansen, R.K., dePamphilis, C.W., 2005. Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol. Biol. Evol. 22, 1948–1963.
Lenz, H., Rudinger, M., Volkmar, U., Fischer, S., Herres, S., Grewe, F., Knoop, V., 2009. Introducing the plant RNA editing prediction and analysis computer tool PREPACT and an update on RNA editing site nomenclature. Curr. Genet. 56, 189–201.
Lichtenthaler, H., 1999. The 1-deoxy-D-xylulose-5-phosphate pathway of isoprenoid biosynthesis in plants. Annu. Rev. Plant Physiol. Plant Mol. Biol. 50, 47–65.
Lichtenthaler, H.K., Schwender, J., Disch, A., Rohmer, M., 1997. Biosynthesis of isoprenoids in higher plant chloroplasts proceeds via a mevalonate-independent pathway. FEBS Lett. 400, 271–274.
Lidholm, J., Szmidt, A., Gustafsson, P., 1991. Duplication of the psbA gene in the chloroplast genome of two Pinus species. Mol. Gen. Genet. 226, 345–352.
Logacheva, M.D., Samigullin, T.H., Dhingra, A., Penin, A.A., 2008. Comparative chloroplast genomics and phylogenetics of Fagopyrum esculentum ssp. ancestrale – a wild ancestor of cultivated buckwheat. BMC Plant Biol. 8, 59.
Lurin, C., Andres, C., Aubourg, S., Bellaoui, M., Bitton, F., Bruyere, C., Caboche, M., Debast, C., Gualberto, J., Hoffmann, B., Lecharny, A., Le Ret, M., Martin-Magniette, M.L., Mireau, H., Peeters, N., Renou, J.P., Szurek, B., Taconnat, L., Small, I., 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16, 2089–2103.
Maier, R.M., Neckermann, K., Igloi, G.L., Kossel, H., 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 251, 614–628.


Maliga, P., 2002. Engineering the plastid genome of higher plants. Curr. Opin. Plant Biol. 5, 164–172.
Maliga, P., 2004. Plastid transformation in higher plants. Annu. Rev. Plant Biol. 55, 289–313.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.
Mathews, S., Donoghue, M.J., 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286, 947–950.
McCauley, D.E., 1992. The use of chloroplast DNA polymorphism in studies of gene flow in plants. Trends Ecol. Evol. 10, 198–202.
McGarvey, D.J., Croteau, R., 1995. Terpenoid metabolism. Plant Cell 7, 1015–1026.
Millen, R.S., Olmstead, R.G., Adams, K.L., Palmer, J.D., Lao, N.T., Heggie, L., Kavanagh, T.A., Hibberd, J.M., Gray, J.C., Morden, C.W., Calie, P.J., Jermiin, L.S., Wolfe, K.H., 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13, 645–658.
Miyamoto, T., Obokata, J., Sugiura, M., 2002. Recognition of RNA editing sites is directed by unique proteins in chloroplasts: biochemical identification of cis-acting elements and trans-acting factors involved in RNA editing in tobacco and pea chloroplasts. Mol. Cell. Biol. 22, 6726–6734.
Moon, E., Kao, T.H., Wu, R., 1987. Rice chloroplast DNA molecules are heterogeneous as revealed by DNA sequences of a cluster of genes. Nucleic Acids Res. 15, 611–630.
Moore, M.J., Dhingra, A., Soltis, P.S., Shaw, R., Farmerie, W.G., Folta, K.M., Soltis, D.E., 2006. Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol. 6, 17.
Neale, D.B., Saghai-Maroof, M.A., Allard, R.W., Zhang, Q., Jorgensen, R., 1988. Chloroplast DNA diversity in populations of wild and cultivated barley. Genetics 120, 1105–1110.
Nickrent, D.L., Blarer, A., Qiu, Y.-L., Soltis, D.E., Soltis, P.S., Zanis, M., 2002. Molecular data place Hydnoraceae with Aristolochiaceae. Am. J. Bot. 89, 1809–1817.
Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H., Ozeki, H., 1986. Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322, 572–574.
Page, R.D., 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12, 357–358.
Palmer, J.D., Osorio, B., Aldrich, J., Thompson, W.F., 1987. Chloroplast DNA evolution among legumes: loss of a large inverted repeat occurred prior to other sequence rearrangements. Curr. Genet. 11, 275–286.
Petit, R.J., Aguinagalde, I., de Beaulieu, J.L., Bittkau, C., Brewer, S., Cheddadi, R., Ennos, R., Fineschi, S., Grivet, D., Lascoux, M., Mohanty, A., Muller-Starck, G.M., Demesure-Musch, B., Palme, A., Martin, J.P., Rendell, S., Vendramin, G.G., 2003. Glacial refugia: hotspots but not melting pots of genetic diversity. Science 300, 1563–1565.
Petit, R.J., Duminil, J., Fineschi, S., Hampe, A., Salvini, D., Vendramin, G.G., 2005. Comparative organization of chloroplast, mitochondrial and nuclear diversity in plant populations. Mol. Ecol. 14, 689–701.
Pfitzinger, H., Weil, J.H., Pillay, D.T., Guillemaut, P., 1990. Codon recognition mechanisms in plant chloroplasts. Plant Mol. Biol. 14, 805–814.
Provan, J., Powell, W., Hollingsworth, P.M., 2001. Chloroplast microsatellites: new tools for studies in plant ecology and evolution. Trends Ecol. Evol. 16, 142–147.
Raubeson, L.A., Peery, R., Chumley, T.W., Dziubek, C., Fourcade, H.M., Boore, J.L., Jansen, R.K., 2007. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genomics 8, 174.
Rochaix, J.D., Kuchka, M., Mayfield, S., Schirmer-Rahire, M., Girard-Bascou, J., Bennoun, P., 1989. Nuclear and chloroplast mutations affect the synthesis or stability of the chloroplast psbC gene product in Chlamydomonas reinhardtii. EMBO J. 8, 1013–1021.
Ruhlman, T., Lee, S.B., Jansen, R.K., Hostetler, J.B., Tallon, L.J., Town, C.D., Daniell, H., 2006. Complete plastid genome sequence of Daucus carota: implications for biotechnology and phylogeny of angiosperms. BMC Genomics 7.
Sasaki, T., Yukawa, Y., Miyamoto, T., Obokata, J., Sugiura, M., 2003. Identification of RNA editing sites in chloroplast transcripts from the maternal and paternal progenitors of tobacco (Nicotiana tabacum): comparative analysis shows the involvement of distinct trans-factors for ndhB editing. Mol. Biol. Evol. 20, 1028–1035.
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Tabata, S., 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6, 283–290.
Schmitz-Linneweber, C., Maier, R.M., Alcaraz, J.P., Cottet, A., Herrmann, R.G., Mache, R., 2001. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol. Biol. 45, 307–315.
Schmitz-Linneweber, C., Regel, R., Du, T.G., Hupfer, H., Herrmann, R.G., Maier, R.M., 2002. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation. Mol. Biol. Evol. 19, 1602–1612.
Schuster, W., Hiesel, R., Wissinger, B., Brennicke, A., 1990. RNA editing in the cytochrome b locus of the higher plant Oenothera berteriana includes a U-to-C transition. Mol. Cell. Biol. 10, 2428–2431.
Shimada, H., Sugiura, M., 1991. Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes. Nucleic Acids Res. 19, 983–995.
Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.
Steane, D.A., 2005. Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae). DNA Res. 12, 215–220.


Steinhauser, S., Beckert, S., Capesius, I., Malek, O., Knoop, V., 1999. Plant mitochondrial RNA editing. J. Mol. Evol. 48, 303–312.
Stern, D.B., Lonsdale, D.M., 1982. Mitochondrial and chloroplast genomes of maize have a 12-kilobase DNA sequence in common. Nature 299, 698–702.
Stern, D.B., Palmer, J.D., 1984. Extensive and widespread homologies between mitochondrial DNA and chloroplast DNA in plants. Proc. Natl Acad. Sci. USA 81, 1946–1950.
Swofford, D.L., 2002. PAUP: Phylogenetic Analysis Using Parsimony version 4.0b. Sinauer Associates, Sunderland, Massachusetts.
Tangphatsornruang, S., Birch-Machin, I., Newell, C.A., Gray, J., 2010a. The effect of different 3′ untranslated regions on the accumulation and stability of transcripts of a gfp transgene in chloroplasts of transplastomic tobacco. Plant Mol. Biol. doi:10.1007/s11103-010-9689-1 (Epub).
Tangphatsornruang, S., Sangsrakru, D., Chanprasert, J., Uthaipaisanwong, P., Yoocha, T., Jomchai, N., Tragoonrung, S., 2010b. The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: structural organization and phylogenetic relationships. DNA Res. 17, 11–22.
Tillich, M., Funk, H.T., Schmitz-Linneweber, C., Poltnigg, P., Sabater, B., Martin, M., Maier, R.M., 2005. Editing of plastid RNA in Arabidopsis thaliana ecotypes. Plant J. 43, 708–715.
Vangerow, S., Teerkorn, T., Knoop, V., 1999. Phylogenetic information in the mitochondrial nad5 gene of pteridophytes: RNA editing and intron sequences. Plant Biol. 1, 235–243.
Wakasugi, T., Tsudzuki, J., Ito, S., Nakashima, K., Tsudzuki, T., Sugiura, M., 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl Acad. Sci. USA 91, 9794–9798.
Wakasugi, T., Hirose, T., Horihata, M., Tsudzuki, T., Kossel, H., Sugiura, M., 1996. Creation of a novel protein-coding region at the RNA level in black pine chloroplasts: the pattern of RNA editing in the gymnosperm chloroplast is different from that in angiosperms. Proc. Natl Acad. Sci. USA 93, 8766–8770.
Wojciechowski, M.F., Lavin, M., Sanderson, M.J., 2004. A phylogeny of legume (Leguminosae) based on analysis of the plastid matK gene resolves many well-supported subclades within the family. Am. J. Bot. 91, 1846–1862.
Wolfe, K.H., Morden, C.W., Palmer, J.D., 1991. Ins and outs of plastid genome evolution. Curr. Opin. Genet. Dev. 1, 523–529.
Wyman, S.K., Jansen, R.K., Boore, J.L., 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20, 3252–3255.
Yoshinaga, K., Iinuma, H., Masuzawa, T., Uedal, K., 1996. Extensive RNA editing of U to C in addition to C to U substitution in the rbcL transcripts of hornwort chloroplasts and the origin of RNA editing in green plants. Nucleic Acids Res. 24, 1008–1014.
Yukawa, M., Tsudzuki, T., Sugiura, M., 2005. The 2005 version of the chloroplast DNA sequence from tobacco (Nicotiana tabacum). Plant Mol. Biol. Rep. 23, 1–7.
Zanis, M.J., Soltis, D.E., Soltis, P.S., Mathews, S., Donoghue, M.J., 2002. The root of the angiosperms revisited. Proc. Natl Acad. Sci. USA 99, 6848–6853.
