
Supplemental Methods

Identification of MULE-related consensus sequences from the RECON output

The 266 MULE-related consensus sequences were identified from the 3300 repeat families recovered by RECON as follows: (1) the sequences of the repeat families were used as queries to search the sequences of previously characterized MULEs in rice (refs. 1-3). A sequence similar to known MULEs (BLASTN E < 10^-10) was considered to be a MULE-related sequence; (2) the sequences of the repeat families were used to search proteins in GenBank (downloaded on Feb. 15, 2003). A sequence similar to known Mutator-like proteins (BLASTX E < 10^-10) was considered to be a MULE-related sequence; (3) if a sequence was not similar to any known TE, the following procedure was used to define new MULE TIRs, since many consensus sequences in the RECON output represent a single MULE TIR. First, the relevant sequence was used to search the rice genome database, and at least 20 hits (when 20 or more hits were available; BLASTX E < 10^-10), together with the 100 bp of flanking sequence on each side of each hit, were recovered. The recovered sequences were then aligned using pileup in GCG (see Methods), and the resulting output was examined for a possible border between the putative elements and their flanking sequences. A border was defined where the sequence homology stopped at the same position in more than half of the aligned sequences, and the 10 bp sequence at the terminus of the putative element was compared with known MULEs. If the 10 most terminal nucleotides were similar (at least 6 of 10 bp identical) to any known MULE terminus (see Supplemental Table 1 for possible combinations of the most terminal sequence), the consensus sequence was considered to be a TIR candidate. To test whether the candidate represented the TIR of a MULE, the 10 kb flanking sequences of the relevant hits were searched for the same candidate sequence in an inverted orientation. If such a pair of sequences was found and a 9 bp TSD was identified immediately beside the termini, the pair was considered to be a MULE. If five such elements were found for a given consensus, the consensus was considered to be a MULE TIR.
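The terminal-sequence and inverted-repeat checks in step (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the analysis: all function names are hypothetical, the inverted copy is located by exact string match rather than by the similarity search described above, sequences are assumed to be upper-case strings, and the alignment-based border detection is not covered.

```python
# Hypothetical helpers illustrating the TIR-candidate checks described above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_matches(candidate, known_terminus, length=10):
    """Count identical positions among the `length` most terminal bases."""
    return sum(a == b for a, b in zip(candidate[:length], known_terminus[:length]))


def is_tir_candidate(candidate, known_termini, min_identical=6, length=10):
    """True if at least `min_identical` of the terminal bases match any known terminus."""
    return any(
        terminal_matches(candidate, known, length) >= min_identical
        for known in known_termini
    )


def has_tsd(genome, left_start, right_end, tsd_len=9):
    """True if the bases immediately flanking the element on both sides are identical.

    left_start is the index of the first base of the element;
    right_end is the index one past its last base.
    """
    if left_start < tsd_len or right_end + tsd_len > len(genome):
        return False
    left = genome[left_start - tsd_len:left_start]
    right = genome[right_end:right_end + tsd_len]
    return left == right


def find_element(genome, tir, hit_start, window=10_000, tsd_len=9):
    """Search downstream of a TIR hit for an inverted copy flanked by a TSD.

    Assumption: exact matching stands in for a similarity search.
    Returns (start, end) of the putative element, or None.
    """
    hit_end = hit_start + len(tir)
    inverted = reverse_complement(tir)
    search_end = min(len(genome), hit_end + window)
    pos = genome.find(inverted, hit_end, search_end)
    while pos != -1:
        right_end = pos + len(inverted)
        if has_tsd(genome, hit_start, right_end, tsd_len):
            return hit_start, right_end
        pos = genome.find(inverted, pos + 1, search_end)
    return None
```

In this sketch a consensus would be accepted as a MULE TIR once `find_element` had returned a result for five separate hits; only the downstream flank is searched here, whereas the procedure above examines the flanking sequence of each hit.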

PCR amplification of Pack-MULE fragments from Nipponbare DNA

To further confirm the presence of Pack-MULEs in the rice genome, PCR experiments were performed for the 3 Pack-MULEs described in Figure 3 and for 10 Pack-MULEs randomly selected from the 100 Pack-MULEs (from chromosomes 1 and 10) analyzed in this study. Primer locations are diagrammed on a generic Pack-MULE in Supplemental Figure 2A (see also Figure 3): each of the 13 Pack-MULEs tested was amplified with two pairs of primers (purple and blue), with the two internal primers located in the acquired region. In this way the two amplicons should cover the whole element plus some flanking sequence. For 4 of the Pack-MULEs tested, significant secondary structure necessitated either digestion of the genomic DNA with a restriction enzyme prior to PCR amplification or the use of 76 °C as the extension temperature (other reactions used 72 °C; see Supplemental Table 7 for details). For all elements tested, fragments of the anticipated size were obtained (Supplemental Table 7 and Supplemental Figure 2B). A touchdown protocol was used for the PCR amplification, i.e., the annealing temperature started 6 °C above the final annealing temperature and was then reduced to that temperature in 1 °C decrements each cycle. The temperature cycling parameters were: 94 °C for 3 min; touchdown cycles of 94 °C for 45 s, (A+6) to (A+1) °C for 45 s, and 72 °C for 60 s; 32 cycles of 94 °C for 45 s, A °C for 45 s, and 72 °C for 60 s; and a final cycle of 72 °C for 3 min, where A stands for the final annealing temperature of the individual reaction (Supplemental Table 7).
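The cycling parameters above can be written out step by step for a given final annealing temperature A. The Python sketch below is illustrative only: the function name and its parameters are hypothetical, it assumes exactly one cycle at each touchdown annealing temperature from A+6 down to A+1, it applies the optional extension temperature (for example 76 °C) to the cycling steps only, and it ignores ramp times.

```python
def touchdown_program(final_anneal_c, extension_c=72, touchdown_steps=6, main_cycles=32):
    """Return the thermocycler program as a list of (label, temperature_c, seconds).

    Assumptions: one cycle per touchdown temperature; the final 3 min step
    stays at 72 degrees C as stated in the text.
    """
    program = [("initial denaturation", 94, 180)]

    # Touchdown phase: annealing at A+6, A+5, ..., A+1.
    for offset in range(touchdown_steps, 0, -1):
        program.append(("denaturation", 94, 45))
        program.append(("annealing", final_anneal_c + offset, 45))
        program.append(("extension", extension_c, 60))

    # Main phase: 32 cycles at the final annealing temperature A.
    for _ in range(main_cycles):
        program.append(("denaturation", 94, 45))
        program.append(("annealing", final_anneal_c, 45))
        program.append(("extension", extension_c, 60))

    program.append(("final extension", 72, 180))
    return program


def total_seconds(program):
    """Sum the hold times of all steps (ramp times are not modelled)."""
    return sum(seconds for _, _, seconds in program)


if __name__ == "__main__":
    # Example with a hypothetical final annealing temperature of 55 degrees C.
    for label, temp, seconds in touchdown_program(55)[:7]:
        print(f"{label:22s} {temp:3d} C {seconds:4d} s")
```

Passing `extension_c=76` gives the variant used for the elements with significant secondary structure; the actual value of A for each reaction is listed in Supplemental Table 7.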

Control experiment for Ka/Ks analysis

The Ka/Ks analysis indicated that 18 of the 54 sequence pairs (MULE vs genomic homolog) have potentially been under purifying selection (p < 0.05). This value (19%, since the 54 sequence pairs were derived from 100 Pack-MULEs) is much higher than expected by chance (5%) and can be explained in two ways. It may indicate that many of the Pack-MULEs have been functional. Alternatively, the high value could be an artifact due to misidentification of the putative genomic homolog. This could occur, for example, if the genomic homolog and its paralog were duplicated and diverged under purifying selection, after which the genomic homolog was captured by the Pack-MULE and then deleted from the genome (or the genomic homolog may be missing from the available database, e.g., located in a sequencing gap). In this case, the paralog would be identified as the genomic homolog because it is the most closely related copy of the Pack-MULE sequence in the database. As a result, the low Ka/Ks value would reflect purifying selection between genomic paralogs instead of between the Pack-MULE and its genomic homolog.

If a paralog was mistakenly identified as the genomic homolog, this would be indicated by a relatively recent element associated with a distantly related genomic homolog. To test this possibility, the age of each Pack-MULE was estimated from the sequence similarity of its TIRs, since transpositionally competent MULEs have highly similar TIRs (refs. 4-6). If some of the apparent genomic homologs are indeed paralogs, this is most likely for elements in which the sequence similarity between the Pack-MULE and its genomic homolog is lower than or close to that between its TIRs. Accordingly, if the observed purifying selection is an artifact and most of the sequences captured by Pack-MULEs have undergone neutral drift, we would expect more examples of purifying selection in this group of elements (referred to as the paralog group, where the value of Internal minus TIR is less than or equal to 0 in Supplemental Table 10) than among the other elements. The result indicates that, of the 7 sequence pairs in the paralog group, two (29%) show significant purifying selection (Supplemental Table 10). Because this value is not higher than the average (18/54 = 33%), it suggests that the paralog-induced artifact is not a significant issue.
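The arithmetic behind this comparison can be reproduced with a few lines of Python, using only the counts stated above (18 of 54 pairs overall, 2 of 7 in the paralog group, and a 5% chance rate). The binomial tail probability is an added illustration of how unlikely 18 significant pairs would be at the chance rate; it is not a test reported in the text, and the function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fraction_significant(significant, total):
    """Fraction of sequence pairs showing significant purifying selection."""
    return significant / total


# Counts taken from the text above.
overall = fraction_significant(18, 54)       # all sequence pairs
paralog_group = fraction_significant(2, 7)   # pairs with Internal minus TIR <= 0

# Illustration only: probability of at least 18 significant pairs out of 54
# if each pair had a 5% chance of appearing significant.
chance_tail = binomial_tail(18, 54, 0.05)

# The control argues against an artifact when the paralog group is not enriched.
paralog_not_enriched = paralog_group <= overall

if __name__ == "__main__":
    print(f"overall: {overall:.0%}, paralog group: {paralog_group:.0%}")
    print(f"paralog group not enriched: {paralog_not_enriched}")
    print(f"P(X >= 18 | n = 54, p = 0.05) = {chance_tail:.2e}")
```

With only 7 pairs in the paralog group the comparison is necessarily coarse, which is consistent with the cautious wording of the conclusion above.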

References
