Assessing the reliability of a prediction

Published analyses of ribosomal RNAs [Zuker & Jacobson, 1995] and of coliphage Qβ [Jacobson & Zuker, 1993, Jacobson et al., 1998] have shown that well-determined structural features are more likely to be predicted correctly by mfold. Another study examined the reliability of structure prediction for a large number of 16S and 23S rRNAs using the Vienna program, which calculates base pair probabilities [Huynen et al., 1997]. These authors similarly showed that "well-determined" predicted structures are more likely to be correct; that is, "well-determined" structures contain a greater percentage of correct base pairs than do "poorly determined" structures. Their notion of "well-determined" is equivalent to a low entropy measure, which is derived from the boxplot base pair probabilities and will be defined below. These studies by other groups have analyzed entire structures. They have not, however, examined portions of secondary structures to see whether some features are more likely to be correct than others. It also remains unknown whether base pairs with high probabilities are more likely to be correct, that is, to be present in comparative models.
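To make the idea of an entropy measure derived from base pair probabilities concrete, the following is a minimal sketch and not the definition used by the cited authors or by mfold. It assumes a per-base Shannon entropy over the pairing partners of each base, with the leftover probability mass treated as "unpaired", averaged over the sequence. The function names, the dictionary input format, and the use of natural logarithms are all illustrative assumptions.

```python
import math

def positional_entropy(pair_probs, n):
    """Per-base Shannon entropy from base pair probabilities.

    pair_probs: dict mapping (i, j), 0-based with i < j, to the
                probability that bases i and j are paired (assumed format).
    n:          sequence length.
    """
    # Collect, for every base, the probabilities of its possible partners.
    partners = [[] for _ in range(n)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)

    entropies = []
    for probs in partners:
        # Assumption: probability mass not assigned to any partner
        # is the probability that the base is unpaired.
        unpaired = max(0.0, 1.0 - sum(probs))
        terms = probs + [unpaired]
        s = sum(p * math.log(p) for p in terms if p > 0.0)
        entropies.append(abs(s))  # s <= 0, so abs(s) equals -s
    return entropies

def mean_entropy(pair_probs, n):
    """Average per-base entropy; lower values mean 'better determined'."""
    return sum(positional_entropy(pair_probs, n)) / n

# One certain pair: every base has a single certain state, entropy 0.
certain = {(0, 3): 1.0}
# Base 0 pairs with base 2 or base 3 with equal probability.
ambiguous = {(0, 2): 0.5, (0, 3): 0.5}

print(mean_entropy(certain, 4))
print(mean_entropy(ambiguous, 4))
```

Under these assumptions, a structure in which each base has one dominant state scores near zero, while competing alternative pairings raise the score, which matches the informal sense of "well-determined" versus "poorly determined" in the text.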

The fact that some predicted structures, or parts of these structures, are "well-determined" and that others are "poorly determined" may mean nothing more than that the former type of prediction is more reliable. We believe that this phenomenon means more. The analyses with ribosomal RNAs [Huynen et al., 1997, Zuker & Jacobson, 1995] and our own structural analyses of wild type and mutant coliphage Qβ RNAs [Jacobson et al., 1998] suggest that the relative frequency of predicted alternative conformations of the RNA can reflect physical properties of the RNA. Well-determined 16S rRNA predictions are found primarily among the Archaea, organisms that grow in harsh environments and at high temperature. The structure of rRNAs in organisms that grow in these environments is likely to be optimized both to fold efficiently and to be unusually stable. In coliphage RNAs, experimental studies show that well-determined structural domains correspond to domains within the RNA that are unusually stable.

In addition to well-determined structures, the prediction of poorly determined structures may provide insight into regions of potential structural plasticity within an RNA molecule. In coliphage RNAs, two cases have been found where competing alternative conformers occur in regions of the RNA that are predicted to be poorly determined by computer modeling [Jacobson & Zuker, 1993, Jacobson et al., 1998]. The analysis of these RNAs suggests that stable structural domains lie interspersed among regions where greater structural plasticity is observed. However, the observed correspondence between poorly determined structural domains and real structural plasticity for coliphage RNA may not be true of other RNAs. For example, the entire predicted structure of many ribosomal RNAs is poorly determined. While there is growing recognition that both protein chaperones and small RNAs may contribute to the proper folding of these RNAs within the living cell [Konings & Gutell, 1995], the analysis of the conformation of 16S rRNA from Escherichia coli with chemical and enzymatic probes has shown that the structure of this RNA is unique and consistent with the phylogenetic model for this RNA [Murzina et al., 1988, Noller, 1984, Gutell, 1994]. Since the predicted structure for this entire RNA is poorly determined [Zuker & Jacobson, 1995], it is clear that the physical interpretation of poorly determined structural features obtained by computer modeling may be complex and may always require experimental verification. In the case of the E. coli rRNA, the poorly determined prediction might indicate that the molecules are easily misfolded in solution.

Michael Zuker
Institute for Biomedical Computing
Washington University in St. Louis
August 21 1998.