Loops and Nearest neighbor rules




The energy dot plot
A nucleic acid secondary structure dot plot is a triangular plot that depicts base
pairs as dots or other symbols; we shall refer to all such symbols as dots. A dot in
column i and row j of the triangular array represents the base pair i.j. The advantage
of a dot plot is that it can display the base pairs of more than one folding
simultaneously. It can be used to compare a few foldings, or to show the base pair
distribution in many millions of foldings.
Mfold computes a number $\Delta G(i,j)$ for every possible base pair, i.j. This is the
minimum free energy of any folding that contains the i.j base pair. As
above, we let $\Delta G_{\mathrm{mfe}}$ be the overall minimum folding free energy, and
$\Delta\Delta G$ a user selected free energy increment. Clearly

$$\Delta G_{\mathrm{mfe}} = \min_{i<j} \Delta G(i,j).$$

The energy increment is derived from




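This dot plot contains the superposition of all possible foldings whose folding energy is within the free energy increment $\Delta\Delta G$ of the minimum free energy $\Delta G_{\mathrm{mfe}}$. Equivalently, a dot is drawn for i.j exactly when $\Delta G(i,j) \le \Delta G_{\mathrm{mfe}} + \Delta\Delta G$.
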
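The selection rule for dots can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mfold's own code: the function name `energy_dot_plot`, the dictionary layout, and the toy energies are all hypothetical, and the table of per-pair minimum free energies is assumed to be already computed.

```python
def energy_dot_plot(dg, increment):
    """Select the base pairs that appear as dots.

    dg        -- dict mapping a base pair (i, j), i < j, to the minimum free
                 energy of any folding containing that pair (assumed given)
    increment -- the user selected free energy increment (non-negative)
    """
    # The overall minimum folding free energy is the smallest entry.
    dg_mfe = min(dg.values())
    # A pair is plotted when some folding containing it lies within
    # `increment` of the minimum free energy.
    dots = {pair for pair, energy in dg.items() if energy <= dg_mfe + increment}
    return dg_mfe, dots


# Hypothetical toy values, not real mfold output.
dg = {(1, 20): -9.7, (2, 19): -9.7, (5, 12): -9.1, (3, 15): -6.0}
dg_mfe, dots = energy_dot_plot(dg, increment=1.0)
print(dg_mfe, sorted(dots))
```

With a larger increment more pairs qualify, which is why a generous increment tends to produce a more cluttered plot.
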
The energy dot plot gives an overall visual impression of how
``well-defined'' the folding is. A cluttered plot, or cluttered regions, indicate
either structural plasticity (the lack of well-defined structure) or else the inability
of the algorithm to predict a structure with confidence. A couple of crude measures of
``well-definedness'' have been introduced in mfold. The first is ``P-num''.
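$\mathrm{P\text{-}num}(i)$ is a measure of the level of promiscuity of the ith base,
$r_i$, in its pairing with other bases in foldings within the free energy increment
$\Delta\Delta G$ of the minimum free energy $\Delta G_{\mathrm{mfe}}$. It is the number
of different base pairs, i.j or k.i, that can form in this set of foldings, and is
simply the number of dots in the ith row and ith column of the energy dot plot. If
$\delta(\text{expression})$ is defined to be 1 when ``expression'' is true, and 0
otherwise, then P-num may be defined as:

$$\mathrm{P\text{-}num}(i) = \sum_{k<i} \delta\bigl(\Delta G(k,i) \le \Delta G_{\mathrm{mfe}} + \Delta\Delta G\bigr) + \sum_{j>i} \delta\bigl(\Delta G(i,j) \le \Delta G_{\mathrm{mfe}} + \Delta\Delta G\bigr)$$
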
P-num pertains to individual bases. H-num is a ``well-definedness'' measure for a base pair i.j. It is the average value of the two P-num quantities, adjusted by removing the ``desirable'' i.j base pair. That is:

$$\mathrm{H\text{-}num}(i,j) = \frac{\mathrm{P\text{-}num}(i) + \mathrm{P\text{-}num}(j) - 1}{2}$$

A helix, already defined as a collection of two or more consecutive base pairs, may be described as a triple i,j,k, where k is the number of base pairs, and the actual base pairs are $i.j,\ (i+1).(j-1),\ \ldots,\ (i+k-1).(j-k+1)$.

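The two measures and the helix notation can be illustrated with a short Python sketch. The function names and the toy set of dots are hypothetical. The H-num formula used here, (P-num(i) + P-num(j) - 1)/2, is an assumption: it is one reading of "the average of the two P-num values, adjusted by removing the i.j pair", and should be checked against the original equation.

```python
from collections import Counter


def p_num(dots):
    """Count, for every base, the dots lying in its row and its column."""
    counts = Counter()
    for i, j in dots:
        counts[i] += 1  # the pair i.j involves base i ...
        counts[j] += 1  # ... and base j
    return counts


def h_num(dots, i, j):
    """Assumed form: average of the two P-num values, with the i.j pair
    itself removed once from the sum."""
    p = p_num(dots)
    return (p[i] + p[j] - 1) / 2


def helix_pairs(i, j, k):
    """Expand a helix triple (i, j, k) into its k stacked base pairs."""
    return [(i + m, j - m) for m in range(k)]


# Hypothetical dots from an energy dot plot, not real mfold output.
dots = {(1, 20), (2, 19), (1, 15), (5, 20)}
print(p_num(dots)[1], h_num(dots, 1, 20), helix_pairs(1, 20, 2))
```

Under this reading, a base pair whose two bases pair with nothing else gets the smallest possible value, 0.5, and larger values indicate a less well-determined pair.
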
Five files are associated with the energy dot plot:
`FILE_NAME.PLOT' : This is a text file that contains all the base
pairs on the energy dot plot, organized into helices for which
$\Delta G(i,j)$ is constant. The first record is a header, and each subsequent
record describes a single helix. The records are usually sorted by
$\Delta G(i,j)$, and are often filtered so that short helices or isolated base
pairs (helices of length 1) in suboptimal foldings are removed. Figure 9 shows a sample plot file.
    level  length  istart  jstart  energy
        1       8     206     242    -972
        1       7     319     434    -972
        1       7     108     141    -972
        1       7      53     185    -972
        1       6     334     412    -972
        1       6     308     444    -972
        1       6     288     472    -972
        1       6     247     279    -972
      ...
        2       4       8      23    -971
        2       2      69      78    -971
        2       4       1      24    -970
        2       2      10      17    -970
        2       3     345     400    -967
        2       2     297     462    -967
      ...
`FILE_NAME.ANN' : This file contains P-num information for a
particular free energy increment, $\Delta\Delta G$. The ith record contains i and
P-num(i). This file is used for annotating plotted structures.
`FILE_NAME.H-NUM' : This file is the same as `file_name.plot', except
that the ``energy'' column is replaced by an ``h-num'' column. These files are usually
sorted by h-num from lowest to highest, that is, from best determined to worst
determined. Often, only helices in optimal foldings are retained. Figure 10
shows part of a sorted and filtered h-num file corresponding to the plot file in
Figure 9.
    level  length  istart  jstart  h-num
        1       4      38     194    6.8
        1       4     215     232    7.3
        1       5      31     201    8.4
        1       7      53     185    8.4
        1       2      47     189   11.0
        1       8     206     242   11.9
        1       6      61     176   13.7
        1       4      89     163   13.8
        1       3     255     271   14.0
        1       3     104     145   15.0
        1       1      68      79   16.0
        1       4     121     131   17.0
        1       6     288     472   17.3
      ...
        1       2     353     389   35.0
        1       3     364     377   38.7
        1       3     297     459   39.0
`FILE_NAME.PS' : This is a PostScript file of the energy dot plot.
`FILE_NAME.GIF' : This is an image of the energy dot plot in ``gif'' format, suitable for display on web pages.
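
As an illustration of how the plot file might be read, here is a minimal Python sketch. It assumes one whitespace-separated record per line, with the header line first, as the description above suggests; the names `Helix` and `parse_plot` are hypothetical and not part of mfold. The sample rows are taken from Figure 9, and the energy values are kept exactly as they appear in the file.

```python
from typing import NamedTuple


class Helix(NamedTuple):
    level: int
    length: int
    istart: int
    jstart: int
    energy: int  # stored as written in the file; units are not interpreted here


def parse_plot(text):
    """Parse plot-file text into Helix records, skipping the header line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    helices = []
    for ln in lines[1:]:          # the first record is the header
        fields = ln.split()
        if len(fields) != 5:      # ignore anything that is not a full record
            continue
        helices.append(Helix(*map(int, fields)))
    return helices


SAMPLE = """level length istart jstart energy
1 8 206 242 -972
1 7 319 434 -972
2 4 8 23 -971
2 2 69 78 -971
"""

helices = parse_plot(SAMPLE)
# Example of the kind of filtering described above: drop short helices.
long_helices = [h for h in helices if h.length >= 3]
print(len(helices), len(long_helices))
```

An h-num file could be handled the same way by converting the last column with `float` instead of `int`.
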




Michael Zuker, Center for Computational Biology, Washington University in St. Louis, 1998-12-05