3.5 N best parameter selection

In N best mode, the program automatically computes a selection of optimal and suboptimal foldings that satisfy certain user-defined conditions. The user is prompted for three parameters. The first is the ``percentage for sort'', an integer p. All computed foldings will have free energies within p% of the computed minimum free energy. A value of 10% should guarantee that all reasonable secondary structure motifs are found. For short sequences, p should be increased so that the actual energy increment is at least 2 to 3 kcal/mole. Conversely, for very long sequences p should be decreased so that the energy increment is no greater than 15 to 20 kcal/mole; an increment of 10 to 12 kcal/mole is recommended. The default value of p is 0, so that only optimal foldings are computed.
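The relation between p and the energy increment can be worked out directly. The following is a minimal sketch, assuming the increment is simply p percent of the magnitude of the minimum free energy; the function names are hypothetical helpers for choosing p by hand and are not part of the program.

```python
import math


def energy_increment(mfe: float, p: int) -> float:
    """Width in kcal/mole of the band allowed by a 'percentage for sort' of p.

    Assumes the band is p% of |mfe|, where mfe is the (negative) minimum free energy.
    """
    return abs(mfe) * p / 100.0


def energy_cutoff(mfe: float, p: int) -> float:
    """Least stable free energy a reported folding may have under this assumption."""
    return mfe + energy_increment(mfe, p)


def suggest_p(mfe: float, target_increment: float) -> int:
    """Smallest integer p whose increment reaches target_increment kcal/mole."""
    if mfe >= 0:
        raise ValueError("expected a negative minimum free energy")
    return math.ceil(100.0 * target_increment / abs(mfe))
```

For a short sequence with a hypothetical minimum free energy of -10 kcal/mole, the default 10% gives only a 1 kcal/mole band, and `suggest_p(-10.0, 2.0)` returns 20. For a long sequence at -400 kcal/mole, `suggest_p(-400.0, 12.0)` returns 3, keeping the band near the recommended 10 to 12 kcal/mole.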

The next prompt is for the ``number of tracebacks''. This is an upper bound on the number of foldings that will be computed. Although the default is 1, I strongly recommend setting this parameter high, to several hundred. The ``percentage for sort'' and ``window size'' parameters already limit the number of foldings that are computed, and it is better to let these parameters do their work than to truncate the list of foldings artificially at some arbitrary number.
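To see why a small ``number of tracebacks'' loses information, consider a simplified model in which the candidate foldings and their energies are already known. This sketch does not perform any traceback; it only illustrates, under that assumption, how the energy band and the count cap combine. The data model and the function name `n_best` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data model: (free energy in kcal/mole, folding identifier).
Folding = Tuple[float, str]


def n_best(candidates: List[Folding], mfe: float, p: int,
           max_tracebacks: int) -> List[Folding]:
    """Keep foldings within p% of mfe, most stable first, then cap the count."""
    cutoff = mfe + abs(mfe) * p / 100.0
    in_band = sorted((f for f in candidates if f[0] <= cutoff),
                     key=lambda f: f[0])
    # The cap discards in-band foldings regardless of how different they are.
    return in_band[:max_tracebacks]
```

With candidates at -50.0, -48.5, -46.0 and -44.0 kcal/mole and p = 10, a cap of 500 returns the three foldings inside the band, while the default cap of 1 returns only the optimal folding.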

The third prompt is for the ``window size''. It has the same meaning as in energy dot plot mode, but in N best mode this parameter ensures that every pair of foldings in the output is sufficiently different from one another. The default is 0, so that even trivially different foldings may be found. There are no firm rules, but in Table 1 I have listed some recommended window sizes for sequences of different lengths. A smaller window size increases the number of foldings found, but some of these foldings may be similar. A larger window size reduces the number of computed foldings, but some reasonable folding motifs may be lost.
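The effect of the window size on the output can be illustrated with a distinctness filter. This is only a sketch under an assumed rule, not the program's actual criterion: two base pairs (i,j) and (k,l) are treated as "close" when max(|i-k|, |j-l|) is at most W, and a new folding is kept only if it contains at least one base pair that is not close to any pair in the foldings already kept. All names here are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # base pair (i, j), i < j, 1-based positions


def pairs_close(a: Pair, b: Pair, w: int) -> bool:
    # Assumed distance between base pairs: max(|i - k|, |j - l|).
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= w


def is_distinct(candidate: Set[Pair], accepted: List[Set[Pair]], w: int) -> bool:
    """True if some pair of candidate is farther than w from every kept pair."""
    seen = [p for f in accepted for p in f]
    return any(all(not pairs_close(c, s, w) for s in seen) for c in candidate)


def window_filter(foldings: List[Set[Pair]], w: int) -> List[Set[Pair]]:
    """Keep foldings (assumed sorted by increasing free energy) that are distinct."""
    accepted: List[Set[Pair]] = []
    for f in foldings:
        if not accepted or is_distinct(f, accepted, w):
            accepted.append(f)
    return accepted
```

Two foldings that differ only by a shifted base pair, such as {(1,20), (2,19), (3,18)} and {(1,20), (2,19), (4,17)}, both survive with W = 0 but collapse to one with W = 2, while a folding built on an unrelated helix survives either way. This mirrors the trade-off described above: a larger window yields fewer, more distinct foldings.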

Table 1: Suggested ``window size'' values depending on sequence length. The user is encouraged to experiment with this parameter.

Sequence length    Suggested window size
0-50               2
50-120             3
120-300            5
300-500            7-8
500-800            10-12
800-1200           15
1200-2000          20
> 2000             25 or more
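Table 1 can be encoded as a small lookup when scripting many runs. This sketch assumes that a length falling exactly on a boundary (for example 50 or 120) belongs to the lower row, since the table does not say; the function name is hypothetical.

```python
import bisect
from typing import Optional, Tuple

# Inclusive upper length bound of each row of Table 1 and its suggested range.
_UPPER = [50, 120, 300, 500, 800, 1200, 2000]
_WINDOW = [(2, 2), (3, 3), (5, 5), (7, 8), (10, 12), (15, 15), (20, 20)]


def suggested_window(length: int) -> Tuple[int, Optional[int]]:
    """Return (low, high) suggested window sizes; high is None for '25 or more'."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # bisect_left places a boundary length (e.g. 50) in the lower row.
    row = bisect.bisect_left(_UPPER, length)
    if row == len(_UPPER):
        return (25, None)  # longer than 2000: 25 or more
    return _WINDOW[row]
```

For example, a 400-nucleotide sequence maps to (7, 8), and anything longer than 2000 maps to (25, None).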

As in Section 3.4, program execution continues with input save file name selection (Section 3.7) in a continuation run, and with sequence file name selection (Section 3.6) in a regular run. In ``Multiple molecules'' mode, energy file input (Section 3.8) is next.

Michael Zuker
Thu Nov 2 14:28:14 CST 1995