3.9 Output file and format selection
Three different types of folding output formats are available:
- printer - a rough and inelegant output, but one that can be read and interpreted directly,
- ct file - an output file that can be used as the input to other programs that draw secondary structures or that analyze a large number of suboptimal foldings,
- reg (region) file - an output file that is more condensed than the ct file, but lacks the sequence information. It, too, can be used as the input to other programs.
[Printer output listing of the folded Alu consensus sequence; column alignment not preserved.]
Figure 1: Printer output of the optimal folding of an Alu consensus sequence [29]. Alu sequences can occur within introns, and hence within RNA transcripts. The record length here is 80 columns. Note that the piece from nucleotides 68 to 97 is preceded by 2 sets of 5 dots. This piece is "continued" from the stem above it; the stem is broken because it cannot fit in its entirety into 80 columns. The symbols "|" and "^" point to the base pair C4-G160 that was selected. Base pairs are selected by the user from the energy dot plot in "Sub-optimal plot" mode, or automatically by the program in "N-best" mode, and an optimal structure containing that base pair is computed.
a) Partial ct file:

    290  ENERGY = -114.5    ACJL
      1 G    0    2    0    1
      2 G    1    3    0    2
      3 C    2    4    0    3
      4 C    3    5  160    4
      5 G    4    6  159    5
      6 G    5    7  158    6
      7 G    6    8  157    7
      8 C    7    9  156    8
      9 G    8   10  155    9
    ... (272 intervening lines) ...
    282 C  281  283  184  282
    283 A  282  284  183  283
    284 A  283  285    0  284
    285 A  284  286    0  285
    286 A  285  287    0  286
    287 A  286  288    0  287
    288 A  287  289    0  288
    289 A  288  290    0  289
    290 A  289    0    0  290

b) Region file:

    (  1)    4  160    8   -16.7
    (  2)   12  151    2    -3.4
    (  3)   15  148    2    -1.9
    (  4)   18  146    7   -13.7
    (  5)   29   44    4    -8.1
    (  6)   45  139    3    -6.3
    (  7)   49  116    8   -15.0
    (  8)   58  108    1     0.0
    (  9)   61  105    5    -7.1
    ( 10)   66   99    2    -1.8
    ( 11)   69   95    7   -15.6
    ( 12)  164  179    4    -8.1
    ( 13)  183  283    9   -16.8
    ( 14)  193  274    1     0.0
    ( 15)  196  271    6    -9.9
    ( 16)  204  261    7   -15.1
    ( 17)  214  253    5    -8.9
    ( 18)  221  248    7   -12.6
    ( 19)  229  241    2    -3.4
    ( 20)  231  238    2    -2.0
Figure 2: Examples of ct and region files. a) Partial ct file of the optimal Alu folding presented in Figure 1. b) The entire region file for the same Alu folding. Note the 0 energies assigned to the single base pair "helices" (numbers 8 and 14). This is because energy in a helix is assigned to the stacking of one base pair over another, and a single base pair has no neighboring pair to stack on.
An example of printer output is given in Figure 1. Figure 2 contains examples of ct and region output. The first line of a ct file contains three items: the number of bases in the folded fragment, the folding energy, and the sequence label. Each subsequent line contains 6 fields:
- The base number, i, within the folded segment. This is called the internal base number.
- The base identity.
- The internal base number of the 5′ neighbor of base i (0 for the first base).
- The internal base number of the 3′ neighbor of base i (0 for the last base).
- The internal number of the base to which i is paired. This is 0 if base i is single-stranded.
- The historical numbering of base i within the folded sequence.
Each line of a region file describes one helix and contains 5 fields:
- The helix number.
- The historical numbering, i, of the 5′ external base of the helix.
- The historical numbering, j, of the 3′ external base of the helix.
- The number of base pairs, k, in the helix. Thus in the sequence $r_1, r_2, \ldots, r_n$, the helix contains the k base pairs $r_i$-$r_j$, $r_{i+1}$-$r_{j-1}$, ..., $r_{i+k-1}$-$r_{j-k+1}$.
- The energy of the helix in kcal/mol. This does not include single base stacking on the ends that might occur.
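The field descriptions above can be turned into a small reader. The following Python sketch is illustrative only and is not part of the folding program: the names `CtRecord`, `Helix`, `parse_ct_header`, `parse_ct_line`, `parse_region_line` and `helix_pairs` are hypothetical, and the whitespace-separated layout (including the `ENERGY = ...` form of the first ct line and the parenthesized helix number) is assumed from the examples in Figure 2.

```python
from dataclasses import dataclass


@dataclass
class CtRecord:
    index: int   # internal base number i
    base: str    # base identity
    prev: int    # internal number of the 5' neighbor (0 for the first base)
    next: int    # internal number of the 3' neighbor (0 for the last base)
    pair: int    # internal number of the paired base (0 if single-stranded)
    hist: int    # historical numbering of base i


@dataclass
class Helix:
    number: int    # helix number
    i: int         # historical number of the 5' external base
    j: int         # historical number of the 3' external base
    k: int         # number of base pairs in the helix
    energy: float  # helix energy in kcal/mol


def parse_ct_header(line):
    """Return (base count, energy, label).

    Assumes the 'N ENERGY = E LABEL' layout seen in Figure 2a.
    """
    fields = line.split()
    eq = fields.index("=")
    return int(fields[0]), float(fields[eq + 1]), " ".join(fields[eq + 2:])


def parse_ct_line(line):
    """Parse one 6-field ct line into a CtRecord."""
    f = line.split()
    return CtRecord(int(f[0]), f[1], int(f[2]), int(f[3]), int(f[4]), int(f[5]))


def parse_region_line(line):
    """Parse one 5-field region line into a Helix."""
    # The helix number is wrapped in parentheses, e.g. "(  1)"; drop them first.
    f = line.replace("(", " ").replace(")", " ").split()
    return Helix(int(f[0]), int(f[1]), int(f[2]), int(f[3]), float(f[4]))


def helix_pairs(h):
    """Expand a helix into its k base pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1)."""
    return [(h.i + m, h.j - m) for m in range(h.k)]


# Lines taken from Figure 2.
header = parse_ct_header("290 ENERGY = -114.5 ACJL")
rec = parse_ct_line("4 C 3 5 160 4")
helix = parse_region_line("(  1)    4  160    8   -16.7")
pairs = helix_pairs(helix)
print(header)
print(rec)
print(pairs[0], pairs[-1])
```

In this example the internal and historical numberings coincide, so the first pair of helix 1, (4, 160), matches the pairing field of ct line 4. When a fragment that does not start at base 1 is folded, the region file (historical numbering) and the ct pairing field (internal numbering) would have to be reconciled through the last ct field.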
The user is asked whether printer output is desired (default is yes). If "y" is chosen, a further prompt asks whether the output should go to the terminal (standard output; default is no). If "n" is chosen, the user is prompted for a file name. A default based on the sequence label is available. The next prompt is for the number of columns on the printing device or terminal. This is the "record length" of the output file. The default is 80, but other values can be selected.
In a similar way, the user is prompted for ct file and region file output, and for file names as required. In a regular run, program execution continues with the main menu (3.10). In a continuation run in "N-best" mode, there is a pause, possibly a long one, while the program computes structures and writes them to the output files. The program prints a message for every structure found, and then terminates. In a continuation run in "Sub-optimal plot" mode, program execution continues with the energy dot plot (3.11).
Michael Zuker
Thu Nov 2 14:28:14 CST 1995