3.9 Output file and format selection

Three different types of folding output formats are available:

  1. printer - a rough and inelegant output, but one that can be read and interpreted directly,
  2. ct file - an output file that can be used as the input to other programs that draw secondary structures or that analyze a large number of suboptimal foldings,
  3. reg (region) file - an output file that is more condensed than the ct file, but lacks the sequence information. It, too, can be used as the input to other programs.

              10         20          30      
----GGC  |       -  G  T       GTAA      AGC 
         CGGGCGCG GC GC CACGCCT      TCCC   A
         GTCCGCGC CG TG GTGCGGG      AGGG   C
AAAAAAA  ^       G  G  -       ----      CTT 
.      160       150        140         40   
 
                                                            50        60        
                                        -------------------C        A GA     -  
                                     GGC                    GAGGCGGG G  TTGCT TG
                                     CCG                    CTCTGCCC C  AGCGA AC
                                        ATTAAAAACATAAAAAACAT        - AG     T  
                                            130       120       110        100  
 
            70        80  
     .....-A       AGTTCG 
            GCCCAGG       
            CGGGTCC      A
     .....AG       GACCAG 
                90        
 
                  170 
         TAA      AGC 
              TCCC   T
              AGGG   A
         ---      CTC 
                      
 
            180       190       200         210       220       230     
              GGC         A GA      --GA       AGG     GC       A  -  T 
                 TGAGGCGGG G  TCGCTT    GCCCGGG   CGGAG  TGCAGTG GC CG G
                 ACTCTGCCC C  AGCGAG    CGGGTCC   GCCTC  ACGTCAC CG GC A
              ---         - AG      GCAG       --G     --       -  C  T 
                  280        270       260         250          240     


Figure 1: Printer output of the optimal folding of an Alu consensus sequence [29]. Alu sequences can occur within introns, and hence within RNA transcripts. The record length here is 80 columns. Note that the piece from nucleotides 68 to 97 has 2 sets of 5 dots preceding it. This piece is ``continued'' from the stem above it. The truncation occurs because the entire stem cannot fit into 80 columns. The symbols ``|'' and ``^'' point to the base pair C4-G160 that was selected. Base pairs are selected by the user from the energy dot plot in ``Sub-optimal plot'' mode, or automatically by the program in ``N-best'' mode, and an optimal structure containing that base pair is computed.


-------------- a --------------    ------------------- b -------------------
  290 ENERGY =  -114.5    ACJL     (    1)       4     160       8     -16.7
    1 G       0    2    0    1     (    2)      12     151       2      -3.4
    2 G       1    3    0    2     (    3)      15     148       2      -1.9
    3 C       2    4    0    3     (    4)      18     146       7     -13.7
    4 C       3    5  160    4     (    5)      29      44       4      -8.1
    5 G       4    6  159    5     (    6)      45     139       3      -6.3
    6 G       5    7  158    6     (    7)      49     116       8     -15.0
    7 G       6    8  157    7     (    8)      58     108       1       0.0
    8 C       7    9  156    8     (    9)      61     105       5      -7.1
    9 G       8   10  155    9     (   10)      66      99       2      -1.8
 ... (272 intervening lines) ...   (   11)      69      95       7     -15.6
  282 C     281  283  184  282     (   12)     164     179       4      -8.1
  283 A     282  284  183  283     (   13)     183     283       9     -16.8
  284 A     283  285    0  284     (   14)     193     274       1       0.0
  285 A     284  286    0  285     (   15)     196     271       6      -9.9
  286 A     285  287    0  286     (   16)     204     261       7     -15.1
  287 A     286  288    0  287     (   17)     214     253       5      -8.9
  288 A     287  289    0  288     (   18)     221     248       7     -12.6
  289 A     288  290    0  289     (   19)     229     241       2      -3.4
  290 A     289    0    0  290     (   20)     231     238       2      -2.0

Figure 2: Examples of ct and region files. a) Partial ct file of the optimal Alu folding presented in Figure 1. b) The entire region file for the same Alu folding. Note the 0 energies assigned to the single base pair ``helices'' (numbers 8 and 14). Helix energies are assigned to the stacking of one base pair over another, so a helix containing only one base pair has no stacking energy.

An example of printer output is given in Figure 1. Figure 2 contains examples of ct and region output. The first line of a ct file contains three items: the number of bases in the folded fragment, the folding energy, and the sequence label. Subsequent lines contain 6 fields:

  1. The base number, i, within the folded segment. This is called the internal base number.
  2. The base identity.
  3. The internal base number of the 5′ neighbor of base i (0 for the first base).
  4. The internal base number of the 3′ neighbor of base i (0 for the last base).
  5. The internal number of the base to which i is paired. This is 0 if base i is single-stranded.
  6. The historical numbering of base i within the folded sequence.
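
As a minimal sketch of how a downstream program might read these six fields, the Python fragment below parses the first few lines of the ct file shown in Figure 2a. The names CtRecord and parse_ct are hypothetical, and the header layout (length, the word ENERGY, an equals sign, the energy, then the label) is assumed from the figure rather than from a formal specification.

```python
from typing import List, NamedTuple, Tuple


class CtRecord(NamedTuple):
    """One base line of a ct file (hypothetical container)."""
    index: int  # internal base number i
    base: str   # base identity
    prev: int   # internal number of the 5' neighbor (0 for the first base)
    next: int   # internal number of the 3' neighbor (0 for the last base)
    pair: int   # internal number of the paired base (0 if single-stranded)
    hist: int   # historical numbering of base i


def parse_ct(text: str) -> Tuple[int, float, str, List[CtRecord]]:
    """Parse ct text; assumes a header like '290 ENERGY = -114.5 ACJL'."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    n_bases = int(header[0])
    energy = float(header[3])      # token after 'ENERGY' and '='
    label = " ".join(header[4:])
    records = []
    for ln in lines[1:]:
        f = ln.split()
        records.append(CtRecord(int(f[0]), f[1], int(f[2]),
                                int(f[3]), int(f[4]), int(f[5])))
    return n_bases, energy, label, records


# First lines of the partial ct file in Figure 2a.
SAMPLE_CT = """\
  290 ENERGY =  -114.5    ACJL
    1 G       0    2    0    1
    2 G       1    3    0    2
    3 C       2    4    0    3
    4 C       3    5  160    4
    5 G       4    6  159    5
"""

n_bases, energy, label, records = parse_ct(SAMPLE_CT)
# Collect (i, partner) for every base that is not single-stranded.
pairs = [(r.index, r.pair) for r in records if r.pair != 0]
print(n_bases, energy, label, pairs)
```

Only a fragment of the file is parsed here, so the number of records is smaller than the length given in the header; a reader of complete files could compare the two as a consistency check.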

The region table format is more compact, giving only the helical regions without base identities. Each line contains 5 fields:
  1. The helix number.
  2. The historical numbering, i, of the 5′ external base of the helix.
  3. The historical numbering, j, of the 3′ external base of the helix.
  4. The number of base pairs, k, in the helix. Thus in the sequence r_1, r_2, ..., r_n, the helix contains the k base pairs r_i-r_j, r_{i+1}-r_{j-1}, ..., r_{i+k-1}-r_{j-k+1}.
  5. The energy of the helix in kcal/mol. This does not include any single-base stacking that might occur at the ends of the helix.
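
The relation between a region line and the base pairs it stands for can be sketched in a few lines of Python. The names Helix, parse_region_line and helix_pairs are hypothetical, and the parenthesized helix number is assumed from the layout of Figure 2b.

```python
from typing import List, NamedTuple, Tuple


class Helix(NamedTuple):
    """One line of a region file (hypothetical container)."""
    number: int    # helix number
    i: int         # historical number of the 5' external base
    j: int         # historical number of the 3' external base
    k: int         # number of base pairs in the helix
    energy: float  # helix energy in kcal/mol


def parse_region_line(line: str) -> Helix:
    """Parse a line such as '(    1)       4     160       8     -16.7'."""
    f = line.replace("(", " ").replace(")", " ").split()
    return Helix(int(f[0]), int(f[1]), int(f[2]), int(f[3]), float(f[4]))


def helix_pairs(h: Helix) -> List[Tuple[int, int]]:
    """Expand a helix into the k pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1)."""
    return [(h.i + m, h.j - m) for m in range(h.k)]


# Helices 1 and 8 from the region file in Figure 2b.
helix1 = parse_region_line("(    1)       4     160       8     -16.7")
helix8 = parse_region_line("(    8)      58     108       1       0.0")

pairs1 = helix_pairs(helix1)
pairs8 = helix_pairs(helix8)
print(pairs1)
print(pairs8)
```

The first pairs produced for helix 1, 4-160 and 5-159, agree with the pairing column of the ct file in Figure 2a.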

The user is asked whether printer output is desired (default is yes). If ``y'' is chosen, a further prompt asks whether the output should go to the terminal (standard output; default is no). If ``n'' is chosen, the user is prompted for a file name. A default based on the sequence label is available. The next prompt is for the number of columns on the printing device or terminal. This is the ``record length'' of the output file. The default is 80, but other values can be selected.

In a similar way, the user is prompted for ct file and region file output, and for file names as required. In a regular run, program execution will continue with the main menu (3.10). In a continuation run in ``N best'' mode, there will be a pause, possibly a long one, while the program computes structures and writes them into the output files. The program will print out a message for every structure found, and then terminate. In a continuation run in ``Sub-optimal plot'' mode, program execution continues with the energy dot plot (3.11).


Michael Zuker
Thu Nov 2 14:28:14 CST 1995