version 1.0


1. Introduction

    C2H2-enoLOGOS is an extension of the enoLOGOS program, which generates LOGOs of transcription factor DNA binding sites from various types of input matrices. C2H2-enoLOGOS generates LOGOs specifically for the C2H2 zinc finger family of transcription factors.

    The C2H2 zinc finger motif (Pfam: PF00096) is the most commonly found DNA-binding motif in eukaryotes. The C2H2 protein family consists of tens of thousands of members from organisms as diverse as yeast and human. The co-crystal structure of the EGR1 protein, a member of this family, in complex with its preferred DNA target (Pavletich and Pabo, 1991) revealed a simple pattern of contacts, in which each of the three fingers contacts its DNA target in a modular, anti-parallel fashion.

    C2H2-enoLOGOS generates an energy normalized LOGO using the matrix of predicted contact energies calculated by Benos et al., 2002. This matrix is based on data from DNA and protein selection experiments (SELEX and phage display, respectively). The user can either directly enter the amino acids at each of the "contacting" positions (see above) of each finger, or search the annotation database of the Pfam alignment for C2H2 zinc finger proteins. Other parameters can be specified as in the standard enoLOGOS program.
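
    As a rough illustration of what "energy normalized" can mean, the sketch below converts one column of energies into base probabilities. It assumes a Boltzmann weighting with energies in kBT units and lower energy meaning stronger predicted binding. The function name, the sign convention, and the example energies are hypothetical and are not taken from the program itself.

```python
import math

def energies_to_probabilities(column_energies):
    """Convert one column of energies (assumed kBT units) to probabilities.

    Assumption: Boltzmann weighting, p(b) = exp(-E(b)) / Z, where a lower
    energy corresponds to a more favorable base.
    """
    # Shift by the minimum energy so the exponentials stay well scaled;
    # the shift cancels out in the normalization.
    e_min = min(column_energies.values())
    weights = {b: math.exp(-(e - e_min)) for b, e in column_energies.items()}
    z = sum(weights.values())
    return {b: w / z for b, w in weights.items()}

# Hypothetical energies for a single binding-site position.
column = {"A": 0.0, "C": 1.2, "G": 2.5, "T": 0.7}
probs = energies_to_probabilities(column)
```

    If the input energies use the opposite sign convention, they would need to be negated first, which is what the "Wts (negate)" parameter described below is for.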


2. Parameters

  1. Matrix input: The user can enter the weight matrix in horizontal or vertical format; i.e., the rows will correspond to the base type or the positions of the matrix, respectively. Lines that are preceded by "#" are considered comment lines and are ignored. A single matrix header line starting with "PO" can specify position labels (horizontal matrices) or base types (vertical matrices) of the logo columns. If a matrix header is found, then the first item on each subsequent line will be used as either the base type or position label of the horizontal or vertical matrix, respectively. Examples of horizontal and vertical matrices follow.

     

  2. Weight type: Weight values are always energies in C2H2-enoLOGOS.

  3. Energy units: When the weight type is energies, the user may need to select the energy units.
    1. KbT: The default.
    2. kcal/mol
    3. kJ/mol
    4. J/mol

  4. LOGO plot method: The user can select the method for calculation of the height of the symbol stacks. The two most popular are Shannon's entropy (also known as information content) and relative entropy (i.e., information content corrected for the background).
    1. relative entropy: H(i), the relative entropy of column i with respect to the prior probabilities. This is equivalent to Shannon's information content when the prior probabilities are all equal.
    2. frequency: Letter heights will be generated from their calculated probabilities (heights will sum to 1).
    3. weights as entered: When weight type is set to arbitrary, letter heights will reflect the input weights.

  5. Log base: The user specifies the preferred base for calculation of the logarithms in the final plot.

  6. Title (optional): The user specifies a title to be printed on the top of the plot.

  7. Axis labels (optional): The user specifies the labels for the x-axis and y-axis.

  8. Scale letters by probability: When "ON" (the default), each letter is scaled in proportion to its probability, and the total height of the column is the relative entropy H(i). When "OFF", letter heights are proportional to the absolute value of the relative entropy contribution of each letter. Note that the latter method generates LOGOs in which bases with a negative relative entropy contribution are plotted upside-down.

  9. Wts (negate): If the energies are negative, they can be negated with this option (all weights are multiplied by -1).

  10. Y-axis height: The user specifies the maximum height of the y-axis. This is useful for users who want to print LOGOs of many patterns on the same scale for comparison. The value is reset if the actual column heights exceed it.

  11. X-axis, Y-axis: Controls for turning the plotting of the x- and y-axes ON and OFF.

  12. Mutual information: N/A. Mutual information cannot be calculated for weight matrix input.

  13. Aspect ratio: This option controls the LOGO column height-to-width aspect ratio. The default of 3 means that the height of the tallest column is 3 times the letter width. Typically this will need to be increased when the total number of positions exceeds 20 and decreased when the number of positions is less than about 6.

  14. Symbol colors: The user specifies the color of each symbol using the RGB system.

  15. Prior probabilities: The user specifies the reference probabilities for the four bases (e.g. for a 60% AT-rich organism, p(A)=p(T)=0.3 and p(C)=p(G)=0.2).
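
    The interaction between the LOGO plot method, the log base, the "Scale letters by probability" switch, and the prior probabilities can be sketched as follows. This is a minimal illustration using the standard definition of relative entropy, H(i) = sum over bases of p * log(p / q). The function names and the example column probabilities are hypothetical, and the program's actual implementation may differ.

```python
import math

def relative_entropy_terms(probs, priors, log_base=2.0):
    """Per-base contributions p * log(p / q); their sum is H(i)."""
    terms = {}
    for base, p in probs.items():
        q = priors[base]
        # A base with zero probability contributes nothing.
        terms[base] = p * math.log(p / q, log_base) if p > 0 else 0.0
    return terms

def letter_heights(probs, priors, scale_by_probability=True, log_base=2.0):
    """Signed letter heights for one LOGO column."""
    terms = relative_entropy_terms(probs, priors, log_base)
    column_height = sum(terms.values())  # relative entropy H(i)
    if scale_by_probability:
        # "ON": each letter takes a share of H(i) equal to its probability.
        return {b: p * column_height for b, p in probs.items()}
    # "OFF": each letter keeps its own contribution; the absolute value is
    # the drawn height and a negative sign marks an upside-down letter.
    return terms

# Priors from the 60% AT-rich example above; column probabilities are made up.
priors = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
probs = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}

heights_on = letter_heights(probs, priors, scale_by_probability=True)
heights_off = letter_heights(probs, priors, scale_by_probability=False)
```

    In both modes the signed heights sum to H(i). With "OFF", the T in this example has a negative contribution (its probability is below its prior), which corresponds to a letter drawn upside-down.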

3. Reference

    Workman, Yin, Corcoran, Ideker, Stormo, Benos "EnoLOGOS: a versatile web tool for energy normalized sequence LOGOS." submitted.