1. Introduction
C2H2-enoLOGOS is an extension of the enoLOGOS program, which generates LOGOs of
transcription factor DNA binding sites from various types of input matrices.
C2H2-enoLOGOS generates LOGOs specifically for the C2H2 zinc finger family of
transcription factors.
The C2H2 zinc finger motif (Pfam: PF00096) is the most commonly found
DNA-binding motif in eukaryotes. The C2H2 protein family consists of
tens of thousands of members from organisms as diverse as yeast and human.
The co-crystal structure of the EGR1 protein, a member of this family, in complex with its
preferred DNA target (Pavletich and Pabo, 1991) revealed a simple pattern of contacts,
in which each of the three fingers contacts its DNA target in a modular,
anti-parallel fashion.
C2H2-enoLOGOS generates an energy normalized LOGO
using the matrix of predicted contact energies calculated by
Benos et al. (2002). This matrix is based on data from DNA and protein selection
experiments (SELEX and phage display, respectively). The user can either directly enter the
amino acid at each of the "contacting" amino acid positions of each finger
(see above) or search the annotation database of the Pfam alignment for a C2H2 zinc finger
protein. Other parameters can be specified as in the standard enoLOGOS program.
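
As an illustration of how a column of contact energies can be turned into the base probabilities that a LOGO displays, the sketch below applies a plain Boltzmann weighting to one position. It is a minimal sketch under stated assumptions, not the enoLOGOS normalization itself: the function names, the example energies, and the temperature of 298.15 K used for unit conversion are all hypothetical choices made for this example.

```python
import math

# Molar gas constant and calorie-to-joule factor, used to express
# molar energies in units of kBT.
R_J_PER_MOL_K = 8.314462618
CAL_TO_J = 4.184


def to_kbt(energy, units="kBT", temperature_k=298.15):
    """Convert an energy to kBT units (the temperature is an assumption)."""
    rt = R_J_PER_MOL_K * temperature_k  # thermal energy in J/mol
    factors = {
        "kBT": 1.0,
        "J/mol": 1.0 / rt,
        "kJ/mol": 1000.0 / rt,
        "kcal/mol": 1000.0 * CAL_TO_J / rt,
    }
    return energy * factors[units]


def boltzmann_probabilities(energies, units="kBT"):
    """Map {base: energy} for one position to {base: probability}.

    Assumes that a lower energy means a more favorable contact.
    """
    scaled = {b: to_kbt(e, units) for b, e in energies.items()}
    e_min = min(scaled.values())  # shift so the largest weight is exp(0)
    weights = {b: math.exp(-(e - e_min)) for b, e in scaled.items()}
    z = sum(weights.values())  # partition function for this column
    return {b: w / z for b, w in weights.items()}


# Hypothetical energies (in kBT) for a single binding-site position.
column = {"A": 2.0, "C": 0.0, "G": 1.0, "T": 3.0}
probs = boltzmann_probabilities(column)
```

In this sketch the base with the lowest energy (C) receives the highest probability. If a matrix uses the opposite sign convention, the energies would need to be negated first.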
- Matrix input:
The user can enter the weight matrix in horizontal or vertical
format; i.e., the rows will correspond to the base type or the
positions of the matrix, respectively.
Lines that are preceded by "#" are considered comment lines
and are ignored. A single matrix header line starting with "PO"
can specify position labels (horizontal matrices) or base types
(vertical matrices) of the logo columns. If a matrix header is
found, then the first item on each subsequent line will be used
as either the base type or position label of the horizontal or
vertical matrix, respectively. Examples of horizontal and vertical
matrices follow.
- Weight type:
Weight values are always energies in C2H2-enoLOGOS.
- Energy units:
If the weight type is energies, the energy units may need to be chosen:
  - kBT (the default)
  - kcal/mol
  - kJ/mol
  - J/mol
- LOGO plot method:
The user can select the method for calculation of the height of
the symbol stacks. The two most popular are Shannon's entropy
(also known as information content) and relative entropy
(i.e., information content corrected for the background).
  - relative entropy: H(i) as defined above. This equals Shannon's entropy
    (information content) when all prior probabilities are equal.
  - frequency: Letter heights are generated from their calculated
    probabilities (heights sum to 1).
  - weights as entered: When the weight type is set to arbitrary,
    letter heights reflect the input weights.
- Log base:
The user specifies the preferred base for calculation of the
logarithms in the final plot.
- Title (optional):
The user specifies a title to be printed on the top of the plot.
- Axis labels (optional):
The user specifies the labels for the x-axis and y-axis.
- Scale letters by probability:
When "ON" (the default), each letter is scaled in proportion to its probability, and the
total height of the column is the relative entropy H(i). When "OFF",
letter heights are proportional to the absolute value of the relative entropy
contribution of each letter. Note that the latter method generates LOGOs
in which bases with a negative relative entropy contribution are plotted upside-down.
- Wts (negate):
If the energies are negative, they can be negated with this option
(all weights are multiplied by -1).
- Y-axis height:
The user specifies the maximum height for the y-axis. This is
useful for users who want to print LOGOs of many patterns on
the same scale for comparison purposes.
The value is reset if any actual column height exceeds it.
- X-axis, Y-axis:
Controls for turning the plotting of the x-axis and y-axis ON and OFF.
- Mutual information:
N/A. Mutual information cannot be calculated for weight matrix input.
- Aspect ratio:
This option controls the LOGO column height-to-width aspect ratio.
The default of 3 means that the height of the tallest column is 3 times the
letter width. Typically this will need to be increased when the total number
of positions exceeds 20 and decreased when the number of positions is less than about 6.
- Symbol colors:
The user specifies the color of each symbol using the RGB system.
- Prior probabilities:
The user specifies the reference probabilities for the four bases
(e.g. for a 60% AT-rich organism, p(A)=p(T)=0.3 and p(C)=p(G)=0.2).
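
The interplay of the "relative entropy" plot method, the "Log base" setting, the "Scale letters by probability" switch and the prior probabilities can be sketched as follows. This is a minimal sketch, assuming H(i) is the standard relative entropy (the sum over bases of p·log(p/q), with p the column probability and q the prior). The function names and the example column are hypothetical and are not taken from the enoLOGOS source.

```python
import math


def relative_entropy_column(probs, priors, log_base=2.0):
    """Return (H, per-base contributions) for one LOGO column.

    H = sum over bases of p * log(p / q), where q is the prior probability.
    """
    contrib = {}
    for base, p in probs.items():
        # A base with zero probability contributes nothing to the sum.
        contrib[base] = p * math.log(p / priors[base], log_base) if p > 0 else 0.0
    return sum(contrib.values()), contrib


def letter_heights(probs, priors, scale_by_probability=True, log_base=2.0):
    """Return {base: (height, upside_down)} for one column.

    ON:  height = probability * H, so the column height equals H.
    OFF: height = |contribution|; negative contributions are flagged
         so that they can be drawn upside-down.
    """
    h, contrib = relative_entropy_column(probs, priors, log_base)
    if scale_by_probability:
        return {b: (p * h, False) for b, p in probs.items()}
    return {b: (abs(c), c < 0) for b, c in contrib.items()}


# Priors for the 60% AT-rich example; the column itself is hypothetical.
priors = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
probs = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}

heights_on = letter_heights(probs, priors, scale_by_probability=True)
heights_off = letter_heights(probs, priors, scale_by_probability=False)
```

With scaling ON, the letter heights in a column add up to H(i). With scaling OFF, bases that are rarer than their prior (here C, G and T) have negative contributions and are flagged for upside-down plotting.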
3. Reference
Workman, Yin, Corcoran, Ideker, Stormo, Benos
"EnoLOGOS: a versatile web tool for energy normalized sequence LOGOS."
submitted.