************
*  POAVIZ  *
************

Release notes, Nov. 2003.  Available at
http://www.bioinformatics.ucla.edu/poa (the POA website).


I.  INTRODUCTION
------------------

POAVIZ is a program for the visualization of multiple sequence alignments
(MSAs).  The program reads an MSA file in CLUSTAL, PIR, or PO format as
input.  It represents the MSA as a partially-ordered graph, lays out the
graph in the XY-plane, and writes the layout and connectivity information
of the graph to an output file.

POAVIZ can be run from the command line, and generates a text description
of a visualization rather than an actual image.  An HTML/CGI interface
allowing the generation of GIF images is also available; this is called
POAVIZ Online, and it can also be downloaded from the POA website.


II.  COMPILING & RUNNING POAVIZ
---------------------------------

Unpacking the tar file poaviz_pkg.tar will produce a directory poaviz_src/
containing the source files for POAVIZ and a copy of this file (the POAVIZ
readme).

To build the POAVIZ binary:

    cd poaviz_src/
    make poaviz

This produces an executable binary (poaviz); the software has been tested
on Linux and Mac OS X.

Usage:

    poaviz <-input FILE> [OPTIONS]

POAVIZ accepts a number of command-line flags, allowing the user to specify
input/output files, limit the content of the visualization, and affect the
output style.  A description of each flag is given below.

*** INPUT/OUTPUT FILES ***

-input FILE
    Read input from FILE (required).  POAVIZ will try to determine the
    input format by inspecting the filename suffix and the first line of
    the input file.  The first line should begin with "VERSION=" for PO
    files, ">" for PIR files, and "CLUSTAL" for CLUSTAL files.  However,
    the first line may be omitted in the case of CLUSTAL files.

-output FILE
    Write output to FILE.  If this flag is not given, output is written to
    'viz.dat'.  The output format is described later in this readme.

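The format detection described under -input can be sketched as follows.
This is an illustration only, not POAVIZ's own code: the function name, the
suffix table, and the choice to check the first line before the suffix are
all assumptions made for the example.  Only the three first-line prefixes
come from this readme.

```python
# Hypothetical suffix hints; the readme does not list the suffixes POAVIZ accepts.
SUFFIX_HINTS = {"po": "PO", "pir": "PIR", "aln": "CLUSTAL"}


def guess_msa_format(filename, first_line):
    """Guess the MSA format from the first line, then from the filename suffix.

    Returns "PO", "PIR", "CLUSTAL", or None if nothing matches.
    """
    line = first_line.lstrip()
    # First-line prefixes as stated in the readme.
    if line.startswith("VERSION="):
        return "PO"
    if line.startswith(">"):
        return "PIR"
    if line.startswith("CLUSTAL"):
        return "CLUSTAL"
    # No recognizable header (allowed for CLUSTAL files): fall back on the suffix.
    if "." in filename:
        suffix = filename.rsplit(".", 1)[-1].lower()
        return SUFFIX_HINTS.get(suffix)
    return None


if __name__ == "__main__":
    print(guess_msa_format("multidom.pir", ">P1;example"))
    print(guess_msa_format("headerless.aln", "seq1  ACGT"))
```

Checking the first line before the suffix lets a misnamed file still be
recognized; a headerless CLUSTAL file can only be identified by its suffix.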
*** RESTRICTING OUTPUT CONTENT ***

-start_node M
-end_node N
    Do not lay out or visualize nodes with indices < M or > N.  Both flags
    are optional.

-show_consens
    Show only consensus sequences from the MSA.  POAVIZ interprets
    sequences whose names begin with 'CONSENS' as consensus sequences.

-seqs_order
-show_seqs
    Show only the sequences listed (n1, n2, n3...) or specified in FILE,
    in the order given.  These options are mutually exclusive.

-show_bundle N
    Show only sequences with bundle ID = N.  Bundle IDs are only stored in
    PO-formatted files, so this flag is ignored for non-PO input.

*** OUTPUT STYLE ***

-smooth N
    Smooth the visualization, showing only features with more than N
    residues.  This flag is useful for hiding single-residue differences
    between aligned sequences (e.g. sequencing errors or point insertions)
    while retaining larger-scale alignment structure.

-hier_order
    Visually order sequences on the basis of sequence similarity.  This is
    done by calculating the percentage of aligned residues between each
    pair of sequences, and hierarchically clustering the sequences on the
    basis of this score.

-log_length
    Show N-residue alignment blocks with length log(1+N).  This is useful
    when some blocks (e.g. of intronic sequence) are extremely long
    compared to other blocks.

-main_line N
    Force sequence N to be horizontal in the graph layout.

-simple
    Do not optimize graph layout.  If the input is PIR or CLUSTAL, the
    graph produced will look like the row-column formatted input.

-quiet
    Operate quietly: only report critical errors.  By default, POAVIZ
    provides a running commentary on its operation to standard error.


III.  OUTPUT FILE FORMAT
--------------------------

The output of POAVIZ is a text file describing the graph and its layout.
The graph consists of nodes and directed edges connecting these nodes.
Each node contains a particular set of sequences, and each edge is
associated with a single sequence.

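The graph just described can be modeled in a few lines.  This is a minimal
sketch, not code from POAVIZ: the class and function names are
hypothetical, and trace_sequence assumes that each sequence follows a
single linear path, with at most one outgoing edge per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # IDs of the sequences contained in this node.
    seq_ids: set = field(default_factory=set)


@dataclass(frozen=True)
class Edge:
    node_from: int  # index of the source node
    node_to: int    # index of the destination node
    seq_id: int     # the single sequence this edge belongs to


def trace_sequence(edges, seq_id):
    """Return the node indices one sequence visits, in order along its edges.

    Returns [] if the sequence has no edges.  Assumes a single linear path.
    """
    nxt = {e.node_from: e.node_to for e in edges if e.seq_id == seq_id}
    if not nxt:
        return []
    targets = set(nxt.values())
    starts = [n for n in nxt if n not in targets]
    if len(starts) != 1:
        raise ValueError("edges for this sequence do not form a single path")
    node = starts[0]
    path = [node]
    while node in nxt:
        node = nxt[node]
        path.append(node)
    return path


# Two sequences that share the first and last nodes but diverge in between.
nodes = [Node({0, 1}), Node({0}), Node({1}), Node({0, 1})]
edges = [Edge(0, 1, 0), Edge(1, 3, 0), Edge(0, 2, 1), Edge(2, 3, 1)]

if __name__ == "__main__":
    print(trace_sequence(edges, 0))  # [0, 1, 3]
    print(trace_sequence(edges, 1))  # [0, 2, 3]
```

Storing one sequence ID per edge means that a sequence's route through the
alignment can be recovered from the edge list alone.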
The output file has three parts:

    *** SEQUENCE DATA ***
    { seq_ID, "sequence name", "sequence title" }
    ...

    *** NODE DATA (node_IDs are 0, 1, 2, etc.) ***
    { x, y, length, { seq_ID_1, seq_ID_2, ... } }
    ...

    *** EDGE DATA ***
    { node_ID_from, node_ID_to, seq_ID }
    ...

The corresponding image should have a rectangle centered at (x,y) for each
node, which extends horizontally from x - length/2 to x + length/2.  Each
edge connects the right side of node_ID_from to the left side of
node_ID_to.  If each sequence is assigned a color, then the edges and
nodes can be colored to indicate their sequence content.  POAVIZ Online,
which you may use or download at the POA website, takes the output of
POAVIZ and generates a GIF image in this way.

A sample input file, output file, and GIF image are provided in the source
directory (multidom.pir, multidom_viz.dat, and multidom_viz.gif).  The
output file can be generated with

    poaviz -input multidom.pir -smooth 3 -output tst.dat

The files multidom_viz.dat and tst.dat will be identical if your version of
POAVIZ is working correctly.


IV.  LEGAL STUFF
------------------

Copyright 2001-2003 to Christopher J. Lee.  Property of the Regents of the
University of California.

This is free software, distributed under the GNU General Public License, as
is, with no warranty or guarantee of fitness for any particular purpose.
You use it at your own risk, with the understanding that neither the
author, nor any other person or institution, can be held responsible for
any damage this software might cause.