| 1. Overview of ASAP II Database |
|
We have updated our ASAP database to ASAP II, with a new interface and
integrated comparative features based on UCSC BLASTZ multiple alignments.
ASAP II supports 9 vertebrate species, 4 insects, and nematodes, and provides
extensive alternative splicing analysis together with the resulting splice
variants. For human, newly added EST libraries were classified and merged
into the previous tissue and cancer classification, and the lists of
tissue-specific and cancer vs. normal specific alternatively spliced genes
were recalculated and updated. We have created novel orthologous exon and
intron databases, together with their splice variants, based on multiple
alignments among several species. These orthologous exon and intron
databases can give more comprehensive homologous gene information than
methods based on protein similarity. Furthermore, splice junction and exon
identity among species can be a valuable resource for elucidating
species-specific genes. The ASAP II database can be easily integrated with
pygr (unpublished, the Python Graph Database Framework for Bioinformatics)
and its powerful features such as graph query and multi-genome alignment
query. ASAP II is available at
http://www.bioinformatics.ucla.edu/ASAP2.

Web Interface

ASAP II can be searched by several different criteria such as gene symbol, gene name, and ID [UniGene, GenBank etc.]. The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons & orthologous exons; (IV) introns & orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue & cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser (see A in the following figure). Users can easily navigate among all the views by clicking links of interest. Alternative and constitutive exons are highlighted in red and blue, respectively. All alternative splicing relationships with their supporting evidence (see B in the following figure), the types of alternative splicing patterns (see C and D in the following figure), and the inclusion rate for skipped exons (see E in the following figure) are listed in separate tables. Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page (see F in the following figure). We report p-values for tissue specificity as log-odds (LOD) scores, and highlight results with LOD >= 3 and at least 3 EST sequences. See the User's Guide for more comprehensive information.
Major output page for ASAP II. (A) Part of the ASAP II genome browser showing isoform, exon, and intron alignments. Alternative and constitutive exons and introns are denoted in red and blue, respectively. (B) List of all alternative splicing relationships. (C) Alternative donor and acceptor events. (D) Exon skipping events. (E) Exon inclusion rate for skipped exons. (F) Human EST library classification, with tissue and normal vs. cancer specificity. High-confidence tissue-specific and cancer-specific alternative splicing relationships (LOD >= 3, at least 3 supporting ESTs) are highlighted in red.

Statistics for ASAP II
* Genome assembly sequences were downloaded from the UCSC genome browser,
except for yellow fever mosquito, which was downloaded from the Ensembl
genome browser.

53 % of human and mouse multi-exon genes were detected to contain alternative splicing. Focusing on genes with at least one mRNA, 75 % of human and 60 % of mouse multi-exon genes were detected to contain alternative splicing (see the ASAP II website for details). Due to limited mRNA and EST coverage (Fugu and honeybee) and incomplete genome assemblies (Fugu, Ciona, and yellow fever mosquito), the number of mapped clusters (Ciona, 9 %; 1373 out of 15587) or alternatively spliced clusters (24 for Fugu, 98 for Ciona, 57 for honeybee, 87 for yellow fever mosquito) can be significantly lower than expected; these data cannot be considered comprehensive. 19 ~ 26 % of fruit fly, western clawed frog, chicken, rat, and cow multi-exon genes were detected to contain alternative splicing. Click here to download full statistics for ASAP II.
UniGene release date and Statistics for Alternative Splicing Events

All UniGene data were downloaded in January 2006. Because of the UniGene update frequency, the release dates can be earlier than January 2006. The current release of ASAP II is version JAN06. We will update the ASAP II database if the total number of available sequences increases significantly. The right side of the table shows statistics for alternative splicing events. Exon skipping is the most common type of event. |
| 2. Detecting Orthologous Exons and Introns using MULTIZ multiple alignments |
|
Comparative genomics is a major focus of the ASAP II database, which
displays results from its new orthologous exons and introns database. We
downloaded MULTIZ multiple alignments from the UCSC genome browser for human
(hg17), mouse (mm7), chicken (galGal2), fruit fly (dm2), zebrafish
(danRer3), and western clawed frog (xenTro1). Orthologous exons and introns
were defined as those sharing at least one splice site in the multigenome
alignments. This strategy increases the chance of finding orthologous exons,
because the exons can lie within well-conserved blocks of the multigenome
alignments. Conventional methods based on protein similarity can identify
orthologous exons only if protein sequences are available. Moreover, the
multigenome alignment based method enables us to interpret how alternatively
spliced exons and introns evolved across distant species.
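The "at least one shared splice site" rule can be illustrated with a minimal sketch. It assumes that each exon's splice sites have already been projected onto a common alignment coordinate (a column index). The `AlignedExon` type, its field names, and the coordinates are hypothetical and are not part of ASAP II or Pygr.

```python
from typing import NamedTuple, Optional


class AlignedExon(NamedTuple):
    """An exon whose splice sites are given as alignment columns.

    Hypothetical structure for illustration only. A site is None when it
    could not be mapped into the multigenome alignment.
    """
    assembly: str
    acceptor_col: Optional[int]  # 3' splice site at the exon start
    donor_col: Optional[int]     # 5' splice site at the exon end


def shares_splice_site(a: AlignedExon, b: AlignedExon) -> bool:
    """Return True if the two exons share at least one splice site column."""
    same_acceptor = a.acceptor_col is not None and a.acceptor_col == b.acceptor_col
    same_donor = a.donor_col is not None and a.donor_col == b.donor_col
    return same_acceptor or same_donor


# Made-up coordinates: mouse keeps the human acceptor but uses another donor.
human = AlignedExon("hg17", 1200, 1350)
mouse = AlignedExon("mm7", 1200, 1341)
chicken = AlignedExon("galGal2", 1410, 1500)

print(shares_splice_site(human, mouse))    # True
print(shares_splice_site(human, chicken))  # False
```

Under this rule the human and mouse exons would count as orthologous even though their donor sites differ, which is what makes the criterion more permissive than requiring both boundaries to match.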
The figure shows a segment of a multigenome alignment. The human exon is well conserved in mouse, rat, rabbit, dog, elephant, and opossum, and the splice site consensus is canonical in all of them: AC/CT (GT...AG on the reverse complement). By comparing all splice sites within the ASAP II database, we constructed the orthologous exons and introns database.

Statistics for Orthologous Exons and Introns
* Only orthologous exons and introns with exact matches at both canonical
splice sites (U1/U2 and U11/U12) are counted.

66 % (85673 out of 129981) of human and 77 % (81296 out of 105260) of mouse internal exons have at least one orthologous exon. 52 % (100447 out of 193024) of human and 69 % (97371 out of 141285) of mouse canonical introns have at least one orthologous intron. Most orthologous exons and introns come from human and mouse orthologs, because these genomes have more mRNA and EST sequences than the others. 56 %, 37 %, and 47 % of chicken (galGal2), zebrafish (danRer3), and western clawed frog (xenTro1) internal exons have at least one orthologous exon, and 49 %, 33 %, and 41 % of their introns have at least one orthologous intron. Because the genome assemblies used for the multigenome alignments differ from those used in the ASAP II calculation for chicken, zebrafish, and western clawed frog (see Table 1 for details), the numbers of orthologous exons and introns for these species can be lower. |
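The percentages quoted for human and mouse can be re-derived from the counts given in the text. The short check below uses only those counts; the dictionary labels and the `percent` helper are illustrative names.

```python
# (with at least one ortholog, total) counts as quoted in the statistics text.
reported = {
    "human internal exons": (85673, 129981),
    "mouse internal exons": (81296, 105260),
    "human canonical introns": (100447, 193024),
    "mouse canonical introns": (97371, 141285),
}


def percent(matched: int, total: int) -> int:
    """Fraction of features with an ortholog, rounded to a whole percent."""
    return round(100.0 * matched / total)


for label, (matched, total) in reported.items():
    print(f"{label}: {percent(matched, total)} %")
```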
| 3. Splice Junctions from MULTIZ multiple alignments and Finding Lineage-Specific Genes |
|
Another major feature of ASAP II is a multigenome splice site database. It
is a multiple alignment of the splice sites of introns, as shown in the
figure below. One can easily see that this pair of splice sites appears to
have evolved in an early mammalian ancestor, but not before. For example,
researchers could identify "recently evolved splice sites" by selecting
introns whose canonical splice site sequences (GT/AG) are conserved only
within closely related species, but not in distant species.
The multiple alignment of splice sites is extracted using Pygr, and its phylogenetic tree is generated on the fly by the UCSC Phylogenetic Tree GifMaker. |
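The selection of "recently evolved splice sites" can be sketched as a simple filter. The sketch assumes the aligned donor and acceptor dinucleotides for one intron are available as a dictionary keyed by genome assembly. The function names, the dictionary layout, and the example values are hypothetical and do not reflect the ASAP II schema or the Pygr API.

```python
from typing import Dict, Iterable, Optional, Tuple

SitePair = Tuple[str, str]  # (donor dinucleotide, acceptor dinucleotide)


def is_canonical(pair: Optional[SitePair]) -> bool:
    """True if the aligned dinucleotides form the canonical GT/AG pair."""
    if pair is None:  # assembly has no aligned sequence for this intron
        return False
    donor, acceptor = pair
    return donor.upper() == "GT" and acceptor.upper() == "AG"


def is_recently_evolved(sites: Dict[str, SitePair],
                        close: Iterable[str],
                        distant: Iterable[str]) -> bool:
    """Canonical in every closely related species and in no distant one."""
    return (all(is_canonical(sites.get(sp)) for sp in close)
            and not any(is_canonical(sites.get(sp)) for sp in distant))


# Made-up splice site columns for a single intron.
intron_sites = {
    "hg17": ("GT", "AG"),
    "panTro1": ("GT", "AG"),
    "mm7": ("GT", "AG"),
    "galGal2": ("GC", "AA"),
    # danRer3 is absent: no aligned sequence
}
mammals = ["hg17", "panTro1", "mm7"]
non_mammals = ["galGal2", "danRer3"]

print(is_recently_evolved(intron_sites, mammals, non_mammals))  # True
```

Treating a missing assembly as "not canonical" is one possible design choice; a stricter analysis might instead exclude introns with alignment gaps in the distant species.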
| 4. Human EST Library Classification and Tissue/Cancer/Normal Specific Alternative Splicing |
|
In order to update the lists of tissue-specific and cancer vs. normal
specific genes for human, we downloaded EST library information from UniLib
(ftp://ftp.ncbi.nih.gov/repository/UniLib/). 2895 new human EST libraries
were classified and added to the existing 47 tissue categories and
normal/tumor types. In total, 8828 human EST libraries were classified into
47 tissues and normal/tumor. We used the same method as Xu et al. to
calculate LOD values for tissue and normal vs. cancer specificity.

We found 1709 high-confidence (LOD >= 3) tissue-specific alternative splicing relationships from 960 genes (Click Here to download all high-confidence tissue-specific relationships), and 273 high-confidence (LOD >= 3) cancer-specific relationships from 198 genes (Click Here to download all high-confidence cancer-specific relationships). The largest categories of tissue-specific splice forms were identified from brain/nerve, testis, skin, muscle, and lymph (Click Here to download statistics for tissue-specific genes). Users can download all EST library classification and LOD calculation results from the ASAP II download page and mine their own experimental candidates. After loading the MySQL tables from the download page, all high-confidence alternative splicing relationships can be retrieved with the following SQL statement.

mysql> select * from LOD_Tissue_hg17 where LOD >= 3 and n_s1_tissue >= 3 order by cluster_id;
|
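The same filter can be tried without a MySQL server. The sketch below runs an equivalent query against an in-memory SQLite table. SQLite is a stand-in chosen for illustration, not what ASAP II uses. The table is reduced to the three columns named in the query above, and the rows are made up.

```python
import sqlite3

# In-memory stand-in for the MySQL table; only the columns named in the
# ASAP II query are modelled here, and the real table has more.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOD_Tissue_hg17 (cluster_id TEXT, LOD REAL, n_s1_tissue INTEGER)"
)

# Made-up example rows, not ASAP II data.
rows = [
    ("Hs.3", 4.2, 5),  # passes both thresholds
    ("Hs.1", 3.0, 3),  # passes exactly at both thresholds
    ("Hs.2", 5.1, 2),  # high LOD but too few ESTs
    ("Hs.4", 1.7, 9),  # many ESTs but low LOD
]
conn.executemany("INSERT INTO LOD_Tissue_hg17 VALUES (?, ?, ?)", rows)

query = """
    SELECT cluster_id, LOD, n_s1_tissue
    FROM LOD_Tissue_hg17
    WHERE LOD >= ? AND n_s1_tissue >= ?
    ORDER BY cluster_id
"""
high_confidence = conn.execute(query, (3, 3)).fetchall()
print(high_confidence)  # [('Hs.1', 3.0, 3), ('Hs.3', 4.2, 5)]
```

Passing the thresholds as parameters makes it easy to explore stricter cutoffs than the LOD >= 3 and 3-EST defaults used on the website.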
| 5. Integration with pygr graph query module and NLMSA comparative genomics tool |
|
Pygr (the Python Graph Database Framework for Bioinformatics,
http://www.bioinformatics.ucla.edu/pygr) has powerful features such as graph
query and multigenome alignment query. The ASAP II database can be easily
integrated with Pygr. The orthologous exons and introns database and the
multigenome splice site database were constructed using Pygr.

1. How to upload ASAP II into your own database

After creating your own database, e.g. SPLICE_JAN06, you can load the entire ASAP II database into your database server.

mysql> create database SPLICE_JAN06;

Download the MySQL files from the download page and uncompress them with the "gzip -d" command. You can then load them with the "mysql" command.

$ mysql SPLICE_JAN06 < alt_splice_hg17.sql

You can check the loaded database using SQL.

mysql> desc alt_splice_hg17;

In order to analyze the ASAP II database using Pygr, you have to load all tables listed under ALTSPLICE and ISOFORM on the download page. The suffix of each table is the genome assembly name, e.g. *_hg17.

2. How to install Pygr and Tutorials

Prerequisites: Python 2.2 or higher (Python 2.3 or higher recommended), Pyrex, MySQL client & Python MySQLdb module

You can download Pygr at http://sourceforge.net/projects/pygr/. Pygr can be installed by the standard Python installation method. (For the distribution package, the C files have already been generated from the Pyrex *.pyx files, so you do not need Pyrex unless you want to change the *.pyx files.) A connection to a MySQL database is essential, so you need both the MySQL client and the MySQLdb Python module to work with the ASAP II database.

$ python setup.py install

Check whether Pygr is installed correctly.

$ python

Pygr was presented in a software demo at ISMB2005 and ISMB2006. For Pygr 0.5, see this presentation for details.

3. Graph Query Examples

After you uncompress Pygr, you will find a number of test modules in the tests directory. The test modules are based on the ASAP or ASAP II database (the latter are named *_jan06.py). For the Pygr graph query test, test_jan06.py has been tested successfully against the ASAP II database. You may need to change the database name and table names in lldb_jan06.py. For human, it takes about 40 minutes (and about 1GB of memory) to load all alternative splicing relationships in the ASAP II database. The output of test_jan06.py is in test_jan06.log; compare this log file with your own results.

4. Multigenome Alignment Query using Pygr NLMSA

The first step is to download the multigenome alignment from the UCSC genome browser. Many multigenome alignments are available, but you have to download the same version used by the ASAP II database (see the Statistics table above). After you finish downloading and uncompressing the files in your directory, you can generate the Pygr NLMSA using the createdb.py script. You may have to change the chromosome names in the script. In the Pygr tests directory, you can find alt35_jan06.py. This test script extracts splice sites from multigenome alignments. The output of the script looks as follows.

1 Hs.99886 ss1 panTro1 100.0 100.0 GT GT

The Python script alt35_jan06.py works only with the ASAP II database. For the general Pygr NLMSA multigenome alignment query feature, see the ISMB2006 presentation PDF for details. |
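For downstream processing, an output line such as the one shown for alt35_jan06.py can be split into typed fields. The column meanings are not documented on this page, so the field names below are neutral placeholders and are assumptions. The sketch only relies on the line having eight whitespace-separated columns, as in the example.

```python
from typing import NamedTuple


class SpliceSiteHit(NamedTuple):
    """One output line; field names are placeholders, not documented names."""
    index: int
    cluster_id: str   # looks like a UniGene cluster ID, e.g. Hs.99886
    site_label: str   # e.g. ss1
    assembly: str     # e.g. panTro1
    score_a: float    # meaning not documented on this page
    score_b: float    # meaning not documented on this page
    dinuc_a: str      # e.g. GT
    dinuc_b: str      # e.g. GT


def parse_hit(line: str) -> SpliceSiteHit:
    """Parse one whitespace-separated output line into a SpliceSiteHit."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return SpliceSiteHit(
        int(fields[0]), fields[1], fields[2], fields[3],
        float(fields[4]), float(fields[5]), fields[6], fields[7],
    )


hit = parse_hit("1 Hs.99886 ss1 panTro1 100.0 100.0 GT GT")
print(hit.cluster_id, hit.assembly, hit.dinuc_a, hit.dinuc_b)
```

If the real script emits a different number of columns for some records, the length check will raise an error rather than silently misassign fields.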