Contigs are continuous genomic sequences from GenBank. We use contigs to identify the consensus genomic sequence for a UniGene cluster of Expressed Sequences (ESTs). Note that several clusters can map to a single contig. Also, GenBank has changed its contig names since we performed our calculation, so BLAST searches might be necessary to identify current contig identifiers.
A unique UniGene identifier for mRNA/EST sequences from GenBank. Also see gi_id, gb_id.
UniGene identifier for cDNA library preparation and tissue source.
Chromosome number (e.g. chromosome 21).
Gene to which radiation hybrid mapping position was assigned.
Gene symbol for this UniGene cluster (e.g. TCN1, DPMK).
NCBI identifier for the radiation hybrid mapping position.
Consensus sequence identifier for this UniGene cluster. We derive the multiple sequence consensus by aligning the cluster's EST/mRNA sequences to each other. If there are several such consensus sequences (identified by bundle_ids), we pick the one supported by most EST/mRNA sequences and give it a consensus_id. The consensus is later used to probe contigs to find the genomic sequence.
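The selection step described for consensus_id can be sketched as follows. This is only an illustration: the dictionary mapping each bundle_id to its number of supporting EST/mRNA sequences is a hypothetical in-memory representation, and the tie-breaking rule (smallest bundle_id wins) is an assumption that is not stated in this glossary.

```python
def pick_consensus(bundle_support):
    """Return the bundle_id supported by the most EST/mRNA sequences.

    bundle_support: dict mapping bundle_id -> number of supporting
    sequences (hypothetical representation).
    Ties are broken by the smallest bundle_id (an assumption).
    """
    if not bundle_support:
        raise ValueError("no candidate consensus sequences")
    # Sort key: highest support first, then smallest bundle_id.
    return min(bundle_support, key=lambda b: (-bundle_support[b], b))
```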
Identifier for a multiple sequence alignment used in consensus generation.
UniGene identifier for the subcloned vector.
Unique identifier for the sequence trace (chromatogram data).
A UniGene identifier for a set of clustered Expressed Sequences (EST/mRNA). The cluster identifiers may change as UniGene regroups its ESTs and new clusters are created. A single cluster is supposed to correspond to either a single gene or at least a part of a gene.
A unique identifier for splices detected by aligning all Expressed Sequences within a cluster to the genomic sequence. We detect splices as large gaps in Expressed Sequences produced by their alignment to the genomic sequence. A splice is defined by its starting and ending positions on the genomic sequence.
Stable GenBank identifier for the sequence. Also see gi_id and ug_id.
GenBank identifier for the current version of this sequence. Also see gb_id and ug_id.
Number of supporting EST observations for the splice_id.
Number of supporting mRNA observations for the splice_id.
See n_est
See n_est
Genomic location of the exonic nucleotide 5' of this splice. We detect splices as large gaps in Expressed Sequences produced by their alignment to the genomic sequence. Here, gen_start accounts for the nucleotide on the genomic sequence preceding this gap. We use gen_start and gen_end to uniquely identify splices in a cluster_id.
Genomic location of the exonic nucleotide 3' of this splice. We detect splices as large gaps in Expressed Sequences produced by their alignment to the genomic sequence. Here, gen_end accounts for the nucleotide on the genomic sequence following this gap. We use gen_end and gen_start to uniquely identify splices in a cluster_id.
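The gap-based definition of gen_start and gen_end can be illustrated with a short sketch. It assumes an alignment is available as a sorted list of aligned genomic blocks with inclusive coordinates, which is a hypothetical representation. The min_gap threshold is also hypothetical, since the glossary only says "large gaps" and gives no cutoff.

```python
def detect_splices(blocks, min_gap=10):
    """Return (gen_start, gen_end) pairs for large gaps between aligned blocks.

    blocks: list of (block_start, block_end) genomic coordinates, inclusive,
            sorted in ascending order (hypothetical representation).
    min_gap: smallest gap, in genomic nucleotides, treated as a splice
             (hypothetical value; the actual cutoff is not given).
    """
    splices = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        # Number of genomic nucleotides left unaligned between the two blocks.
        gap = next_start - prev_end - 1
        if gap >= min_gap:
            # gen_start precedes the gap, gen_end follows it.
            splices.append((prev_end, next_start))
    return splices
```

With blocks `[(100, 200), (205, 300), (1000, 1100)]`, only the second gap is large enough, so the sketch reports a single splice with gen_start 300 and gen_end 1000.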
Date when the EST/mRNA sequence was added to a UniGene EST cluster.
Radiation hybrid mapping position.
To be defined.
Meaning depends on context:
Number of supporting EST observations for splice_id_1. Also see n_est.
Number of supporting mRNA observations for splice_id_1. Also see n_mrna.
Genomic position of the exonic nucleotide 5' of splice_id_1. Also see gen_start.
Genomic position of the exonic nucleotide 3' of splice_id_1. Also see gen_end.
Meaning depends on context:
Number of supporting EST observations for splice_id_2. Also see n_est.
Number of supporting mRNA observations for splice_id_2. Also see n_mrna.
Genomic position of the exonic nucleotide 5' of splice_id_2. Also see gen_start.
Genomic position of the exonic nucleotide 3' of splice_id_2. Also see gen_end.
Meaning depends on context. Possible options are:
Indicates how much evidence we have for the alternative splicing event. An entry of multiple evidence indicates that both splices have at least two ESTs or at least one mRNA observation. All other alternative splices are said to have single evidence.
Alternative splice event is novel if there are no mRNA sequences supporting it.
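The evidence and novelty rules above can be sketched in a few lines. The function names and the returned labels "multiple" and "single" are hypothetical, and the sketch reads the evidence rule as "each of the two splices has at least two ESTs or at least one mRNA". How mRNA support for an event is aggregated from its two splices is not specified here, so is_novel simply takes one count.

```python
def well_supported(n_est, n_mrna):
    """True if a splice has at least two ESTs or at least one mRNA."""
    return n_est >= 2 or n_mrna >= 1


def evidence_level(n_est_1, n_mrna_1, n_est_2, n_mrna_2):
    """Classify an alternative splicing event by its two splices.

    Returns "multiple" only when both splices are well supported;
    every other event is "single" (labels are hypothetical).
    """
    if well_supported(n_est_1, n_mrna_1) and well_supported(n_est_2, n_mrna_2):
        return "multiple"
    return "single"


def is_novel(n_mrna_event):
    """True if no mRNA sequence supports the event.

    n_mrna_event: number of mRNA observations supporting the event
    (how this is derived from the two splices is an assumption left
    to the caller).
    """
    return n_mrna_event == 0
```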
Shows genomic sequence at the 5' exon/intron boundary. Intronic sequence is given in lower-case, and exonic sequence is given in upper-case letters.
Shows genomic sequence at the 3' exon/intron boundary. Intronic sequence is given in lower-case, and exonic sequence is given in upper-case letters.
Splice alternative to this one. We detect 5' alternative splices, 3' alternative splices, exon skips and mutually exclusive splices.
Meaning depends on context. Possible options are:
Information about the tissue donor
Tissue where the EST is expressed
Number of tissue-specific splices in the tissue
A unique identifier for the tissue category in our human tissue classification.
Tissue specificity (TS) score. The higher the TS score, the higher the confidence.
The number of ESTs which support splice form 1 within the tissue indicated by tissue_id.
The number of ESTs which support splice form 2 within the tissue indicated by tissue_id.
The number of ESTs which support splice form 1 within all tissues except the tissue indicated by tissue_id.
The number of ESTs which support splice form 2 within all tissues except the tissue indicated by tissue_id.
The robustness score to measure the stability of the TS-value with and without an EST observation of splice_id_1 within the tissue indicated by tissue_id.
The robustness score to measure the stability of the TS-value with and without an EST observation of splice_id_2 within all the other tissues except the tissue indicated by tissue_id.
The confidence level of the tissue specificity measured by TS-value and robustness score. Higher confidence level indicates that the splice form is very likely to be preferentially found in the tissue indicated.
The tissue name which corresponds to the tissue_id.
title
Meaning depends on context. Possible options are:
Expressed Sequence (EST/mRNA) position of the exonic nucleotide 5' of the splice. Also see gen_start.
The number of EST/mRNA sequences present in a given cluster. Human UniGene clusters have vastly different sizes: up to 65% of all UniGene clusters contain fewer than 5 EST/mRNA sequences, while 0.3% of all clusters contain 1028 sequences or more (refer to UniGene's statistics page).
The number of EST/mRNA sequences selected for the splice calculation. See include_cal for the selection criteria.
The ratio of num_seq_cal and num_seq. This is an overall figure representing the percentage of a cluster's EST/mRNA sequences that align well to the identified genomic sequence.
A field for tracking the status of consensus generation and its alignment to the genomic sequence. Status COMPLETE indicates that we have successfully completed both of these steps.
This number represents the contig position of the first nucleotide aligned to the consensus sequence. We use the contig start and end positions to obtain the genomic sequence for the UniGene Expressed Sequence Cluster.
This number represents the contig position of the last nucleotide aligned to the consensus sequence. We use the contig start and end positions to obtain the genomic sequence for the UniGene Expressed Sequence Cluster.
Orientation indicates which strand of the contig aligns to the consensus. Two possible values are 5'-> 3' or 3'-> 5'. We denote the positive strand with 5'-> 3', and the negative strand with 3'-> 5'.
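As a rough illustration of how the contig start, end and orientation fields could be combined to obtain the genomic sequence, consider the sketch below. It assumes 1-based inclusive contig positions and assumes that the negative strand is obtained by reverse-complementing the extracted region. Neither convention is stated in this glossary, and the function names are hypothetical.

```python
# Translation table for complementing DNA, preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def genomic_region(contig_seq, start, end, positive_strand=True):
    """Extract contig_seq[start..end] (1-based, inclusive: an assumption).

    positive_strand=True corresponds to orientation 5'-> 3';
    False corresponds to 3'-> 5' and returns the reverse complement.
    """
    if not (1 <= start <= end <= len(contig_seq)):
        raise ValueError("start/end outside the contig")
    region = contig_seq[start - 1:end]
    return region if positive_strand else reverse_complement(region)
```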
See orientation.
Length of the EST/mRNA sequence aligned to the genomic sequence.
The length of insertion at the beginning (head) of the mRNA/EST relative to our genomic sequence. One possible reason for the insertion is vector sequence contamination.
The largest insertion relative to the genomic sequence in the inner (middle) region of EST/mRNA. All middle insertions larger than 6 nucleotides disqualify the EST/mRNA sequence from our splicing calculation.
The length of insertion at the end (tail) of the mRNA/EST relative to our genomic sequence. Many of the tail insertions are due to the poly-A tails.
The percent of nucleotides in the EST/mRNA aligned (match or mismatch) to the genomic sequence. If less than 70% of a consensus' nucleotides are aligned with the contig, we do not consider the EST/mRNA sequence in the splice calculation.
Indicates whether to include the EST/mRNA sequence in the splicing calculation. We do this to filter out the EST/mRNA sequences that might have been misclustered by UniGene. We consider that an alignment is "good" if middle is less than six nucleotides, and if percent_align is more than 70%.
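The inclusion rule and the resulting cluster-level ratio can be sketched as follows. The function and parameter names are hypothetical. The comparisons follow the wording of this entry (middle strictly below six, percent_align strictly above 70), and the glossary is not fully explicit about how boundary values such as a middle insertion of exactly six nucleotides are treated, so that part is an assumption.

```python
def include_in_calculation(middle, percent_align,
                           max_middle=6, min_percent_align=70.0):
    """True if the alignment counts as "good" for the splice calculation.

    middle: largest inner insertion, in nucleotides.
    percent_align: percent of the EST/mRNA aligned to the genomic sequence.
    Boundary handling (strict comparisons) is an assumption.
    """
    return middle < max_middle and percent_align > min_percent_align


def included_ratio(alignments):
    """Ratio of included sequences to all sequences in a cluster.

    alignments: list of (middle, percent_align) pairs, one per sequence.
    Mirrors num_seq_cal / num_seq; returns 0.0 for an empty cluster.
    """
    if not alignments:
        return 0.0
    n_included = sum(include_in_calculation(m, p) for m, p in alignments)
    return n_included / len(alignments)
```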
Indicates if this splice has consensus GT/AG splice sites.
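A check for consensus GT/AG sites can be sketched from the two boundary strings described earlier, where intronic sequence is lower-case and exonic sequence is upper-case. The function name is hypothetical, and the sketch assumes that the 5' boundary string reads exon then intron, that the 3' boundary string reads intron then exon, and that both are given in the orientation of the transcript.

```python
def has_consensus_sites(boundary_5, boundary_3):
    """True if the intron starts with GT and ends with AG.

    boundary_5: sequence at the 5' exon/intron boundary, e.g. "CAGgtaagt".
    boundary_3: sequence at the 3' intron/exon boundary, e.g. "tttcagGTC".
    Lower-case letters are intronic, upper-case letters are exonic.
    """
    # Keep only the intronic (lower-case) part of each boundary string.
    intron_start = "".join(c for c in boundary_5 if c.islower())
    intron_end = "".join(c for c in boundary_3 if c.islower())
    return intron_start.startswith("gt") and intron_end.endswith("ag")
```

For example, the pair "CAGgtaagt" and "tttcagGTC" passes the check, while "CAGgcaagt" and "tttcagGTC" does not.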
Meaning depends on context. Possible options are:
Percent of EST/mRNA aligned to the genomic sequence.
To be defined...
To be defined...
To be defined...
These are deprecated fields and should not be considered.