Frequently Asked Questions

Q: Which HIV consensus sequence do you use?
A: We use the Los Alamos consensus sequence for both protease and reverse transcriptase.

Q: Where can I find the Stanford datasets?
A: The Stanford sequences can be found by going to the following pages:

Q: What is Ka/Ks?
A: In evolutionary biology, one important tool for characterizing selection pressure is the ratio of observed amino acid mutations over observed synonymous mutations (nucleotide mutations that do not change the amino acid translation), often referred to as Ka/Ks (amino acid mutations over synonymous mutations) or dN/dS (nonsynonymous mutations over synonymous mutations). Since amino acid mutations, but not synonymous mutations, experience selection pressure due to their effect on protein function, their ratio gives a straightforward measure of this selection pressure. Throughout this site we will use the term Ka/Ks, which is normalized by the ratio expected under a random mutation model (i.e., in the absence of any selection pressure).

A Ka/Ks value of 1 indicates neutral selection, i.e., the observed ratio of mutations that cause amino acid changes versus those that do not exactly matches the ratio expected under a random mutation model. Thus, amino acid changes are being selected neither for nor against. A Ka/Ks value below 1 indicates negative selection pressure. That is, most amino acid changes are deleterious and are selected against, producing an imbalance in the observed mutations that favors synonymous mutations.

Much less common is positive selection (Ka/Ks > 1), indicating that amino acid changes are favored, i.e., they increase the organism's fitness. This unusual condition may reflect a change in the function of a gene or a change in environmental conditions that forces the organism to adapt. For example, HIV mutations that confer resistance to new antiviral drugs might be expected to undergo positive selection in a patient population treated with these drugs.

Q: How is the unconditional Ka/Ks calculated?
A: Amino acid selection pressure can be calculated for an individual codon; we refer to this as the "unconditional Ka/Ks" to distinguish it from the conditional Ka/Ks [1]. We first measured the transition and transversion frequencies f_t and f_v from the entire dataset, according to

f_t = N_t/(n_t*S), f_v = N_v/(n_v*S)

where S is the total number of samples, N_t and N_v are the numbers of observed transition and transversion mutations respectively, n_t is the number of possible transitions in the region that was sequenced (simply equal to its length L in nucleotides), and n_v is the number of possible transversions (equal to 2L, since each nucleotide has one possible transition and two possible transversions).
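The frequency calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function and parameter names are hypothetical, and setting n_v to 2L is an assumption based on each nucleotide having one possible transition and two possible transversions.

```python
def mutation_frequencies(num_transitions, num_transversions, region_length, num_samples):
    """Return (f_t, f_v), the per-site, per-sample mutation frequencies.

    num_transitions   -- N_t, observed transition mutations in the dataset
    num_transversions -- N_v, observed transversion mutations in the dataset
    region_length     -- L, length of the sequenced region in nucleotides
    num_samples       -- S, total number of samples
    """
    if region_length <= 0 or num_samples <= 0:
        raise ValueError("region_length and num_samples must be positive")
    n_t = region_length        # one possible transition per nucleotide
    n_v = 2 * region_length    # assumption: two possible transversions per nucleotide
    f_t = num_transitions / (n_t * num_samples)
    f_v = num_transversions / (n_v * num_samples)
    return f_t, f_v


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    f_t, f_v = mutation_frequencies(300, 200, region_length=100, num_samples=10)
    print(f_t, f_v)
```

Note that both denominators include S, so each frequency is a rate per possible mutation per sample.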

We then calculated the unconditional Ka/Ks as the ratio of N_a, the count of samples with amino acid mutations observed at that codon, over N_s, the count of samples with synonymous mutations observed at that codon. This N_a/N_s ratio is then normalized by the ratio expected under a random mutation model (i.e., in the absence of any selection pressure), according to the following formula:

Ka/Ks = (N_a/N_s)/((n_{a,t}*f_t + n_{a,v}*f_v)/(n_{s,t}*f_t + n_{s,v}*f_v))

where n_{a,t} is the number of possible transition mutations in the codon that would change the wildtype amino acid, n_{s,t} is the number of possible transition mutations in the codon that are synonymous, and n_{a,v} and n_{s,v} are the equivalent numbers for transversions.
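As a rough sketch of the formula above, the following Python enumerates the nine single-nucleotide changes of a codon to obtain the four possible-mutation counts, then normalizes the observed ratio. All names are hypothetical. The sketch assumes the standard genetic code and a DNA alphabet, counts a change to a stop codon as an amino acid change, and returns None when the ratio is undefined (for example, a codon with no synonymous mutations); the original analysis may handle these cases differently.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def count_possible_mutations(codon):
    """Count the single-nucleotide changes of a codon by type.

    Keys: 'a_t', 'a_v' (amino acid changing transitions / transversions)
    and 's_t', 's_v' (synonymous transitions / transversions).
    """
    codon = codon.upper()
    wildtype = CODON_TABLE[codon]
    counts = {"a_t": 0, "a_v": 0, "s_t": 0, "s_v": 0}
    for pos, base in enumerate(codon):
        for alt in BASES:
            if alt == base:
                continue
            mutant = codon[:pos] + alt + codon[pos + 1:]
            kind = "t" if TRANSITION[base] == alt else "v"
            # Assumption: a change to a stop codon counts as an amino acid change.
            effect = "s" if CODON_TABLE[mutant] == wildtype else "a"
            counts[f"{effect}_{kind}"] += 1
    return counts


def unconditional_ka_ks(n_a, n_s, codon, f_t, f_v):
    """Observed N_a/N_s divided by the ratio expected under random mutation."""
    n = count_possible_mutations(codon)
    expected_a = n["a_t"] * f_t + n["a_v"] * f_v
    expected_s = n["s_t"] * f_t + n["s_v"] * f_v
    if n_s == 0 or expected_s == 0 or expected_a == 0:
        return None  # ratio undefined for this codon or these counts
    return (n_a / n_s) / (expected_a / expected_s)


if __name__ == "__main__":
    print(count_possible_mutations("CTG"))
    # {'a_t': 1, 'a_v': 4, 's_t': 2, 's_v': 2}
    print(unconditional_ka_ks(35, 40, "CTG", f_t=0.3, f_v=0.1))
```

In the usage example the counts and frequencies are made up; they are chosen so that the observed ratio equals the expected ratio, which corresponds to a Ka/Ks of about 1.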

Q: What does the conditional selection pressure ratio (Ka/Ks)_{Y||X} tell us?
A: It shows how an amino acid mutation at one site X alters the selection pressure at another, "conditional" site Y.

Q: How is the conditional selection pressure ratio (Ka/Ks)_{Y||X} calculated?
A: To measure how a mutation at site X alters the selection pressure at another site Y, we define the "conditional selection pressure" for a site Y in the presence of an amino acid mutation at another site X as:

(Ka/Ks)_{Y|Xa} = (N_{YaXa}/N_{YsXa})/((n_{a,t}*f_t + n_{a,v}*f_v)/(n_{s,t}*f_t + n_{s,v}*f_v))


where N_{YaXa} is the number of samples with an amino acid mutation at codon Y and an amino acid mutation at codon X, and N_{YsXa} is the number of samples with a synonymous mutation at codon Y and an amino acid mutation at codon X. We will refer to (Ka/Ks)_{Y|Xa} as the "conditional Ka/Ks" at Y given an amino acid mutation at X.

We define the "conditional selection ratio" as the conditional Ka/Ks divided by the selection pressure at Y measured in the absence of any mutation at X:

(Ka/Ks)_{Y||X} = (Ka/Ks)_{Y|Xa}/(Ka/Ks)_{Y|Xo} = (N_{YaXa}/N_{YsXa})/(N_{YaXo}/N_{YsXo})


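where N_{YaXo} and N_{YsXo} are the numbers of samples containing an amino acid mutation or a synonymous mutation, respectively, at codon Y and no mutation at codon X. Because both conditional Ka/Ks values are normalized by the same expected ratio for codon Y, the normalization cancels and only the four sample counts remain.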
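The conditional selection ratio depends only on four sample counts, so it can be sketched directly from a tally of per-sample mutation states. The representation below is hypothetical: each sample is a pair (state at X, state at Y), where a state is "a" (amino acid mutation), "s" (synonymous mutation), or "o" (no mutation). Returning None when any denominator is zero is also an assumption of this sketch.

```python
from collections import Counter


def conditional_selection_ratio(samples):
    """Compute (Ka/Ks)_{Y||X} from an iterable of (x_state, y_state) pairs.

    Each state is 'a' (amino acid mutation), 's' (synonymous mutation),
    or 'o' (no mutation) at the corresponding codon.
    """
    counts = Counter(samples)
    n_ya_xa = counts[("a", "a")]  # amino acid mutation at X and at Y
    n_ys_xa = counts[("a", "s")]  # amino acid mutation at X, synonymous at Y
    n_ya_xo = counts[("o", "a")]  # no mutation at X, amino acid mutation at Y
    n_ys_xo = counts[("o", "s")]  # no mutation at X, synonymous at Y
    if n_ys_xa == 0 or n_ya_xo == 0 or n_ys_xo == 0:
        return None  # ratio undefined with these counts
    return (n_ya_xa / n_ys_xa) / (n_ya_xo / n_ys_xo)


if __name__ == "__main__":
    # Made-up tallies, for illustration only.
    samples = (
        [("a", "a")] * 30 + [("a", "s")] * 5
        + [("o", "a")] * 20 + [("o", "s")] * 40
        + [("o", "o")] * 100
    )
    print(conditional_selection_ratio(samples))  # (30/5) / (20/40) = 12.0
```

A value above 1 in such a tally would suggest that amino acid changes at Y are more strongly favored when X is mutated than when it is not.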

Publications

  1. Chen L, Perlina A, Lee CJ. Positive selection detection in 40,000 human immunodeficiency virus (HIV) type 1 sequences automatically identifies drug resistance and positive fitness mutations in HIV protease and reverse transcriptase. J Virol. 2004 Apr;78(7):3722-32.
  2. Chen L, Lee C. Distinguishing HIV-1 drug resistance, accessory, and viral fitness mutations using conditional selection pressure analysis of treated versus untreated patient samples. Biol Direct. 2006 May 31;1:14.
  3. Pan C, Kim J, Chen L, Wang Q, Lee C. The HIV positive selection mutation database. Nucleic Acids Res. 2007 Jan;35(Database issue):D371-5. Epub 2006 Nov 15.

Links