This question got me thinking about amino acids and the redundancy in the genetic code. With 4 nucleotides in RNA and 3 nucleotides per codon, there are 4^3 = 64 possible codons. However, these 64 codons code for only 22 amino acids (including selenocysteine and pyrrolysine), so many of the amino acids are encoded by multiple codons.
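For concreteness, here is a minimal Python sketch (an illustration added here, not part of the original question) that enumerates the 4^3 = 64 RNA codons and counts how many map to each amino acid in the standard genetic code. The table string is NCBI translation table 1. `degeneracy`, `CODE` and the other names are hypothetical helpers. Selenocysteine and pyrrolysine are not in the standard table because they are only inserted at certain UGA and UAG stop codons in some organisms.

```python
from collections import Counter
from itertools import product

# RNA bases in the conventional U, C, A, G order used by codon tables.
BASES = "UCAG"

# Standard genetic code (NCBI translation table 1): the i-th letter is the amino
# acid for the i-th codon in UCAG order (UUU, UUC, UUA, UUG, UCU, ...).
# '*' marks a stop codon. Selenocysteine and pyrrolysine are not included.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4**3 = 64 codons, generated in the same UCAG order as AMINO_ACIDS.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def degeneracy(code):
    """Return how many codons map to each amino acid ('*' = stop)."""
    return Counter(code.values())


counts = degeneracy(CODE)
print(len(CODONS))                            # 64
print(sum(1 for aa in counts if aa != "*"))   # 20 standard amino acids
print(counts["*"])                            # 3 stop codons
print(counts["L"], counts["S"], counts["M"])  # 6 6 1
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, so the redundancy is spread very unevenly.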

Is there any hypothesis as to why there are only 22 amino acids and not 64? Is it possible that there were 64 (or at least more than 22) at an earlier time?

Brian Hayes wrote a very interesting article from a mathematical point of view:


See especially the "Reality intrudes" section. Basically, people had come up with elaborate mathematical reasons why there had to be exactly 20 amino acids. Nature, being nature, does not follow that reasoning but has its own ideas. In other words, there was nothing particularly special about 20. In fact, a 21st amino acid, selenocysteine, seems to be slowly getting grafted onto the code using the codon UGA, which normally signals stop. Pyrrolysine, which uses the UAG stop codon, is likewise considered the 22nd. The last section suggests that the code was originally a doublet code, which could specify no more than 16 amino acids (4^2 = 16 codons). This may partly explain why the third base of each codon is less discriminating than the first two.
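To make the point about the third base concrete, here is a small Python sketch (my own illustration, not from the article). It uses the standard code (NCBI translation table 1) and counts, for each codon position, how many single-base substitutions leave the encoded amino acid unchanged. `synonymous_substitutions` is a hypothetical helper, and treating a stop-to-stop change as synonymous is an assumption of this sketch. It says nothing about whether the code really was once a doublet; it only shows how asymmetric the present-day code is.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), codons in UCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

# A doublet code would have only 4**2 = 16 distinct codons.
DOUBLET_CODONS = len(BASES) ** 2


def synonymous_substitutions(code, position):
    """Count single-base changes at `position` (0, 1 or 2) that leave the
    translation unchanged. A stop codon mutating to another stop codon is
    counted as synonymous (an assumption of this sketch)."""
    synonymous = total = 0
    for codon, aa in code.items():
        for base in BASES:
            if base == codon[position]:
                continue  # same base, not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if code[mutant] == aa:
                synonymous += 1
    return synonymous, total


print(DOUBLET_CODONS)  # 16
for pos in range(3):
    syn, total = synonymous_substitutions(CODE, pos)
    print(f"position {pos + 1}: {syn}/{total} synonymous")
# position 1: 8/192 synonymous
# position 2: 2/192 synonymous
# position 3: 128/192 synonymous
```

Two thirds of all third-position changes are silent, compared with almost none at the first two positions, which is consistent with the first two bases doing most of the discriminating.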

So perhaps in the year 2002012 someone will be asking on biology.stackexchange why there are only 40 amino acids.

