I'm working on a piece of software that computes melting temperatures for nucleic acid duplexes, and I'm about to add support for 5-methylcytosine as a nucleotide base. At the moment, the bases accepted by the program are adenine, cytosine, guanine, thymine and uracil, with their standard letters A, C, G, T, U. Is there a standard letter for 5-methylcytosine? The only example I've found is this paper, which uses an italic C for 5mC, but I want a distinct letter. The three candidates I've thought of are:
- 5 - for '5-methylcytosine'
- M - for 'methyl'
- B - since methylation of U gives you T, which is one place earlier in the alphabet, methylation of C would give B.
Well, don't use M or B; both are already taken as IUPAC ambiguity codes (M means A or C, and B means anything but A, i.e. C, G or T). You can see the full list here: http://www.dna.affrc.go.jp/misc/MPsrch/InfoIUPAC.html (The enWiki article on Nucleobases lists a few others, but I would ignore those because 1. D is present in both and 2. they are rare and inapplicable.)
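To make the clash concrete, here is a minimal sketch that checks the three candidates against the IUPAC nucleotide codes. The table is typed in from the standard code list; the names `IUPAC_CODES` and `is_free` are just ones I made up for illustration, not part of any library.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def is_free(symbol: str) -> bool:
    """Return True if `symbol` has no IUPAC nucleotide meaning."""
    return symbol.upper() not in IUPAC_CODES


# Check the three candidates from the question.
for candidate in "5MB":
    print(candidate, "free" if is_free(candidate) else "taken")
```

Only `5` comes back as free; `M` and `B` both collide with existing ambiguity codes.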
5-methylcytosine isn't on that list. If you want to be pedantic about it, 5-methylcytosine is an epigenetic marker and so, by definition, not part of the genetic sequence: genetically the base is still a C and the sequence is unchanged, even though the methylation may well make a difference.
Most of the time people write m5C, so I'd go with 5 if I were you. It certainly isn't used for anything else, and if you must use a single character, almost anybody will know what you are talking about.
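If you do go with 5, something along these lines might work as a starting point. It is only a sketch: I don't know how your program represents sequences, so `METHYL_C`, `validate` and `genetic_sequence` are hypothetical names, and I'm assuming sequences are plain upper- or lower-case strings.

```python
BASES = set("ACGTU")   # the letters the program accepts today
METHYL_C = "5"         # assumed symbol for 5-methylcytosine
ALPHABET = BASES | {METHYL_C}


def validate(seq: str) -> str:
    """Upper-case `seq` and reject any symbol outside the extended alphabet."""
    seq = seq.upper()
    unknown = sorted(set(seq) - ALPHABET)
    if unknown:
        raise ValueError(f"unsupported symbols: {''.join(unknown)}")
    return seq


def genetic_sequence(seq: str) -> str:
    """Drop the epigenetic mark: every 5-methylcytosine reads as plain C."""
    return validate(seq).replace(METHYL_C, "C")


print(genetic_sequence("ac5gt"))
```

Keeping a function like `genetic_sequence` around means you can still hand the unmarked sequence to anything that only understands the standard letters.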