The whole question is:
What is the optimal frame size for the second and third generation protein secondary structure prediction methods? Justify your answer.
I remember it has something to do with the average length of an alpha helix. More specifically, 3 residues on either side of a site, so the frame length should be 7 in total. But I can't remember the reasoning behind the argument.
What do you think?
According to what my professor said in class, 2nd and 3rd generation protein secondary structure prediction methods rely on statistical data for several consecutive residues. I guess what he meant by "frame size" is how many adjacent residues the algorithm should take into account.
By frame size, do you mean sliding window?
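For concreteness, here is a minimal sketch of what I mean by a sliding window, assuming the window is centred on the residue being predicted and the sequence ends are padded. The function name `sliding_windows`, the `flank` parameter and the `-` padding character are my own choices for illustration, not part of any particular predictor. With `flank=3` you get the frame of 7 mentioned in the question.

```python
def sliding_windows(sequence, flank=3, pad="-"):
    """Yield (center_index, window) for every residue in the sequence.

    The window holds `flank` residues on each side of the central residue,
    so its length is 2 * flank + 1. Ends are padded with `pad` (an
    assumption; real predictors handle the termini in different ways).
    """
    padded = pad * flank + sequence + pad * flank
    size = 2 * flank + 1
    for i in range(len(sequence)):
        yield i, padded[i:i + size]


# Hypothetical 10-residue peptide, used only to show the shape of the output.
windows = list(sliding_windows("MKTAYIAKQR", flank=3))
for center, window in windows:
    print(center, window)
```

Each window would then be turned into features (or looked up in a statistics table) to predict the state of its central residue only.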
I know that if you want to predict the secondary structure of a transmembrane protein, your window size should be around 20 amino acids (this is the average length of one alpha helix spanning the membrane).
The paper cited below basically says that the window size depends on what kind of pattern you are looking for, but that in general 19 residues should be optimal.
Also, secondary structure predictors rely on many features like hydrophobicity, missing coordinates in X-ray structures, B-factors, motifs, etc.
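To illustrate how the window size interacts with a feature such as hydrophobicity, here is a rough sketch of a windowed hydropathy average using the Kyte-Doolittle scale. This is not what the predictors discussed above do internally; it is only a simple example of a window-based score. The function name `hydropathy_profile` and the default `window=19` are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    No padding is used here, so a sequence of length n gives
    n - window + 1 values (an empty list if the sequence is too short).
    """
    scores = [KD[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


# Toy input: a poly-leucine stretch flanked by charged residues.
toy = "KKKK" + "L" * 19 + "DDDD"
profile = hydropathy_profile(toy, window=19)
print(max(profile))
```

A window close to the length of the pattern you are looking for gives the sharpest peak; a much shorter window is noisy, and a much longer one averages the signal away.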
- Chen K, Kurgan L, Ruan J. 2006. Optimization of the Sliding Window Size for Protein Structure Prediction. CIBCB '06: 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp.1-7, 28-29, doi:10.1109/CIBCB.2006.330959.