Protein Collapse is Encoded in the Folded State Architecture
Folded states of single domain globular proteins are compact with high packing density. The radius of gyration, Rg, of both the folded and unfolded states increase as Nv where N is the number of amino acids in the protein. The values of the Flory exponent ν are, respectively, ≈ 1/3 and ≈ 0.6 in the folded and unfolded states, coinciding with those for homopolymers . However, the extent of compaction of the unfolded state of a protein under low denaturant concentration (collapsibility), conditions favoring the formation of the folded state, is unknown. We develop a theory that uses the contact map of proteins as input to quantitatively assess collapsibility of proteins. Although collapsibility is universal, the propensity to be compact depends on the protein architecture. Application of the theory to over two thousand proteins shows that collapsibility depends not only on N but also on the contact map reflecting the native structure. A major prediction of the theory is that β-sheet proteins are far more collapsible than structures dominated by α-helices. The theory and the accompanying simulations, validating the theoretical predictions, resolve the differing conclusions reached using different experimental probes assessing the extent of compaction of proteins. By calculating the criterion for collapsibility as a function of protein length we provide quantitative insights into the reasons why single domain proteins are small and the physical reasons for the origin of multi-domain proteins. Collapsibility of non-coding RNA molecules is similar β-sheet proteins structures adding support to "Compactness Selection Hypothesis."