A structural bioinformatics investigation on protein–DNA complexes delineates their modes of interaction
The lifetimes of protein–DNA adducts are closely related to the various protein functions, a feature that must be encoded by the amino acids located at the protein–DNA interface. The large number of structurally characterized protein–DNA complexes now available in the Protein Data Bank (PDB) allows extensive structural bioinformatics investigations of protein–DNA interfaces. We explored the modes of protein binding to DNA by dividing 629 non-redundant PDB files of protein–DNA complexes into three classes: structural proteins, transcription factors and DNA-related enzymes. From the selected PDB structures, we defined 2953 protein–DNA contact regions. A systematic analysis of amino acid occurrences at these contact regions yielded composition profiles that are typical of each of the three protein classes. Here we discuss the critical role of some amino acids in influencing intermolecular contact lifetimes. The occurrence of arginine, by far the most abundant amino acid on the protein side of the protein–DNA interface, is the main feature that differentiates the three classes. Structural proteins and, to a lesser extent, transcription factors exhibit the highest Arg occurrence at protein–DNA contact regions. All the DNA-interacting enzymes show reduced Arg/Lys ratios together with increased Asp and Glu contents. We suggest that the amount of negatively charged side chains at protein–DNA interfaces, which is highly conserved among homologous DNA-related enzymes, serves as a tool to modulate protein mobility along DNA chains. Arg/Lys, Asp/Asn and Glu/Gln substitutions at protein–DNA interfaces may represent a feasible way to control protein motion on DNA rails.
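The composition profiles and residue ratios described in the abstract can be illustrated with a minimal Python sketch. It assumes that the residue names at each protein–DNA contact region have already been extracted from the PDB files (the contact criterion is not given here). The function names `composition_profile`, `pair_ratio` and `acidic_content` and the toy input are hypothetical; the numbers are placeholders, not results from the 629 structures.

```python
from collections import Counter

# The 20 standard amino acids as PDB three-letter residue names.
STANDARD_AA = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)


def composition_profile(residues):
    """Percentage occurrence of each standard amino acid among contact residues.

    Non-standard names (water, ligands, modified residues) are skipped.
    """
    counts = Counter(r.upper() for r in residues if r.upper() in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in input")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}


def pair_ratio(profile, numerator, denominator):
    """Ratio of two residue occurrences, e.g. Arg/Lys, Asp/Asn or Glu/Gln.

    Returns inf when the denominator residue is absent.
    """
    if profile[denominator] == 0:
        return float("inf")
    return profile[numerator] / profile[denominator]


def acidic_content(profile):
    """Combined percentage of negatively charged side chains (Asp + Glu)."""
    return profile["ASP"] + profile["GLU"]


# Hypothetical toy contact set (placeholder values, not data from the study).
toy_contacts = ["ARG", "ARG", "LYS", "ASP", "GLU", "ARG", "SER", "LYS", "HOH"]
profile = composition_profile(toy_contacts)
print(profile["ARG"])                      # 37.5
print(pair_ratio(profile, "ARG", "LYS"))   # 1.5
print(acidic_content(profile))             # 25.0
```

Pooling the residues of all contact regions in one protein class before calling `composition_profile` gives a class-level profile. Comparing the Arg/Lys ratio and the Asp + Glu content across classes mirrors the kind of comparison the abstract reports.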