Searching the Porphyromonas gingivalis genome with peptide fragmentation mass spectra

Weibin Chen [a], Keith E. Laidig [a], Yoonsuk Park [b], Kyewhan Park [b], John R. Yates III [c], Richard J. Lamont [b] and Murray Hackett* [a]
[a] Department of Medicinal Chemistry, University of Washington, Box 357610, Seattle, WA 98195, USA. E-mail: mhackett@u.washington.edu; Fax: +1 206 685-3252
[b] Department of Oral Biology, University of Washington, Seattle, WA 98195, USA
[c] Scripps Research Institute, Department of Cell Biology, La Jolla, CA 92037, USA

Received 1st September 2000, Accepted 30th October 2000

First published on 7th December 2000


Abstract

An approach is described for genomic database searching based on experimentally observed proteolytic fragments, e.g., isolated from 1D or 2D gels or analyzed directly, that can be applied to unfinished prokaryotic genomic data in the absence of annotations or previously assigned open reading frames (ORFs). This variation on the database search is in contrast to the more familiar use of peptide mass spectral fragmentation data to search fully annotated inferred protein databases, e.g., OWL or SWISS-PROT. We compared the SEQUEST search results from a six reading frame translation of the Porphyromonas gingivalis genome DNA sequence with those from computationally derived ORFs created using publicly available genomics software tools. The ORF approach eliminated many of the artifacts present in output from the six reading frame search. The method was applied to uninterpreted tandem mass spectrometric data derived from proteins secreted by the periodontal pathogen Porphyromonas gingivalis in response to the gingival epithelial cell environment, a model system for the study of host–pathogen interactions relevant to human periodontal disease.


Introduction

The use of databases of known expressed or inferred protein sequences to assist in the identification of unknown proteins analyzed using peptide collision-induced dissociation (CID) data is a well-established practice. This is, in essence, the use of model data (the database) to elucidate the experimental observation (the peptide CID, that intrinsically contains partial amino acid sequence information, that in turn can be related back to a specific gene via the database). Two recent reviews summarize current progress in proteomics research as it relates to microbiology1 and genetics.2 We present here an approach by which the experimental observation is used to test and elucidate the model data found within new or unfinished prokaryotic genomic databases, which consist of one or more long DNA contigs rather than entries for individual proteins translated from the gene sequence. The object of the exercise was to utilize fully new genomic data that are not normally available in an annotated protein database format, often for months or years after the actual DNA sequence itself is complete. The mass spectrometry based ‘reverse genomics’ approach presented here is complementary to cDNA array technology in that it represents a view of the bacterial genome that is firmly based on experimentally observed protein expression. At the present time, protein mass spectrometry and cDNA arrays are the most promising approaches for linking the worlds of conventional hypothesis driven pathogenesis research and functional genomics.

Mass spectrometry software programs for protein analysis are geared towards searching large, fully annotated protein databases, e.g., OWL,3 SWISS-PROT4 and others. Our approach allows database searching strategies, based on the SEQUEST5,6 program, to be applied to raw prokaryotic genomic DNA data in the absence of any prior knowledge regarding protein expression. While the data reduction and analysis approach presented here is hardly elegant, it has provided our laboratory with an interim solution pending the availability of the genome data in an annotated format that is more compatible with the current generation of protein mass spectrometry software programs. The absence of extended sequences of non-coding DNA, relative to the situation in higher organisms, coupled with the small sizes of the genomes (e.g., 2.2 million base pairs (MBP) for Porphyromonas gingivalis), makes our approach reasonable with modest computational resources.

P. gingivalis is a Gram-negative anaerobic coccobacillus that is an etiological agent of severe adult periodontitis. P. gingivalis possesses a number of virulence factors including the ability to invade the epithelial cells of the gingiva. In primary cultures of gingival epithelial cells the internal bacteria rapidly locate in the cytoplasm, predominantly in the perinuclear area, where they can replicate and reach a high density.7 The molecules of P. gingivalis that direct these events have yet to be determined. In most invasive Gram-negative bacteria, the type III protein secretion machinery mediates secretion of bacterial molecules into the cytoplasm of host cells. The targets of the type III secretion apparatus often have the ability to subvert information flow within the host cells and facilitate bacterial entry.8 Similarly, P. gingivalis secretes a novel set of proteins when in contact with epithelial cells which may have intracellular effector activity.9 However, the genomic sequence for P. gingivalis reveals that the organism does not have the genes for a conventional type III protein secretion apparatus. In order to investigate the nature and role of the proteins secreted by P. gingivalis in response to epithelial cells and the process by which they are secreted, it is first necessary to identify the secreted proteins. We acquired tryptic peptide CID data from a differential proteomics experiment using proteins secreted by P. gingivalis grown under normal laboratory conditions and after exposure to conditioned growth medium from human gingival epithelial cells (GECs). GECs are a model system used to study host–pathogen interactions involved in human periodontal disease. The data set from these experiments was used both to validate the analytical strategy and to generate biologically relevant information about P. gingivalis and its interactions with GECs. 
The data reduction and analysis strategy that has evolved from this work consists of using a computational open reading frame (ORF) finding tool and a locally hosted BLAST server to augment our existing mass spectrometry and protein database search capabilities.

Experimental

Bacterial growth conditions, protein extraction, SDS-PAGE

P. gingivalis 33277 was grown anaerobically at 37 °C in Trypticase soy broth supplemented per liter with 1 g of yeast extract, 5 mg of hemin and 1 mg of menadione. Proteins secreted by P. gingivalis were collected as described by Park and Lamont.9 Briefly, P. gingivalis (PG) cells washed with phosphate-buffered saline (PBS) were resuspended in conditioned keratinocyte basal medium (KBM) (culture supernatant of gingival epithelial cells) or PBS and incubated for 6 h at 37 °C. Proteins in the cell-free medium were obtained by precipitation with 10% trichloroacetic acid (TCA) as described previously.9 Three gel bands containing the proteins of interest were excised from Coomassie Brilliant Blue-stained gels after SDS-PAGE and digested in situ with trypsin.10 For our purposes, a protein was of interest if it was observed as a gel band after exposure to the conditioned GEC growth medium described above, but not observed in the PG containing control buffer. A blank control was also run consisting of a gel containing only laboratory background without PG protein.

Protein mass spectrometry

All CID (MS2) data were collected in an automated, data dependent manner11 using a Finnigan (San Jose, CA, USA) TSQ 7000 mass spectrometer coupled with a microcapillary HPLC inlet system12 and a modified electrospray ionization interface13 that have been described previously. Briefly, Magic C18 stationary phase material (Michrom BioResources, Auburn, CA, USA) was packed into a 12 cm × 75 μm id fused-silica capillary column. Approximately 1 μl of digest from each gel spot was loaded pneumatically.14,15 The columns were eluted with a 45 min linear gradient of 2–95% acetonitrile in water (0.4% v/v acetic acid) at a flow rate of 250 nl min−1, as measured at the beginning of the gradient. A script written in Instrument Control Language (ICL) (Finnigan) instructed the TSQ to collect a scan of centroid mode main beam (MS1) data over the range m/z 200–2000 every 1.5 s, until a signal was detected above a threshold value of 40000 counts with a signal-to-noise ratio (S/N) >5. Once the main beam signal exceeded the threshold, the instrument acquired several scans of CID product ion mass spectra (MS2) while invoking a subroutine to optimize the collision offset, before automatically switching back to the MS1 mode. This process was repeated for the entire HPLC run. A constant pressure of 3.0 mTorr of argon was maintained in the octapole collision cell at all times. The finished data files from each gel band typically consisted of roughly 50–200 CID spectra per run, which were acquired and stored on a DEC AlphaStation Model 200 4/166 computer (Compaq/DEC, Houston, TX, USA). The hard drive containing the raw data in Finnigan ICIS format was cross-mounted via an intranet connection and the Unix Network File System (NFS) to both a second DEC AlphaStation 200 and a much faster Compaq/DEC AlphaStation 500. The first AlphaStation was dedicated to real time control of the mass spectrometer only and was not used for post-run computations. 
Subsequent post-run analyses and database searching queues were initiated from either the second AlphaStation 200 or the AlphaStation 500.
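The data-dependent switching logic described above can be sketched as follows. This is a minimal illustration in Python, not the Finnigan ICL script used in the study; only the 40000 count threshold and the S/N >5 criterion come from the text, while the `Ms1Scan` record, the function names and the way noise is represented are assumptions made for the example.

```python
from dataclasses import dataclass

# Values quoted in the text; everything else is a hypothetical illustration.
THRESHOLD_COUNTS = 40000
MIN_SIGNAL_TO_NOISE = 5.0


@dataclass
class Ms1Scan:
    """Hypothetical summary of one main beam (MS1) scan."""
    time_s: float
    base_peak_mz: float
    base_peak_counts: float
    noise_counts: float


def should_trigger_ms2(scan):
    """True when the base peak passes both the intensity and the S/N test."""
    if scan.noise_counts <= 0:
        return False
    signal_to_noise = scan.base_peak_counts / scan.noise_counts
    return (scan.base_peak_counts > THRESHOLD_COUNTS
            and signal_to_noise > MIN_SIGNAL_TO_NOISE)


def plan_acquisition(scans):
    """Return (time, mode, precursor m/z) tuples for a series of MS1 scans.

    Each MS1 scan is always recorded; an MS2 event is queued for the base
    peak only when the trigger criteria are met, after which the loop
    returns to MS1 mode for the next scan.
    """
    plan = []
    for scan in scans:
        plan.append((scan.time_s, "MS1", None))
        if should_trigger_ms2(scan):
            plan.append((scan.time_s, "MS2", scan.base_peak_mz))
    return plan
```

In the real experiment several MS2 scans were acquired per trigger and the collision offset was optimized by a subroutine; both details are omitted here for brevity.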

Computational hardware, software and procedures

Our approach to the genome-as-database problem was first to pre-compile on disk our own six reading frame translation in FASTA format, based on a PERL script (PERL version 5.005_03, www.perl.com) provided by the UW Genome Center. Second, we investigated the ORFIND program (author: T. Tatusov) from the National Center for Biotechnology Information (NCBI, a division of the National Institutes of Health, Bethesda, MD, USA) for purposes of creating a second FASTA database of putative ORFs. ORFIND source code (C language) was compiled locally to run under Digital Unix. The SEQUEST program itself will make the six reading frame translation of the DNA ‘on the fly.’ However, we preferred to pre-compile our databases for reasons of speed, to insure the use of a translation table optimized specifically for bacteria (NCBI Table 11, NIH, Bethesda, MD, USA), and to have a permanent record of the theoretical translation. The six reading frame search output from SEQUEST was itself searched against the PG ORF database using BLAST 2.1 (see below), as shown schematically in Fig. 1. The putative ORF database was also searched by SEQUEST directly (see Fig. 2). CIDs with high scoring matches common to both databases were inspected manually to verify the peptide sequence in the SEQUEST output. High scoring matches against the six reading frame database not found in the ORF database were also inspected manually. All searches were run on an AlphaStation 500 running Digital Unix 4.0D, the APACHE web server (www.apache.com) and SEQUEST (Unix version 27, University of Washington, Seattle, WA, USA). General protein database searches were conducted using OWL version 31.4 (www.biochem.ucl.ac.uk/bsm/dbbrowser/OWL/OWL.html).
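As an illustration of the pre-compiled six reading frame translation, the following Python sketch writes one FASTA record per frame. It is not the PERL script used in this work. The amino acid assignments are those of the standard genetic code, which NCBI Table 11 shares (Table 11 differs in its permitted start codons, which are not used here); the header format and function names are assumptions.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G); '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; ambiguous codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_fasta(name, dna):
    """Return FASTA text with three forward and three reverse translations.

    The header layout (name|strand|frame) is a hypothetical convention.
    """
    dna = dna.upper()
    records = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            header = f">{name}|strand{strand}|frame{offset + 1}"
            records.append(f"{header}\n{translate(seq[offset:])}")
    return "\n".join(records) + "\n"
```

Stop codons are kept as '*' so that each contig remains one long pseudo-protein per frame, which mirrors the way the raw database is described in the text.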
Fig. 1 Flow chart describing the order of events when searching the six reading frame translation of the PG genome with peptide tandem mass spectrometric data. Early in our studies the ORF database was used purely to verify putative amino acid sequences already assigned by SEQUEST, and it was not searched directly.

Fig. 2 Flow chart describing the order of events when SEQUEST was used to search the ORF database directly. Prior to the ORF search the CID data are normally searched against a large protein database (e.g., OWL) to identify any non-PG proteins that are present; see the text for further discussion.

The six reading frame and ORF PG databases were based on the May 17, 1999, release of the PG genome (www.tigr.org). SEQUEST runs were controlled using the DQS queuing system version 3.2.7 (Florida State University, Tallahassee, FL, USA). The search results were fed into HTML based data summary tools and presented using standard HTML browsers. Subsequent searches, conducted using a variant of the Basic Local Alignment Search Tool (BLAST), TBLASTN, were used to confirm the locations of the codons in the PG database corresponding to the experimentally observed peptides common to both the raw and putative ORF databases. These searches were run locally on our AlphaStation 500 using the stand alone web-based BLAST server (NCBI). For short peptides five amino acid residues in length (e.g., DLLFK, ESLTK; see Table 1) that failed to work with BLAST, we used a web browser (Netscape) based text editor to locate the fragments in the original PG DNA database or the putative ORF protein database.
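For peptides too short for BLAST, the manual text search described above amounts to an exact string match against the translated genome. A minimal Python sketch of that idea follows; it is not the procedure actually used (a browser-based text editor), and the function names are hypothetical. Reporting reverse strand hits with descending coordinates is an assumption intended to resemble the coordinate pairs listed in Table 1.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G); '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def locate_peptide(dna, peptide):
    """Return (start, end, strand) for exact peptide matches in all six frames.

    Coordinates are 1-based positions on the forward strand. For reverse
    strand hits start > end (an assumed convention).
    """
    dna = dna.upper()
    n = len(dna)
    reverse = dna.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                first = offset + 3 * pos             # 0-based, scanned strand
                last = first + 3 * len(peptide) - 1
                if strand == "+":
                    hits.append((first + 1, last + 1, strand))
                else:
                    hits.append((n - first, n - last, strand))
                pos = protein.find(peptide, pos + 1)
    return hits
```

A five residue peptide such as DLLFK can match by chance at several places in a genome of about 2.2 million base pairs, so each hit still has to be checked against the ORF that contains it.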

Table 1 SEQUEST search results for tryptic fragments with amino acid sequences that could be derived from both the six reading frame and the ORF databases generated from the P. gingivalis genome. Each sample represents a band excised from an SDS-PAGE gel
Sample name | Location [a] | Amino acid sequences for matching peptides | Homology [b]
Band 1 | 917688–917717 | QAIVYWKTLK | HipA protein (E. coli)
 | 1720447–1720418 | SDELRLMIHR | DNA damage-inducible protein F (Vibrio cholerae)
 | 1987395–1987427 | QSSKEHIPSNK | Acrosomal protein ACR55 (Homo sapiens) [c]
Band 2 | 155209–155177 | HNRGFLTPELK | Lipopolysaccharide biosynthesis protein (Thermotoga maritima)
 | 705350–705321 | DLLFK | Probable phosphoserine phosphatase (Streptomyces coelicolor)
 | 1539806–1539780 | DSPVCEAIPK | Hypothetical protein A (Bacillus stearothermophilus) [c]
 | 2118271–2118242 | GAAPINHAIR | Chain A, methionyl-tRNAfMet formyltransferase complexed with formyl-methionyl-tRNAfMet (E. coli)
Band 3 | 1853282–1853308 | ESAPRSFEK | RTN2-C (Homo sapiens) [c]
 | 573866–573895 | KALGYLLSER | Amidophosphoribosyltransferase (Pasteurella multocida)
 | 1369783–1369754 | KNGENLLLIK | Hypothetical protein RP819 (Rickettsia prowazekii)
 | 2047261–2047275 | ESLTK | Lycopene β-cyclase (Citrus sinensis) [c]
[a] Base pair location from genome sequence in the TIGR database (www.tigr.org).
[b] Determined by translating the ORF containing the peptide sequence and BLAST searching for homology in the GenBank database.
[c] Low homology score, E value >0.1 (www.ncbi.nlm.nih.gov/BLAST/).


Results and discussion

Initial observations and OWL database search results

A representative data-dependent capillary LC-MS-MS experiment is shown in Fig. 3. The majority of the peptides observed were from a protein contaminant of human origin, siderophilin (GenBank accession No. P02787), as identified in the OWL database, that was present in all three gel bands at much higher abundance than the proteins secreted by PG. We believe the siderophilin originated with the cell culture medium. However, the experiment is robust enough that even this relatively high contamination level did not significantly impede the identification of the PG proteins. Signals from the PG related tryptic fragments were weak, but adequate to generate at least one high quality match per putative protein. As has been noted previously in the literature with respect to inferred protein databases,5,6 only a small amount of protein sequence coverage is necessary to relate it back to the database. The OWL or other general protein database search is an important tool even in experiments targeted towards a single genome because of the ease with which peptide fragments from common background sources, e.g., human keratin or trypsin autolytic fragments, are quickly eliminated from consideration as useful data. However, on our equipment a search of several hundred CID mass spectra against the entire OWL database may take 4 h or longer. A search of the same raw data against a single small genome, e.g., PG, takes only 2–3 min.
Fig. 3 Representative mass spectrometric data derived from P. gingivalis proteins used for the OWL and PG genome database searches. (a) Reconstructed ion chromatogram from the microcapillary HPLC electrospray ionization auto-CID analysis of the tryptic peptides from a protein band excised from SDS-PAGE. (b) The mass chromatogram trace of m/z 438, [M + 3H]3+ parent ion from the peptide HNRGFLTPELK (see Table 1 and text). (c) CID mass spectrum of daughter ions from the parent ion at m/z 438, with labels showing the most informative fragments. The ion m/z 110 is most likely an immonium ion diagnostic for the presence of histidine. Ions marked with an asterisk have lost either ammonia or water from the mass of the indicated y series ion. The nomenclature for peptide CID fragment ions has been reviewed by Biemann.18
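The assignments in Fig. 3 can be checked with a short calculation. The Python sketch below uses standard monoisotopic residue masses to compute the triply protonated precursor of HNRGFLTPELK, singly charged b and y ions, and the histidine immonium ion. The function names are illustrative assumptions, and the unit-resolution instrument data reported here would show these values rounded to nominal m/z.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBON_MONOXIDE = 27.99491


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def b_ions(sequence):
    """Singly charged b1 .. b(n-1): N-terminal residues plus one proton."""
    masses, total = [], 0.0
    for aa in sequence[:-1]:
        total += MONO[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(sequence):
    """Singly charged y1 .. y(n-1): C-terminal residues plus water and a proton."""
    masses, total = [], WATER + PROTON
    for aa in reversed(sequence[1:]):
        total += MONO[aa]
        masses.append(total)
    return masses


def immonium_mz(residue):
    """Immonium ion: residue mass minus CO plus one proton."""
    return MONO[residue] - CARBON_MONOXIDE + PROTON
```

Calling `precursor_mz("HNRGFLTPELK", 3)` gives a value close to 437.9, consistent with the parent ion labeled m/z 438, and `immonium_mz("H")` gives about 110.07, consistent with the m/z 110 ion attributed to histidine.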

PG genome search results

The searches of all three gel bands against the raw PG genome database led to roughly 20 high quality matches. Mathematical details of how SEQUEST calculates these matches and criteria for determining exactly what constitutes ‘high scores’ have been published.5,6 Briefly, the program matches theoretical peptide mass spectra based on the sequences found in the database against the observed peptide CIDs, with adjustable parameters for the resolution and mass accuracy of the mass spectral data, the type of database (DNA or protein), the proteolytic cleavage specificity of the enzyme used to digest the protein and mass increments for possible post-translational modifications, among others. Each experimental CID is given a preliminary score (Sp) and a cross correlation score (Xcorr) based on the quality of the match with the theoretical spectra derived from the FASTA format database. The quality of the match is indicated by rank order and values for both Sp and Xcorr. Higher numerical values are better for both Sp and Xcorr. The actual numbers observed and their interpretation depend to some extent on the characteristics of the database being searched. By way of example, for a search of the full OWL protein database an Xcorr value >2.0 is usually significant. We examine the output for entries for a given peptide that score significantly higher than the next best match, as indicated by the dCn (deltaCn, or 1 − Cn; Cn = normalized Xcorr) parameter in the output. When the top ranked match in terms of Xcorr or Cn is separated from the second ranked match by a dCn of >0.1 units, this suggests that the cross-correlation algorithm converged on a unique sequence in the database.5 Readers are urged to study refs. 5 and 6, and the references contained therein, carefully in order to understand the strengths and limitations of the algorithm with respect to genome searches of the type reported here. 
For the data set summarized in Table 1, the best 20 matches were abstracted from an output consisting of about 540 matches of lesser quality. Of those 20, roughly half were rejected because of failure to give good matches also with the ORF database. The 11 peptides judged to give good matches in both databases are summarized in Table 1. Coverage was poor owing to the high level of background from human proteins, which we estimate were more abundant in the gel bands by a factor of >1000 on a weight basis. In general, the more peptides matched with a given putative ORF, the more reliable is the assignment. However, even one peptide from a noisy low quality data set can serve to identify an ORF correctly. For the protein secretion data in Table 1, we were able to identify several PG genes for further study. The secretion data represented a ‘worst case scenario’ for our laboratory in terms of S/N and high background, which was why it was chosen to test the robustness of the method.
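The dCn criterion described above can be written out in a few lines. The Python sketch below is an illustration of the stated definitions only (Cn as Xcorr normalized to the top score, dCn = 1 − Cn) and is not SEQUEST code; the function names are assumptions, and the default Xcorr cut-off of 2.0 is the value the text quotes for OWL searches, which may not transfer to other databases.

```python
def delta_cn(xcorrs):
    """Return dCn for each candidate, ranked from best to worst Xcorr.

    Cn is Xcorr divided by the top Xcorr, so the top match has dCn = 0.
    """
    ranked = sorted(xcorrs, reverse=True)
    if not ranked or ranked[0] <= 0:
        raise ValueError("need at least one positive Xcorr value")
    top = ranked[0]
    return [1.0 - x / top for x in ranked]


def is_unique_hit(xcorrs, min_xcorr=2.0, min_dcn=0.1):
    """Heuristic filter: strong top score, clearly separated from the runner-up.

    Both thresholds are adjustable assumptions; the text stresses that
    their interpretation depends on the database being searched.
    """
    ranked = sorted(xcorrs, reverse=True)
    if not ranked or ranked[0] <= min_xcorr:
        return False
    if len(ranked) == 1:
        return True
    return delta_cn(ranked)[1] > min_dcn
```

A filter of this kind only prioritizes spectra for manual inspection; it does not replace the comparison against the ORF database described in the text.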

Manual inspection of all high scoring matches of any length, found only in the six reading frame ‘raw’ database (three forward, three backwards, or one for each letter of the triplet genetic code taken in both directions), but not in the ORF database, indicated that these hits were artifacts. They did not correspond to real expressed protein or real reading frames, as verified by genetic analysis of the surrounding DNA 1000 base pairs on either side of the codons assigned to the putative peptide sequence. In order to use the experimentally derived peptide low energy CID data16 to its greatest advantage, as a way to probe microbial genomes, a practical way had to be found to reduce the volume and improve the quality of the output from our search program. False positives were expected, in that the search algorithm is designed to achieve the best possible match of the CID spectra with the database,5 i.e., there exists a presumption that the database being searched contains accurate sequence information for the CID to match. Searching a small database made up primarily of biologically irrelevant codon assignments, and where each contig is treated computationally essentially as a single large protein, is prone to artifact. Although the actual situation is more complex, to illustrate our logic it can be assumed as an approximation that only about 17% of the six reading frame translation from DNA triplets to single amino acids will be correctly in-frame, if one makes the simplifying assumption that all the DNA is transcribed and translated into protein. In the case of prokaryotic genomes, most of the DNA does in fact code for polypeptide. Therefore, the search program is being used to optimize the best fit possible of a CID spectrum against a database consisting mostly of codon assignments that are out-of-frame. 
Or, alternatively, if non-coding DNA is present, the potential exists for codon assignments due to the purely theoretical translation of a stretch of DNA that is not translated as protein in nature, e.g., regulatory sequences or other intergenic sequences. The consequence is that much of the voluminous output from the six frame database search was not useful, and a way had to be found to filter out efficiently much of the output, ideally leaving only the matches within a bona fide, biologically relevant ORF. This problem is inherent in the use of any protein database searching software in the context of raw genomic data and is not unique to SEQUEST.
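The figure of about 17% follows from simple counting. Under the simplifying assumption stated above, that every base is coding in exactly one frame on one strand, the in-frame and out-of-frame fractions of the six reading frame translation are approximately:

```latex
\[
f_{\text{in-frame}} = \frac{1\ \text{true frame}}{6\ \text{translated frames}}
  = \frac{1}{6} \approx 0.167 \approx 17\%,
\qquad
f_{\text{out-of-frame}} = 1 - \frac{1}{6} = \frac{5}{6} \approx 83\%.
\]
```

Overlapping genes and non-coding stretches shift these fractions somewhat, which is why the text treats the value as an approximation.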

The conventional wisdom in the genomics community seems to be that only the ORF database search should be necessary for the purposes described in this paper.17 However, we felt that such a comparison should be done at least once for P. gingivalis as an important part of validating the method. Based on prior experience with ORFIND in the context of its intended application, as a tool for purely genetic applications, we believe that the risk of false negatives using the ORF database is not great. However, the previous statement assumes that in fact a reasonable quality CID spectrum with at least one higher S/N y or b ion series18 can be acquired, which is not always the case. No useful information (see Table 1) was found in the much larger output from the six reading frame search that was not present in the ORF search. The ORFIND database was fairly ‘liberal’ in that it was biased towards listing all plausible ORFs. This bias was preferred for our application, which relies on having experimental data from direct measurements of expressed protein. Hence the ORFIND program seems to fulfil our need for a way to access unfinished or unannotated genome data, for purposes of locating genes for proteins observed experimentally using tandem mass spectrometry. Another way of looking at the putative ORF database is as a substitute for an annotated database corresponding to a single organism version of a fully annotated general protein database. This is very similar in concept to an established method of shortening the time required for computations by abstracting single species protein databases from much larger general protein databases.5 Even with the ORF database, in borderline cases with noisy data and in the presence of suspected DNA sequencing errors, it is still necessary to interpret data manually. 
Such manual interpretation presumes at least an empirical knowledge of the underlying data structure,16,19,20 i.e., peptide fragmentation behavior under low energy CID conditions.
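To make the idea of a ‘liberal’ putative ORF database concrete, the following Python sketch lists every stop-to-stop stretch above a minimum length in all six frames. It is not the ORFIND algorithm, whose internals are not described here; the stop-to-stop definition, the `min_len` parameter and the function names are assumptions chosen to bias the output towards listing all plausible ORFs.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G); '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def putative_orfs(dna, min_len=50):
    """Return (strand, frame, residue_index, protein) for each candidate ORF.

    A candidate is any run of at least min_len residues between stop
    codons; no start codon is required, which deliberately over-reports.
    residue_index is the 0-based position within that frame's translation.
    """
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            start = 0
            for segment in translate(seq[offset:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, offset + 1, start, segment))
                start += len(segment) + 1  # skip the stop codon itself
    return orfs
```

Over-reporting is acceptable in this setting because, as argued above, the experimentally observed peptides decide which candidates are retained.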

To complete the circle from genome to experimentally observed protein back to the genome, we employed TBLASTN,21 running on our local host (see Fig. 1 and 2). This last step was necessary to utilize fully the partial amino acid sequences that were found in both the raw and ORF databases as probes of the protein’s origin in the PG genome. The observed amino acid sequence, after giving a match in both databases, was converted back to DNA codons internally by the TBLASTN program and the DNA sequence matched with a specific location in the PG genome, as shown in Table 1. This step provided both a location in the original genome database and a confirmation of the reading frame identified earlier in the process by ORFIND. Once the location, reading frame and the SEQUEST generated peptide sequence were validated by comparison with the in-frame partial DNA sequence, homology searches of the genes and (or) the putative protein products were carried out over the internet. In the absence of a suitable fully annotated ‘positive control genome’, the validation of the method at present lies ultimately with its ability to identify genes that express protein under a given set of experimental conditions. The peptides shown in Table 1 served as the key to locate real ORFs relevant to the interaction of PG and gingival epithelial cells. Two examples taken from Table 1 are briefly summarized below.

Peptide HNRGFLTPELK (see Fig. 3). A BLAST search demonstrated homology to polysaccharide and lipopolysaccharide biosynthesis proteins. This suggests that P. gingivalis may alter either the composition or amount of surface carbohydrates when in an epithelial cell environment. Modification of surface carbohydrates may be required to facilitate interaction of cell surface proteins with cognate receptors on the host cell. Previous studies have shown that polysaccharide capsular material can interfere with attachment to, and invasion of, epithelial cells.22 Furthermore, digestion of P. gingivalis with amylglucosidase, which partially degrades the capsule, increases the invasion efficiency for endothelial cells.23 Modification of LPS could affect recognition of the organism by epithelial cells.24
Peptide DLLFK. A BLAST search with the translated ORF containing this short sequence revealed homology to phosphoserine phosphatase enzymes. This molecule thus has the potential to subvert host cell signaling pathways, many of which are dependent on a tightly controlled series of phosphorylation and dephosphorylation events. Similarly, Yersinia spp. and Salmonella typhimurium produce phosphatases that are delivered into the host cell by the type III system and mediate cytoskeletal rearrangements.25 An additional advantage of our procedure is that short sequences of four or five amino acid residues can be used to locate an ORF quickly, as described in the Experimental section, even though they often cannot be used effectively by themselves with the BLAST programs.

Conclusions

We expect the incorporation of whole genome data into our own protein methods for PG and other bacterial pathogens to evolve in the direction of (1) eliminating the six reading frame search entirely and relying on the putative ORF database alone, and eventually the fully annotated genomes as they develop, (2) streamlining the search for host cell proteins by using smaller genomic databases, e.g., full length cDNA libraries, that avoid the problems associated with non-coding DNA in the genomes of higher organisms, and (3) incorporating parallelism either at the hard disk access or processor (CPU) level26 to speed the searches, particularly for human proteins and those from other higher organisms serving as sources for model target cells.

Acknowledgements

The authors thank Dr A. Kaas for PERL scripts that translate DNA to protein and Dr D. O. V. Alonso for assistance with PERL and his comments. Dr Maynard Olson and Mr Jimmy Eng also provided helpful comments and criticism. The PG database was provided through a pre-publication license from TIGR. We thank Kerry Nugent and Michrom BioResources for the HPLC packing materials. This work was funded under NIH grants NIDCR DE11111 and DE13061.

References

  1. M. P. Washburn and J. R. Yates, III, Curr. Opin. Microbiol., 2000, 3, 292.
  2. J. R. Yates, III, Trends Genet., 2000, 16, 5.
  3. A. J. Bleasby, D. Akrigg and T. K. Attwood, Nucleic Acids Res., 1994, 22, 3574.
  4. A. Bairoch and R. Apweiler, Nucleic Acids Res., 1997, 25, 31.
  5. J. K. Eng, A. L. McCormack and J. R. Yates, III, J. Am. Soc. Mass. Spectrom., 1994, 5, 976.
  6. J. R. Yates, III, J. K. Eng, A. L. McCormack and D. Schietz, Anal. Chem., 1995, 67, 1426.
  7. C. M. Belton, K. T. Izutsu, P. C. Goodwin, Y. Park and R. J. Lamont, Cell. Microbiol., 1999, 1, 215.
  8. C. J. Hueck, Microbiol. Mol. Biol. Rev., 1998, 62, 379.
  9. Y. Park and R. J. Lamont, Infect. Immun., 1998, 66, 4777.
  10. A. Shevchenko, M. Wilm, O. Vorm and M. Mann, Anal. Chem., 1996, 68, 850.
  11. A. Ducret, I. V. Oostveen, J. K. Eng, J. R. Yates, III and R. Aebersold, Protein Sci., 1998, 7, 706.
  12. H. Wang, K. B. Lim, R. F. Lawrence, W. N. Howald, J. A. Taylor, L. H. Ericsson, K. A. Walsh and M. Hackett, Anal. Biochem., 1997, 250, 162.
  13. H. Wang and M. Hackett, Anal. Chem., 1998, 70, 205.
  14. R. T. Kennedy and J. W. Jorgenson, Anal. Chem., 1989, 61, 1128.
  15. M. A. Moseley, L. J. Deterding, K. B. Tomer and J. W. Jorgenson, Anal. Chem., 1991, 63, 1467.
  16. D. F. Hunt, J. R. Yates, III, J. Shabanowitz, S. Winston and C. R. Hauer, Proc. Natl. Acad. Sci. USA, 1986, 83, 6233.
  17. A. Kass, personal communication.
  18. K. Biemann, Annu. Rev. Biochem., 1992, 61, 977.
  19. D. F. Hunt, J. E. Alexander, A. L. McCormack, P. A. Martino, H. Michel, J. Shabanowitz, N. Sherman, M. A. Moseley, J. W. Jorgenson, L. J. Deterding and K. B. Tomer, in Techniques in Protein Chemistry II, ed. J. J. Villafranca, Academic Press, New York, 1991, p. 441.
  20. I. A. Papayannopoulos, Mass Spectrom. Rev., 1995, 14, 49.
  21. S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman, Nucleic Acids Res., 1997, 25, 3389.
  22. J. W. St. Geme and S. Falkow, Infect. Immun., 1991, 59, 1325.
  23. R. G. Deshpande, M. Khan and C. A. Genco, Invasion Metastasis, 1998, 18, 57.
  24. R. P. Darveau, A. Tanner and R. C. Page, Periodontol. 2000, 1997, 14, 12.
  25. I. DeVinney, I. Steele-Mortimer and B. B. Finlay, Trends Microbiol., 2000, 8, 29.
  26. D. Tabb, J. Eng and J. R. Yates, III, in Proteome Research: Mass Spectrometry, ed. P. James, Springer, Berlin, 2000, p. 125.

This journal is © The Royal Society of Chemistry 2001