September 2006. Top-down and bottom-up are alternative strategies for protein identification and characterization by mass spectrometry. How do they fit into the world of proteomics? What are their implications for separation technology? These questions are addressed in this installment of "Directions in Discovery."
Proteomics studies play a central role in the discovery of disease biomarkers and drug targets. Mass spectrometry (MS) used in concert with a wide variety of separation methods is the principal methodology for proteomics. Two fundamental strategies for protein identification and characterization by mass spectrometry currently are employed in proteomics. In bottom-up approaches, purified proteins, or complex protein mixtures, are subjected to proteolytic cleavage, and the peptide products are analyzed by MS. In top-down proteomics, intact protein ions or large protein fragments are subjected to gas-phase fragmentation for MS analysis. This issue of "Directions in Discovery" will explore the advantages and limitations of the two strategies, and compare the separation technology requirements of each.
A Matter of Definitions
There exists some ambiguity about the usage of the terms "top-down" and "bottom-up" in proteomics. In one lexicon, they designate the entity that is subjected to the primary separation technique. In this regard, separation of proteins by gel electrophoresis (GE) followed by in-gel proteolytic digestion and liquid chromatography (LC)–MS analysis of the peptides can be considered a top-down approach, because the initial entities separated are proteins. In another lexicon, top-down and bottom-up refer to the entities introduced into the mass spectrometer. This is the terminology used by Reid and McLuckey (1) in their review of top-down protein characterization, and will be used here (Figure 1).
In bottom-up proteomics, the analytes introduced into the mass spectrometer are peptides generated by enzymatic cleavage of one or many proteins. The proteins can first be separated by GE or chromatography, in which case the sample will contain only one or a few proteins. Alternatively, a complex protein mixture can be digested to the peptide level first, then separated by on-line chromatography coupled to electrospray ionization mass spectrometry (ESI–MS). In the latter case, the digest can contain thousands to hundreds of thousands of peptides and can require separation in two or more chromatographic dimensions before MS analysis. The identity of the original protein is determined by comparison of the peptide mass spectra with theoretical peptide masses calculated from a proteomic or genomic database. There are two approaches to protein identification within the bottom-up strategy: peptide mass fingerprinting and tandem MS (MS–MS).
Peptide mass fingerprinting: In peptide mass fingerprinting, peptide masses obtained from an MS scan are compared to calculated peptide masses generated by "in silico" cleavage of protein or gene sequences in the database using the same specificity as the enzyme that was employed in the experiment. One disadvantage of peptide mass fingerprinting is the requirement for pure proteins or simple mixtures of proteins; the purification steps therefore limit the throughput of the approach. Another disadvantage is the requirement for several peptides to uniquely identify a protein. Peptide mass fingerprinting can be performed with the same instrumentation used for MS-MS, but also can be done using time-of-flight (TOF) mass spectrometers with matrix-assisted laser desorption/ionization (MALDI).
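As a rough illustration of the in silico step described above, the following Python sketch digests a sequence and compares the calculated peptide masses with observed masses. It is a minimal sketch under stated assumptions, not a search engine: trypsin-like specificity (cleavage C-terminal to K or R, but not before P) is assumed as the enzyme, no missed cleavages or modifications are considered, and the function names and the 50-ppm tolerance are hypothetical choices made for the example.

```python
import re

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except before P (assumed enzyme rule).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fingerprint_hits(observed_masses, sequence, tol_ppm=50.0):
    """Return (observed mass, peptide) pairs that agree within tol_ppm."""
    hits = []
    for peptide in tryptic_digest(sequence):
        theoretical = peptide_mass(peptide)
        for observed in observed_masses:
            if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((observed, peptide))
    return hits
```

In a real search, the number of matched peptides per database entry would be scored against the number expected by chance, which is why several peptides are needed for a confident identification.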
Tandem MS: In MS-MS, a peptide ion is isolated in the mass analyzer and subjected to dissociation to produce product ion fragments. The amino acid sequence of the original precursor ion can be deduced from the masses of the fragment ions; this forms the basis for de novo sequencing by MS-MS (2). This is a laborious and time-consuming process not amenable to high-throughput analysis. Alternatively, fragmentation data can be used to determine a short stretch of amino acid sequence (a "sequence tag"), which can be used to search a database (3). A more convenient approach is the uninterpreted or "shotgun" proteomics method pioneered by Yates (4), in which product ion spectra are compared with databases by cross-correlation analysis to identify the intact protein.
Instrumentation for MS-MS: A variety of MS systems currently are used for bottom-up MS-MS proteomics, including 3D and linear ion traps, hybrid quadrupole-TOF, and TOF-TOF mass spectrometers. The modest resolving power of ion trap instruments is offset by their ability to generate high-quality MS2 and MSn spectra. TOF-TOF instruments typically employ MALDI ionization; peptides resolved by high performance liquid chromatography (HPLC) can be spotted onto MALDI target plates with the advantage of rapid analysis and archiving of target plates for reanalysis. All of these systems generate fragments by collision-induced dissociation (CID). In CID, the peptide bond is preferentially cleaved C-terminal to the carbonyl group to yield b and y ions, which dominate the MS–MS spectrum.
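To make the b and y ion series concrete, the sketch below calculates the singly charged b and y fragment m/z values for an unmodified peptide. It assumes monoisotopic masses, a single proton as the charge carrier, and no neutral losses; the function name by_ions is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Return (b, y) lists of singly charged m/z values.

    b[i-1] is b_i (the first i residues); y[i-1] is y_i (the last i residues).
    Each list has len(peptide) - 1 entries, one per backbone amide bond.
    """
    masses = [MONO[aa] for aa in peptide]
    b_series, y_series = [], []
    for i in range(1, len(peptide)):
        b_series.append(sum(masses[:i]) + PROTON)
        y_series.append(sum(masses[-i:]) + WATER + PROTON)
    return b_series, y_series
```

A useful sanity check on any such calculation is that complementary fragments b_i and y_(n-i) sum to the neutral peptide mass plus two protons.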
The development of high-resolution mass spectrometers based upon Fourier-transform ion cyclotron resonance (FT-ICR) permits highly accurate peptide mass determinations. Smith and colleagues (5) have used the high mass accuracy of FT-ICR instruments to develop a version of bottom-up protein identification termed the accurate mass tag (AMT) approach. In this approach, the combination of accurate peptide mass measurement and chromatographic retention time can be used to identify a protein uniquely. Development of an AMT is a two-step process. Potential mass tags (PMT) are identified by conventional shotgun LC–MS-MS of a peptide digest, then the PMT tag is confirmed by LC–FT-ICR–MS to form the AMT tag. The AMT tag can be used to identify a protein without further MS–MS analysis.
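The lookup step of the AMT approach can be sketched as a two-parameter match. The Python example below is illustrative only: the AMTTag record, the tolerances, and the use of a retention time normalized to a 0 to 1 scale are assumptions made for the example, and any tag values supplied to it would be hypothetical rather than taken from reference (5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AMTTag:
    peptide: str   # peptide sequence confirmed in the two-step process
    protein: str   # protein the peptide maps to
    mass: float    # accurate monoisotopic neutral mass, Da
    net: float     # normalized retention time, 0..1 (assumed scale)


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def lookup(features, tags, ppm_tol=2.0, net_tol=0.02):
    """Match (mass, retention time) features against an AMT tag list.

    A feature is assigned to a tag only when both the mass error and the
    retention-time difference fall inside the (hypothetical) tolerances.
    """
    hits = []
    for mass, net in features:
        for tag in tags:
            if (abs(ppm_error(mass, tag.mass)) <= ppm_tol
                    and abs(net - tag.net) <= net_tol):
                hits.append((mass, net, tag))
    return hits
```

The retention-time constraint is what separates peptides that are too close in mass to be distinguished by mass accuracy alone.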
Advantages and limitations of bottom-up strategies: Bottom-up proteomics is the most mature and most widely used approach for protein identification and characterization. Reversed-phase HPLC provides high-resolution separations of peptide digests with solvents that are compatible with ESI. On-line nano-scale reversed-phase LC–ESI–MS–MS can be fully automated and is almost universally used for bottom-up proteomics. Commercial instruments with control software and bioinformatics tools optimized for bottom-up applications are available from several vendors. The bottom-up strategy using on-line multidimensional capillary HPLC–MS-MS has been most successful in the identification of proteins in digests derived from very complex mixtures such as cell lysates (6). Moreover, quantitative techniques have been developed using affinity tags and stable isotope labels for determination of up- and down-regulated proteins in expression proteomics (7).
There are several fundamental and practical limitations to the bottom-up strategy.
Most importantly, only a fraction of the total peptide population of a given protein is identified. Therefore, information on only a portion of the protein sequence is obtained. It is clear from genomic studies that each open reading frame can give rise to many protein isoforms, which can originate from alternative splicing products and varying types and locations of posttranslational modifications (PTMs). PTMs such as phosphorylation and glycosylation are known to be important in the regulation of protein function and cell metabolism. A consequence of the limited sequence coverage in bottom-up proteomics is loss of much information about PTMs. Moreover, PTMs are often labile in the CID process and require techniques such as neutral loss scanning to detect them.
Practical limitations are encountered when bottom-up methods are used for protein identification from very complex peptide mixtures. On-line multidimensional LC–MS-MS analyses using ion-exchange coupled to reversed-phase columns require extended run times of as long as 15 h or more. Although this can be automated, the throughput of multidimensional LC–MS-MS is quite limited. Other problems include the loss of information about low-abundance peptides in mass spectra dominated by high-abundance species. Finally, narrow chromatographic peak widths can compromise acquisition of adequate MS–MS information during elution.
In top-down proteomics, intact protein molecular ions generated by ESI are introduced into the mass analyzer and are subjected to gas-phase fragmentation. An obstacle to this approach is the determination of product ion masses from multiply charged product ions (1). These can vary in charge state up to that of the multiply charged protein precursor ion. This can introduce ambiguity in the interpretation of top-down MS-MS spectra. Two approaches have been used to circumvent this limitation. The first is charge state manipulation through gas phase ion–ion interactions, and the second is the use of instruments with high mass measurement accuracy (MMA).
Ion charge state manipulation: Reduction of product ion charge states can be accomplished by introduction of gas-phase anions to strip protons from the product ion through ion–ion proton transfer (8). Similarly, the precursor ion charge state can be manipulated by ion stripping to "charge-state purify" the precursor before dissociation. These techniques permit top-down MS-MS on instruments of modest MMA such as ion traps.
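The effect of proton stripping on where an ion appears in the spectrum follows from the relation m/z = (M + z × 1.007276)/z for a protonated species of neutral mass M. The short Python sketch below tabulates this for successive proton transfer steps; the function names are hypothetical, and only protons are assumed as charge carriers.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, z):
    """m/z of an ion of the given neutral mass carrying z protons."""
    if z < 1:
        raise ValueError("charge state must be at least 1")
    return (neutral_mass + z * PROTON) / z


def proton_transfer_series(neutral_mass, z_start, z_end=1):
    """List (z, m/z) pairs as protons are stripped one at a time."""
    return [(z, mz(neutral_mass, z)) for z in range(z_start, z_end - 1, -1)]
```

For a 10,000-Da fragment, reducing the charge from 10+ to 1+ moves the signal from roughly m/z 1001 to roughly m/z 10,001, so charge-reduced product ions spread out along the m/z axis and their masses can be read with less ambiguity.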
High MMA mass spectrometers: The mass accuracy of FT-ICR mass spectrometers with high magnetic fields has made them the instruments of choice for top-down proteomics. Commercial FT-ICR instruments exhibit mass resolution values of 500,000 or greater and mass accuracy of 2 ppm or less. With these instruments, it is possible to determine the isotope spacing in multiply charged product ions and thereby determine the product ion charge state. Recently, hybrid ion trap–FT-ICR instruments have been introduced and are being used for top-down applications. Limitations of the hybrid ion trap–FT system are its limited sensitivity and the distance between the ion trap and the ICR cell (which introduces time-of-flight effects). Another limitation of FT and hybrid FT systems is the high cost of purchase, maintenance, and operation. Last year, Thermo Electron introduced a hybrid linear ion trap–orbitrap mass spectrometer. The orbitrap is a novel high-resolution mass analyzer in which the axial motion of ions between two concentric electrodes generates an image current that is Fourier-transformed to yield a mass spectrum. This instrument provides mass accuracies of <2 ppm and resolution of 100,000. Its suitability for top-down protein sequencing recently has been reported (9).
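The charge-state determination mentioned above rests on the fact that adjacent isotope peaks of an ion with charge z are separated by about 1.00335/z on the m/z axis (the 13C–12C mass difference divided by the charge). A minimal Python sketch, assuming well-resolved, correctly picked isotope peaks and protons as the only charge carriers (function names are hypothetical):

```python
C13_C12 = 1.003355  # mass difference between 13C and 12C, Da
PROTON = 1.007276   # mass of a proton, Da


def charge_from_spacing(isotope_peaks_mz):
    """Estimate charge state from the mean spacing of adjacent isotope peaks."""
    if len(isotope_peaks_mz) < 2:
        raise ValueError("need at least two isotope peaks")
    peaks = sorted(isotope_peaks_mz)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(C13_C12 / mean_spacing)


def neutral_mass(mz_value, z):
    """Neutral mass of a protonated ion observed at mz_value with charge z."""
    return z * (mz_value - PROTON)
```

At z = 10 the isotope peaks are only about 0.1 m/z units apart, which is why this calculation is practical only on instruments with high resolving power.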
Dissociation techniques in top-down proteomics: CID is the most commonly used method in bottom-up proteomics. However, the efficiency and relative fragment abundance are dependent upon the peptide sequence. Electron-capture dissociation (ECD), developed in McLafferty's lab (10), is the favored technique for MS–MS of proteins in top-down applications and is implemented primarily on FT-ICR instruments. In this technique, low-energy electrons (<1 eV) are captured by multiply charged [M + nH]^(n+) protein precursor ions to produce [M + nH]^((n-1)+•) radical ions. These ions fragment rapidly by cleavage of the N–Cα bond to yield primarily c and z type ions (Figure 2). Fragmentation occurs more randomly along the peptide backbone compared with CID. Thus, ECD yields sequence information that is complementary to CID and has the potential for providing high sequence coverage. A major advantage of ECD is the preservation of PTMs in the fragmentation process, allowing site-specific PTM analysis (Figure 3).
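For comparison with the b and y series produced by CID, the c and z type fragments can be calculated from the same residue masses. The Python sketch below is a simplified illustration under stated assumptions: singly charged fragments, monoisotopic masses, no modifications, and the common convention in which the z ion is the radical z• species (equivalent to the y ion minus 16.0187 Da). Naming conventions for z ions vary, and the function name cz_ions is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
HYDROGEN = 1.007825  # hydrogen atom
PROTON = 1.007276


def cz_ions(peptide):
    """Return (c, z) lists of singly charged m/z values.

    c[i-1] is c_i (first i residues plus NH3); z[i-1] is the radical
    z-dot ion of the last i residues (y_i - NH3 + H).
    """
    masses = [MONO[aa] for aa in peptide]
    c_series, z_series = [], []
    for i in range(1, len(peptide)):
        c_series.append(sum(masses[:i]) + AMMONIA + PROTON)
        z_series.append(sum(masses[-i:]) + WATER - AMMONIA + HYDROGEN + PROTON)
    return c_series, z_series
```

Because cleavage occurs at the N–Cα bond rather than the amide bond, each c ion lies 17.0265 Da above the corresponding b ion, which is one way the two fragmentation methods give complementary evidence for the same sequence.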
Electron-transfer dissociation (ETD) is a technique similar to ECD that employs gas-phase ion–ion chemistry (11). Singly-charged anions transfer an electron to multiply charged protein precursor ions, inducing fragmentation of the protein backbone along pathways that are analogous to those of ECD. Both ECD and ETD are available on commercial MS instruments.
Advantages and limitations of top-down strategies: The two major advantages of the top-down strategy are the potential access to the complete protein sequence and the ability to locate and characterize PTMs. In addition, the time-consuming protein digestion required for bottom-up methods is eliminated.
Top-down proteomics is a relatively young field compared to bottom-up proteomics, and currently suffers from several limitations. First, the very complex spectra generated by multiply charged proteins limit the approach to isolated proteins, or simple protein mixtures at best. Second, the favored instruments (FT-ICR, hybrid ion trap–FT-ICR, or hybrid ion trap–orbitrap) are expensive to purchase and operate. Third, the top-down approach does not work well with intact proteins larger than about 50 kDa. Fourth, the favored dissociation techniques (ECD, ETD) are low-efficiency processes requiring long ion accumulation, activation, and detection times. This limits the ability to couple top-down MS techniques with on-line separations. Fifth, the mechanisms of protein dissociation are less well understood than those of peptide dissociation. If top-down approaches are to be adopted widely, a greater understanding of fragmentation of multiply charged ions is needed (1), including the influence of precursor ion charge state, the role of protein primary, secondary, and tertiary structure, and the contribution of PTMs. Finally, bioinformatics tools for top-down proteomics are primitive compared to those for bottom-up proteomics.
Separation Technology for Top-Down and Bottom-Up Proteomics
Separation technology for bottom-up proteomics has been reviewed previously in this column (12–14). Bottom-uppers tend to fall into two camps. One camp favors one-dimensional (1-D) or two-dimensional (2-D) gel electrophoresis with in-gel digestion of separated spots or bands, followed by peptide extraction from the gel and capillary reversed-phase LC–MS-MS. Advantages include added information on protein mass and (in the case of 2-D GE) pI for protein identification. Also, some PTMs can be resolved by 2-D gel electrophoresis. For example, phosphorylated proteins often appear on 2-D gels as horizontal "trains" of proteins of the same molecular mass but different phosphorylation state. Drawbacks include time-consuming and labor-intensive manipulation of gels, poor recovery of large or hydrophobic proteins, dominance of high-abundance proteins in the gel presentation, and loss of proteins during the separation and digestion process. The other camp favors multidimensional LC over GE. The advantages include the ability to automate the entire process, and the reduced bias against low-abundance proteins. Disadvantages include the lack of protein pI or mass information from the separation process, reduced throughput due to extended analysis times, and, in some approaches, the requirement for complicated and cumbersome valving or column-switching hardware. But all bottom-up approaches using ESI terminate in a reversed-phase LC separation to introduce peptides into the mass spectrometer. This fact has driven the development of capillary and nanoLC technology over the last decade. This includes the design of LC pumping systems capable of delivering gradients at nanoliter-per-minute flow rates, high-efficiency packed capillary columns, and packing materials that can be operated with MS-friendly solvent systems with good peak shapes.
A recent development, pioneered by Waters Associates, is the introduction of sub-2-μm reversed-phase column packings that can provide faster, more efficient separations. These materials require operation at pressures in excess of 10,000 psi, and several manufacturers have introduced nanoLC systems capable of operation at these elevated pressures. Bottom-up analysis of complex protein mixtures found in cell lysates or tissue extracts can still be a daunting task even for multidimensional LC. A continuing area of growth is prefractionation techniques to simplify the separation task, either at the protein or at the peptide level. This includes affinity capture techniques (14) to select proteins or peptides with particular amino acid residues or PTMs. Also, solution-phase micropreparative isoelectric focusing is increasingly used for preliminary fractionation of proteins based upon pI.
Because top-down proteomics uses direct MS analysis of the intact protein, front-end separation technologies tailored for proteins are employed. Because of the problems coupling on-line chromatography to FT-ICR instruments with ECD, much top-down work is done with infusion of isolated proteins or simple protein mixtures. Therefore, protein fractionation and purification are accomplished off-line, using the same techniques discussed previously for bottom-up proteomics. GE methods can be problematic because of difficulties in extracting proteins from the gel matrix (15). Detergents can interfere in top-down applications, and the range of MS-compatible protein solubilizing reagents is limited. Ion exchange, size exclusion, and hydrophilic- or hydrophobic-interaction chromatography are all viable candidates for protein fractionation. Reversed-phase HPLC has been used (16), but separation of a single protein into multiple conformers complicates the chromatography. Solution-phase micropreparative isoelectric focusing also has been used for protein prefractionation in a top-down approach (17). Capillary electrophoresis often is used for protein separations and shows promise for top-down studies. In fact, Smith's group successfully coupled capillary isoelectric focusing with FT-ICR MS using an on-line microdialysis device; this was applied to separation of proteins in a bacterial cell lysate (18).
Given the complementary nature of the information provided by top-down and bottom-up strategies, both will continue to be employed in proteomics. The top-down approach is newer to the proteomic world, and will benefit from improvements in MS hardware (for example, extended mass range), advances in the understanding of gas-phase behavior of proteins, and improved bioinformatics tools (1). A melding of the two strategies is already in progress with the emergence of "middle-down" proteomics (17). In this approach, large proteins are subjected to limited proteolysis by enzymes such as LysC to yield products in the 5–20 kDa range. These more manageable polypeptides are then sequenced using top-down methodologies with the advantages of high sequence coverage and retention of PTM information.
"Directions in Discovery" editor Tim Wehr is staff scientist at Bio-Rad Laboratories, Hercules, California. Direct correspondence about this column to "Directions in Discovery," LCGC, Woodbridge Corporate Plaza, 485 Route 1 South, Building F, First Floor, Iselin, NJ 08830, e-mail email@example.com
(1) G.E. Reid and S.A. McLuckey, J. Mass Spectrom. 37, 663–675 (2002).
(2) D.F. Hunt, J.R. Yates, J. Shabanowitz, S. Winston, and C.H. Hauer, Proc. Nat. Acad. Sci. USA 83, 6233–6237 (1986).
(3) M. Mann and M. Wilm, Anal. Chem. 66, 4390–4399 (1994).
(4) J.K. Eng, A.L. McCormack, and J.R. Yates, J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
(5) B. Bogdanov and R.D. Smith, Mass Spectrom. Rev. 24, 168–200 (2005).
(6) A.J. Link, J. Eng, D.M. Schieltz, E. Carmack, G.J. Mize, D.R. Morris, B.M. Garvik, and J.R. Yates, Nat. Biotechnol. 17, 676–682 (1999).
(7) S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, and R. Aebersold, Nat. Biotechnol. 17, 994–999 (1999).
(8) J.L. Stephenson and S.A. McLuckey, Anal. Chem. 68, 4026–4032 (1996).
(9) B. Macek, L.F. Waanders, J.V. Olsen, and M. Mann, Mol. Cell. Proteomics 5, 949–958 (2006).
(10) R.A. Zubarev, N.L. Kelleher, and F.W. McLafferty, J. Am. Chem. Soc. 120, 3265–3266 (1998).
(11) J.J. Coon, B. Ueberheide, J.E. Syka, D.D. Dryhurst, J. Ausio, J. Shabanowitz, and D.F. Hunt, Proc. Nat. Acad. Sci. USA 102, 672–678 (2005).
(12) T. Wehr, LCGC 19, 702–711 (2001).
(13) T. Wehr, LCGC 20, 954–962 (2002).
(14) T. Wehr, LCGC 21, 274–284 (2003).
(15) N.L. Kelleher, Anal. Chem. 76, 197A–203A (2004).
(16) Y. Wang, B.M. Balgley, P.A. Rudnick, and C.S. Lee, J. Chromatogr. A 1073, 35–41 (2005).
(17) P.M. Thomas, C.D. Wenger, B.A. Parks, D.E. Robinson, R.T. Fellers, Y. Kim, L. Zamdborg, R.D. LeDuc, A.J. Forbes, and N.L. Kelleher, Poster MP 476, 54th ASMS Conf. Mass Spectrometry and Allied Topics, Seattle, WA, 2006.
(18) P.K. Jensen, L. Pasa-Tolic, G.A. Anderson, J.A. Horner, M.S. Lipton, J.E. Bruce, and R.D. Smith, Anal. Chem. 71, 2076–2084 (1999).