Glycoproteins and Biotechnology — Recombinant Biopharmaceutical Products: Identification, Characterization, and Chemical Equivalency?

December 1, 2010
Anurag S. Rathore

Ira S. Krull

LCGC North America

LCGC North America, Volume 28, Issue 12
Page Number: 1028–1036

Column: Focus on Biopharmaceutical Analysis

Glycoproteins appear to have become the most common biopharmaceutical product today, and they also seem to be increasing in popularity and importance with time.

What are glycoproteins and why have they become so very important in the biotech industry today? Why are so many biotech firms emphasizing glycoproteins for their product pipeline, especially antibodies?

Glycoproteins represent but one class of protein variants, alongside other classes such as phosphoproteins, membrane proteins, or hydrophobic proteins, but they have become one of the most intensely studied and commercialized classes of real or potential biotherapeutics today (1–6,59,60). Perhaps one of the best discussions of the chemistry and biochemistry of glycoproteins is that by Walsh (7). Figure 1 illustrates, in schematic fashion, a typical antibody, which is just one type of glycoprotein and perhaps the most commercially successful and viable biopharmaceutical product now on the market. Though this particular glycoprotein has its glycans (carbohydrate chains, circles in Figure 1) on only one amino acid in the antibody heavy chain, at two sites (because there are two heavy chains in these antibodies), most glycoproteins have several sites of attachment and several different glycans. Even monoclonal antibodies have several different variants or isoforms (with different isoelectric points, or pIs), because a set of glycoforms is expressed on the heavy chains or because of changes in certain amino acids (for example, glutamic acid to pyroglutamate, deletion of C-terminal lysine, asparagine deamidation, and others). Glycoproteins with more than one site of glycosylation reflect heterogeneity from each site. And, of course, as the number of glycans (oligosaccharides, often with several antennae or side chains) incorporated increases in other glycoproteins, sometimes depending on the cell expression system, the number of possible variants or glycoforms (that is, isoforms arising from posttranslational modifications) also increases (7–12).

Glycoproteins appear to be uniquely important in cell functioning, stability in vivo, cellular distribution in vivo, membrane transport, and even cellular recognition or differentiation (62–64). Glycoproteins are often important integral membrane proteins, where they play a role in cell-cell interactions (62). They are also important for white blood cell recognition, especially in mammals. Classes of glycoproteins include collagens, mucins, transferrin, ceruloplasmin, immunoglobulins, histocompatibility antigens, human chorionic gonadotropin (HCG), thyroid-stimulating hormone (TSH), alkaline phosphatase, certain plasma proteins of coldwater fish, various proteins involved in hormone and drug action, calnexin, calreticulin, notch, and specific glycoproteins on the surface membranes of platelets (62).

Figure 1: A typical antibody structure, indicating arrangements of heavy chain (HC), light chain (LC), and location of glycosylations at one specific asparagine amino acid on the HCs (PhD Thesis, Ling Chen Santora, Northeastern University, 2001).

A listing of some FDA-approved monoclonal antibody biopharmaceuticals can be readily found on the agency's website (9,10). These include Rituxan, Remicade, Zenapax, Herceptin, Zevalin, Humira, and others. Beyond these, there is an equally large number of nonantibody glycoprotein biopharmaceuticals on the market today, including Enbrel, TNKase, tPA, Aranesp, and Xigris (9). Perhaps one of the most popular and useful texts that deals with proteins in general, as well as glycoproteins, is that by Creighton (11). A very useful overview of the major analytical approaches and techniques in glycobiology is that edited by Townsend and Hotchkiss (12). A similar overview was edited by Jackson and Gallagher (13). It is very likely that more publications in analytical biotechnology have appeared, and will continue to appear, dealing with glycoproteins than with any other class of biopharmaceuticals (14–24).

Glycans is a commonly used term that refers to carbohydrate or oligosaccharide chains, often having multiple chains or antennae (for example, biantennary, triantennary, and others) (54–56).

Some very recent texts deal specifically with glycoanalysis and cover virtually all of the currently available analytical methods (54,55). Glycobiology refers to the biology associated with these glycans, whether they are attached to glycoproteins or exist as separate polysaccharides, quite apart from any protein attachment. The analysis of glycans attached to glycoproteins, or for that matter of free-standing polysaccharides or monosaccharides, is an enormous field, and glycan analysis is today an expected assay for virtually all glycoprotein biopharmaceuticals destined for marketing (56). The United States Pharmacopeia (USP) has recently put forth for review and public comment a proposed new general information chapter that discusses glycan analysis methods and analytical strategies (56). It also discusses, at some length, approaches for the direct analysis of glycoproteins and the analysis of released, derivatized, or nonderivatized glycans.

Glycoproteins have certain, perhaps novel, attributes that have made them the preferred biopharmaceutical products for numerous applications and indications (7–10). They are readily expressed in both mammalian and plant cell expression systems, often in high yields, against numerous targets or applications (9). They have long shelf lives in the dry state, show good stability profiles, generally are not immunogenic in the monomeric state, and are cleared from the body readily after delivering an effective performance. They also demonstrate good membrane transport, have higher than typical solubility in water and biofluids, are not rejected by the body when humanized, and can be manufactured with high reproducibility, lot-to-lot and batch-to-batch (7,8). They are readily purified using a number of process techniques and, as antibodies, can be designed to target very specific disease sites. Though, at times, they have been manufactured with human viral impurities, expression through plant systems appears to overcome these problems at times (25,26). It has been estimated that over 50% of all biopharmaceuticals now going through the various pipelines in the world's biotech companies are glycoproteins. This would also include fusion proteins, which often use the Fc region of a humanized antibody to improve delivery and stability of the partner protein. The Fab (antigen binding) region of an antibody targeted against a specific site can also be fused with another protein that delivers the true therapeutic activity. And one might reasonably expect that glycoproteins may well increase in usage and commercialization with time, as they already have over the past 10 years (27–29).

What then are the problems remaining for even greater utilization of glycoproteins as biopharmaceuticals in the future? Surely, one of these problems relates to our current ability to fully or well characterize glycoprotein variants, which often comprise a dozen or more species in a final drug substance. This is not a trivial issue, especially when we now begin to consider marketing biogenerics, biosimilars, or biologic follow-ons, as more and more proprietary glycoprotein products come off patent (61). With most humanized antibodies, there are usually two sites of N-linked glycosylation, as above, one on each of the two heavy chains. And, depending on the cell expression system, these may have a limited number of glycans attached, still leading to a dozen or more final variants (also known as glycoforms). However, other glycoproteins, depending on how they are expressed, may have dozens of variants present, all of which may well form the drug substance.
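The way a few glycans and a few amino acid changes multiply into "a dozen or more" variants can be illustrated with a small counting sketch. This is only an illustration under stated assumptions: the glycan names (G0F, G1F, G2F), the three C-terminal lysine states, and the function names are hypothetical choices made for the example, not data from any product discussed here.

```python
from itertools import combinations_with_replacement


def count_antibody_variants(glycans, lysine_states=(0, 1, 2)):
    """Count variants of a symmetric antibody with one N-glycan site per heavy chain.

    Assumption: the two heavy chains are identical, so the pairing
    (G0F, G1F) is the same species as (G1F, G0F).
    """
    glycan_pairs = list(combinations_with_replacement(glycans, 2))
    # Each glycan pairing can carry 0, 1, or 2 C-terminal lysines (assumed).
    return len(glycan_pairs) * len(lysine_states)


def count_glycoforms(glycans_per_site):
    """Upper bound for a glycoprotein whose sites vary independently."""
    total = 1
    for n_glycans in glycans_per_site:
        total *= n_glycans
    return total


# Hypothetical antibody: three glycans observed at the single heavy-chain site.
n_mab = count_antibody_variants(["G0F", "G1F", "G2F"])

# Hypothetical nonantibody glycoprotein: three sites, four glycans at each.
n_multi = count_glycoforms([4, 4, 4])

print(n_mab, n_multi)
```

Even this small example gives more variants for the antibody than the number of glycans alone would suggest, and the multi-site case grows as a product over sites, which is why the count rises so quickly for heavily glycosylated proteins.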

These are not always easily resolved by even two-dimensional chromatographic or electrophoretic methods (for example, multidimensional liquid chromatography [MDLC], two-dimensional gel electrophoresis [2DGE], and differential gel electrophoresis [DIGE]). And, in the case of glycoproteins (not antibodies), each of these variants may have both N- or O-linked glycans, with several amino acids bearing such attachments. As the number of glycans attached to a single variant increases, our ability to fully characterize each variant decreases, perhaps exponentially. It is difficult enough to fully characterize a single variant, once isolated and pure or homogeneous, but if these are not baseline resolved before mass spectrometry (MS) detection in LC or capillary electrophoresis (CE) modes, their characterization again becomes problematic. And, using bottom-up methods, rather than both top-down and bottom-up simultaneously, putting the right peptides to the right parent protein again becomes even more problematic. When Humpty Dumpty fell off that famous wall, all the king's men could not put Humpty Dumpty back together again. Thus, complete characterization of an individual glycoprotein may become much, much easier when we can baseline resolve it from all other variants present. And, this may only be feasible when we can finally do true, top-down protein sequencing (TDS) of even high MW glycoproteins, including all sites of glycan attachment and the nature of each glycan present at any amino acid attachment site. That is not yet here, but it is indeed truly coming (vide infra) (17,30–36).

What do we mean by true, complete characterization of a glycoprotein, as opposed to the euphemistic expression, well-characterized biopharmaceutical product (WCBP)? Complete characterization is what we just suggested, that we know the complete amino acid sequence, we know every site of glycan attachment, and we know the specific glycan sequence at each such site. And, it also means that we know the relative or absolute levels of each variant in the final drug substance, after complete characterization. WCBP does not mean this, but it means that we have practically (almost) characterized the drug substance, and that we can make it the very same way, batch-to-batch with a well-controlled process. The process then defines the product, but only because we were not able to more fully or completely characterize each and every major variant in the drug substance. Without the ability to demonstrate true, complete characterization of a biogeneric glycoprotein mixture, it seems that the regulatory agencies may well require a clinical demonstration of safety and efficacy (29). With the ability to demonstrate true, complete characterization and quantitation of a biogeneric glycoprotein mixture, the agencies most likely will not demand clinical testing, perhaps only animal testing? Thus, as we move into the world of biogenerics and follow-on biologics (FOB), there will be more and more pressure for the biogeneric firms to be able to demonstrate true, complete glycoprotein mixture characterizations, if they wish to bring their biogenerics to market ASAP.

Why Can We Not Fully Characterize Complex Mixtures of Glycoproteins Today, or Can We? What Do We Do Instead?

If we have generated a complex mixture of new, unknown glycoproteins as our drug substance, the regulatory agencies would like us to characterize, as best possible, each and every major variant present. And, they would like us to demonstrate batch-to-batch chemical equivalency for that same mixture of glycoproteins, in the same relative or absolute amounts produced, batch-to-batch. Thus, they want a demonstration of complete variant structures, as best possible, and then batch-to-batch chemical equivalency or lot release testing. This is often easier said than done, of course, especially if one is dealing with a complex mixture of glycoprotein variants and they are not resolvable by any analytical separations technique. It then becomes almost impossible to demonstrate complete protein characterization, unless one is able to do intact, top-down sequencing for each species present. Figure 2 demonstrates a typical 2DGE plot for a mixture of yeast proteins, using isoelectric focusing in the first dimension followed by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) in the second. This technique is often employed within contract service organizations (CSOs) to "identify" variants, using bottom-up methods with database searching routines, and to provide the identity of known, putative proteins (54–56). Such databases, of course, do not usually exist for proprietary or generic biopharmaceutical glycoproteins. 2DGE and other analytical separation methods are routinely used to "demonstrate" lot-to-lot chemical equivalency after the individual, major spots have been fully or well characterized as being a part of the drug substance.

Figure 2: Silver-stained, two-dimensional gel electrophoresis pattern of 50 µg yeast extract.

True characterization of each spot would require a combination of both top-down, intact MW determination followed by peptide mapping with identification of all glycopeptides present, the amino acid location of each glycan and its sequence (as well as anomericity, cis–trans orientations of hydroxyls, and all linkages). This is not yet true TDS; it is just determination of the intact MW of the glycoprotein combined with bottom-up peptide mapping and peptide mass fingerprinting (PMF) to deduce the complete sequence, location, and nature of each glycan present. This will only work, however, if every single variant is fully resolved from the next, so that only a single variant goes into the matrix-assisted laser desorption ionization–time-of-flight mass spectrometry (MALDI-TOFMS) or high performance liquid chromatography–electrospray ionization mass spectrometry (HPLC-ESI-MS) system for top-down and bottom-up sequencing. There are numerous MS approaches in the literature (vide infra) that have been reported to be useful to fully or partially characterize glycoproteins. However, this is not yet true top-down protein sequencing, which will come later, perhaps this year or next?

If two or more glycoproteins are in a single 2DGE spot, their intact MWs can be determined by MALDI-TOFMS or ESI-MS, but it then becomes somewhat problematic to do peptide mapping on a mixture of the original proteins and put the jigsaw puzzle together in the right arrangements for each protein. It is not impossible, just very tricky. And, if a spot contains more than two glycoproteins, the puzzle becomes all the more problematic and perhaps impossible to solve. This is quite aside from the obvious problem of ion suppression, whenever multiple species of similar structure enter the MS instrument. Hence, there remains the basic, fundamental need to resolve each and every single glycoprotein variant before efforts are initiated to do characterizations. If we were able to perform true TDS in the MS system, even for unresolved parent proteins, then we could do complete characterizations of unresolved proteins. That is almost here, but we have not yet seen it fully demonstrated for a complex mixture of glycoproteins in a drug substance (vide infra). It is close, but not quite reality.

Thus, many FDA filings as CMCs (Chemistry, Manufacturing, and Controls) for NDAs (New Drug Applications) contain alternative approaches to WCBPs, without actually demonstrating true and complete glycoprotein characterization. These approaches have always been limited by the existing analytical technologies. Thus, HPLC profiles of the drug substance, showing partially resolved variants, or 2DGE of partially resolved variants, or HPCE or MDLC of partially resolved variants have been routinely used to demonstrate batch-to-batch chemical equivalency (along with other analytical data) (37,38). Figure 3 illustrates the ability to use both cation-exchange chromatography and capillary isoelectric focusing (cIEF) to partially resolve all the variants present in a precommercial batch of a humanized monoclonal antibody destined for market (37). However, full resolution and chemical characterization of each individual variant present was not, at that time, possible.

Figure 3: The ability to use both cation-exchange chromatography and capillary isoelectric focusing (cIEF) to partially resolve all the major variants present in a precommercial batch of antibody variants (37). Treatment of the original mixture of variants with carboxypeptidase B (CPB) removes C-terminal lysines and causes the three major peaks to collapse to one peak by both cation-exchange chromatography and cIEF.

However, determining the specific nature of each glycoprotein has usually not been possible, and thus glycoprofiling is routinely used to demonstrate the total glycans present on the entire mixture or individual variants. This technique, known as glycan analysis, releases N- and O-linked glycans, separates them from the residual protein mass, and then characterizes the glycans present to provide a glycoprofile (1DGE, 2DGE, high performance capillary electrophoresis [HPCE], hydrophilic interaction chromatography [HILIC], MALDI, 2-AB MALDI, and others). There are numerous analytical methods now available for performing glycoprofiling (54–57).

This adds to a partial WCBP characterization profile and provides confidence that lot-to-lot chemical equivalency can or may be demonstrated. However, it does not provide true, complete chemical characterization of each and every glycoprotein present in that drug substance. Nor does it demonstrate the absolute amounts of each variant present, but only relative percent peak areas, which also helps demonstrate chemical equivalencies for lot release purposes. In general, virtually all prior chemical characterizations of complex mixtures of glycoproteins, including antibodies, have not been able to do true, complete characterization of each and every variant present. This is changing, slowly but surely, and eventually, it seems clear we will be able to do this, even for a complex mixture of glycoproteins that does not allow for complete baseline separation of each and every variant present. Partial resolution may suffice if we can do complete sequencing via top-down methods that also provide the location and nature of each and every glycan on each and every glycoprotein. This is not impossible, just slow in coming but almost here. These techniques require the ability in the MS system to perform amino acid sequencing for each peptide in the peptide map of the glycoprotein, to demonstrate the location of every glycan on a specific amino acid in every glycopeptide, and then to sequence that glycan to determine its exact chemical structure. These are, by now, somewhat routine measurements, at least for glycopeptides, if not quite yet for glycoproteins. And, if used together with top-down, intact MW determinations of the parent glycoprotein, such techniques are fully capable of providing complete chemical characterizations. Again, baseline resolving each intact glycoprotein, either in MDLC or 2DGE, before the top-down and bottom-up MS routines above, will virtually guarantee obtaining a complete chemical characterization of every glycoprotein in a drug substance, and without any ion suppression occurring, either.
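The relative percent peak areas mentioned above are simple to compute, and a short sketch shows how they might be compared between two lots. Everything specific here is an assumption made for illustration: the peak names, the areas, the function names, and especially the 2% acceptance window, which is a hypothetical number and not a regulatory criterion.

```python
def relative_percent_areas(peak_areas):
    """Convert raw peak areas into percent of the total integrated area."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def lots_equivalent(ref_areas, test_areas, tolerance_pct=2.0):
    """Compare two lots peak by peak on relative percent area.

    tolerance_pct is a hypothetical acceptance window (percentage points).
    """
    ref = relative_percent_areas(ref_areas)
    test = relative_percent_areas(test_areas)
    if set(ref) != set(test):
        return False  # a variant is present in one lot but not the other
    return all(abs(ref[name] - test[name]) <= tolerance_pct for name in ref)


# Hypothetical cation-exchange peak areas for three C-terminal lysine variants.
lot_a = {"0 Lys": 6200.0, "1 Lys": 2800.0, "2 Lys": 1000.0}
lot_b = {"0 Lys": 5900.0, "1 Lys": 3000.0, "2 Lys": 1100.0}

profile_a = relative_percent_areas(lot_a)
same = lots_equivalent(lot_a, lot_b)
print(profile_a, same)
```

Note what such a comparison does and does not establish: it shows that the profile is reproducible, but it says nothing about the structure of the species under each peak or about absolute amounts.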

When Can We Fully Characterize Complex Mixtures of Glycoproteins Today?

There are several ways by which complex mixtures of glycoproteins can, in principle, be fully characterized, but these will require some form of multidimensional separations (39). Separation approaches such as MDLC or 2DGE are capable of providing a peak capacity upward of 5000, and thus should be able to fully baseline resolve most, if not all, recombinant glycoprotein drug substances (39–41). However, such high-resolving analytical techniques are perhaps too complex to be used in the biotech industry for drug substance chemical characterization or chemical equivalency (lot-to-lot release testing) without database searching. Both MDLC and 2DGE in proteomics usually rely on established databases, together with PMF and peptide mapping, to identify known proteins. And most biotechnology products are not in commercial databases, because in most instances they are genetically engineered and not naturally occurring. A combination of top-down and bottom-up protein sequencing, on baseline resolved glycoprotein variants, will provide the correct, complete structure of each variant, if the proper MS system is employed. Again, there are very few, if any, examples of such complex analytical techniques being routinely used in the biotech industry today. It is not that it cannot be done; it is just that it is not being routinely used, as yet (17,19,30–36,45–50).
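A peak capacity of 5000 is easiest to appreciate with the usual product rule for two orthogonal dimensions. The sketch below is a rough illustration only: the per-dimension capacities and the 30-variant sample are hypothetical, and the singlet estimate borrows the standard statistical-overlap expression for randomly placed peaks, which is an assumption that closely related glycoforms will often violate because they tend to cluster.

```python
import math


def ideal_2d_peak_capacity(n_first, n_second):
    """Product rule for two fully orthogonal separation dimensions."""
    return n_first * n_second


def expected_singlets(n_components, peak_capacity):
    """Statistical-overlap estimate of peaks that contain a single component.

    Assumes components are distributed randomly across the separation space.
    """
    alpha = n_components / peak_capacity  # saturation of the separation space
    return n_components * math.exp(-2.0 * alpha)


# Hypothetical capacities: 100 in the first dimension, 50 in the second.
capacity = ideal_2d_peak_capacity(100, 50)

# Hypothetical drug substance containing 30 variants.
singlets = expected_singlets(30, capacity)
print(capacity, round(singlets, 1))
```

Under these idealized assumptions nearly every variant would elute alone. In practice the two dimensions are never fully orthogonal and glycoforms are chemically similar, so the usable capacity is lower than the product suggests.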

We should hasten to add that complete identification of individual glycopeptides coming from a glycoprotein enzymatic digest is quite common today, using a variety of MS approaches (30,33,45–50). This allows for assigning the specific location of each attached glycan, the amino acid sequence of the peptide, and the exact structure of the glycan, just using MS methods. However, TDS does not appear to be 100% successful for glycoproteins yet, although it can provide N-terminal and C-terminal sequences for many proteins (vide infra). At times, such TDS methods can also identify other (internal) peptides coming from collision-induced dissociation (CID) in the MS system. However, there are no total TDS glycoprotein sequences yet demonstrated in the open literature. These TDS methods are really of several (at least three) basic designs. One uses MALDI-TOFMS with in-source decay (ISD) and post-source decay (PSD) of the original glycopeptides or, in some instances, proteins (Bruker Daltonics, Billerica, Massachusetts) (35,36,42–44). Another involves a combination of CID with electron transfer dissociation (ETD) or electron capture dissociation (ECD), using ESI-MS, usually with reversed-phase LC interfacing, often described using an Orbitrap instrument (Thermo Fisher Scientific, Waltham, Massachusetts) (42–48). Yet another viable approach has used a combination of ion mobility spectrometry (IMS) together with TOFMS, as in the Synapt instrument (Waters Corp., Milford, Massachusetts), using high and low energy dissociation of the intact glycopeptides to denote amino acid sequence, location of each glycan, and the specific glycan structure.

An approach that will work, though not yet fully demonstrated for a complex mixture of glycoproteins in a biotech drug substance, would be MDLC or 2DGE together with just TDS. This has been demonstrated for a variety of simpler proteins (Figure 4), but it has not yet been demonstrated for even a simple mixture of glycoproteins with complete characterizations. Figure 4 illustrates the c- and y-fragment ions coming from the intact protein, with confirmation of the N- and C-termini peptides, using ISD on a MALDI-TOFMS type MS system (43). However, the upper MW range of TDS is today perhaps too low for most glycoprotein variants of biotech interest (31,42–50). It is undoubtedly a matter of time before these limitations will be overcome, and then there will be no reason whatsoever that mixtures of intact glycoproteins cannot be fully characterized by TDS alone (42–50).

Figure 4: Top-down protein sequencing of proteins without bottom-up analysis (43). Sample A: reISD spectrum annotated by fragment ions after successful assignment of the established protein sequence. BioTools confirms the N- and C-termini on the intact protein (43).

So, then, how does TDS really work? It works similarly to doing complete glycopeptide sequencing, via a combination of CID, ECD, or ETD and other fragmentation methods (for example, ISD), that will provide three things: amino acid sequence, location of each glycan on specific amino acids, and the precise structure of each glycan. This has been described several times by various groups for glycopeptides, but remains to be described for intact glycoproteins, using only TDS. This is basically because it is not yet possible using MS-MS alone, whatever the fragmentation mechanism, to totally sequence the entire amino acid backbone. The protein fragments in the MS and basically the N- and C-termini are sequenced, but putting back together the other peptides is often impossible. The coverage possible appears to vary, protein-to-protein and method-to-method. Thus, in most instances, a combination of TDS and bottom-up peptide mapping is still needed to obtain the total sequence, location, and nature of each modification. As one of our colleagues has so aptly put it: "Top-down only workflows do not deal with the fact that proteins have low levels of microheterogeneities that, left unresolved, are not accessible by top-down methodologies. Top-down can tell you where something of high occupancy resides on a protein, but seldom addresses the necessary quantitative or dynamic range requirements needed for biotherapeutic characterization work today" (51).

In bottom-up glycoprotein analyses, the intact protein is enzymatically digested, as usual, to form a mixture of glycopeptides (40,41,45–50). These are then usually resolved by reversed-phase LC–ESI-MS, and using a combination of CID and ETD or ECD (or other MS methods), depending on the specific MS system employed, information is derived for the peptide sequence, location of each glycan attachment, and the specific glycan structure (45–50). It still remains to assemble those glycopeptides, together with the other, nonglycosylated peptides, back into the original protein's correct sequence. Figure 5 then illustrates the type of information obtained by both CID and ETD or ECD, for typical glycopeptides, now showing mainly the glycan sequences and structures observed for this single species (47). However, this figure does not show the actual glycan sequencing data but only shows the final, intact mixture of glycopeptide variants detected. Glycan sequencing routines are common today, and there are various software platforms and databases (for example, GlycoSuite [Tyrian Diagnostics Ltd., Sydney, Australia] and others) that assist in such delineations. The amino acid sequence and location of the glycan were determined in a separate MS experiment. Each MS vendor has a somewhat different approach to doing glycopeptide characterization, as above, depending on the nature of their MS products. Some use IMS-TOF–TOFMS (Waters), others may use an Orbitrap instrument with CID/ETD(ECD) (Thermo Fisher Scientific), while still others may use an ion-trap with ETD and proton transfer reaction (PTR) (Bruker) methods.

Figure 5: Complete sequencing of a typical glycopeptide in a peptide map, using CID with ETD (47). Deconvoluted full MS spectrum of bovine alpha1-acid glycoprotein bi-antennary peptide 91CVYNCSFIK99.

Let us consider another, quite recent, approach for top-down posttranslational modification (PTM) characterization, that described by Bruker, which uses its HCUltra ion trap instrument with ECD and PTR reactants (42,43,52,53). In this approach, intact proteins can be identified and sequenced without any prior enzymatic digestion. However, the sequence coverage is not 100%, and the specific sites of phosphorylation for the parent protein, pH4 histone, were not fully and absolutely identified. Nevertheless, fragments all across the 100–11,331 Da of the protein are visible, with a sequence coverage of >80%. It does not appear that other approaches can yet 100% characterize complex glycoproteins, using top-down methods alone, no matter what MS instrument is available. When that does come about, it remains to be seen how the regulatory agencies will respond in future characterization requirements for complex mixtures of glycoproteins in new filings.
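A figure such as ">80% sequence coverage" depends on how coverage is defined, and the source does not say which definition was used. One common convention, assumed here purely for illustration, counts the fraction of backbone bonds for which at least one N-terminal (c- or b-type) or C-terminal (z- or y-type) fragment ion was observed. The function name and the ion lists below are hypothetical.

```python
def backbone_coverage(seq_length, n_term_ions, c_term_ions):
    """Percent of backbone bonds cleaved by at least one observed fragment.

    n_term_ions: sizes i of N-terminal fragments (the first i residues).
    c_term_ions: sizes j of C-terminal fragments (the last j residues).
    A protein of N residues has N - 1 backbone bonds; bond k lies
    between residues k and k + 1.
    """
    n_bonds = seq_length - 1
    cleaved = set()
    for i in n_term_ions:
        if 1 <= i <= n_bonds:
            cleaved.add(i)
    for j in c_term_ions:
        if 1 <= j <= n_bonds:
            # The last j residues start after bond number (N - j).
            cleaved.add(seq_length - j)
    return 100.0 * len(cleaved) / n_bonds


# Hypothetical 11-residue sequence with a gap in the middle of the ladder.
coverage = backbone_coverage(
    seq_length=11,
    n_term_ions=[1, 2, 3, 4, 5],
    c_term_ions=[1, 2, 3, 5],
)
print(coverage)
```

The example leaves one internal bond without any fragment, which mirrors the situation described in the text: the termini are read well, while the middle of a larger protein is where the ladder tends to break.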

However, some very recent posters by Bruker at the ASMS 2010 meeting describe complete, or very nearly complete, amino acid sequencing of fairly large proteins (52,53). In the case of a 13.6-kDa protein, a camelid nanobody, it was completely sequenced, N- to C-terminal (52). This protein was not present in any known database, and thus did not lend itself to conventional, bottom-up methods. This is perhaps the most complete large-protein sequencing described to date, but again there was no indication of modifications, their location or specific nature.

Finally, What About Doing Just Shotgun Proteomics or Bottom-Up Methods for Glycoprotein Characterization?

For many years, still today, most proteomics researchers have tried to identify individual proteins or glycoproteins based on shotgun methods, also known as bottom-up alone (57,58). Indeed, we would conjecture that most of the existing literature in proteomics areas has relied and still relies on solely bottom-up methods. That is, most researchers do not perform both top-down and bottom-up proteomics at the very same time on the same proteins (40,41). The world today relies almost entirely on bottom-up peptide mapping methods, together with database searching, Mascot or MOWSE scores and percent coverage numbers to "identify" proteins. This is nonsensical. Bottom-up methods rely almost entirely on identifying "all" the peptides derived from the original, intact protein of interest, be this after 2DGE or MDLC separation methods. However, it is impossible to know when one has indeed detected "all" the possible peptides coming from an unknown parent protein. The absence of evidence is not evidence of absence.

And thus, when the database assigns a protein as a hit, it gives a score or number, which only suggests that the hit may or may not be the right parent protein structure. That confidence can vary from 20% to 99%, but even a scoring of 99% can still be misleading or downright wrong. Without measuring the intact protein's true molecular weight by top-down MS methods, one never has true confirmation of what the database is suggesting as the "most likely" hit. Hence, it is very possible, if not likely, that much of the existing literature in proteomics areas may have incorrectly and inadvertently misidentified the true protein structure. As a speaker at the recent ISPPP 09 meeting cleverly put it: "Shotgun proteomics is full of holes." This would be especially true for glycoproteins, given the diversity and complexity of the possible component glycopeptides in a large parent glycoprotein. Hopefully, this situation should or shall change for the better in the near future.
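The argument that percent coverage alone cannot confirm a hit, while an intact mass can contradict it, can be made concrete with a toy calculation. The sketch below is illustrative only: the 19-residue sequence, the peptide list, the 20 ppm tolerance, and the function names are all hypothetical, and monoisotopic masses are used for simplicity even though intact proteins are often reported as average masses. The residue masses are standard monoisotopic values, and 203.0794 Da is the mass of one HexNAc residue.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_coverage(protein, peptides):
    """Percent of residues in the candidate sequence matched by any peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


def monoisotopic_mass(protein):
    """Mass of the unmodified chain: residue masses plus one water."""
    return sum(MONO[aa] for aa in protein) + WATER


def intact_mass_consistent(protein, measured_mass, tol_ppm=20.0):
    """Check a measured intact mass against the candidate sequence."""
    theoretical = monoisotopic_mass(protein)
    return abs(measured_mass - theoretical) / theoretical * 1e6 <= tol_ppm


candidate = "MKTAYIAKQRCVYNCSFIK"           # hypothetical database hit
identified = ["MKTAYIAK", "CVYNCSFIK"]      # hypothetical peptide matches

coverage = peptide_coverage(candidate, identified)

# Hypothetical measurement: the real species carries one extra HexNAc
# that none of the identified peptides revealed.
measured = monoisotopic_mass(candidate) + 203.0794
consistent = intact_mass_consistent(candidate, measured)
print(round(coverage, 1), consistent)
```

Here the peptide map covers most of the candidate sequence, yet the intact mass disagrees with it, so the hit cannot be the whole story. Only the top-down measurement exposes the discrepancy.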

And, to finish (at last!), clearly to characterize a biotechnology-derived glycoprotein using only bottom-up methods is also a doubtful approach to come up with the correct structure. It is a very risky and questionable approach, even when we know the original DNA sequence that was installed in the cell expression system to generate a known sequence of amino acids for the glycoprotein of interest. That is not very helpful once there are PTMs that occur between the protein's translation and final expression for identification, often involving assimilation (combining peptides from differently expressed proteins in the cell), truncation or clipping (enzymatic removal of pieces of the originally expressed protein structure) or amino acid changes, just to mention a few possible PTMs.

Table I summarizes a variety of methods used in the detection, purification, and structural analysis of glycoproteins (63). However, this is far from complete, as there are numerous other electrophoretic methods, for example, that have become extremely useful, such as HPCE, 1DGE, 2DGE, and others.

Table I: Some important methods used to study glycoproteins

"Biotechnology Today" Co-Editor Ira S. Krull is an Associate Professor of chemistry at Northeastern University, Boston, Massachusetts, and a member of LCGC's editorial advisory board.

Anurag S. Rathore is a biotech CMC consultant and an associate professor with the Department of Chemical Engineering, Indian Institute of Technology Delhi, India. He is also a member of BioPharm International's editorial advisory board.


(1) The Future of Biotech. The 2010 Guide to Emerging Markets and Technology, BioWorld Today Publishers, 2010.

(2) Massachusetts Biotechnology Council, Cambridge, MA,

(3) BIO, Biotechnology Industry Organization,

(4) Time Magazine, pp. 36–39 (November 2, 2009).

(5) BioTerminology, a Guide to the Biopharmaceutical Lexicon, First Edition (Waters Corporation, Milford, Massachusetts, February, 2009).

(6) Scientific American WorldView, A Global Biotechnology Perspective, The Future of Biotech, The 2010 Guide to Emerging Markets and Technology, BioWorld Today (Scientific American, Inc., Publishers, 2009).

(7) C.T. Walsh, Posttranslational Modification of Proteins. Expanding Nature's Inventory (Roberts and Company, Publishers, Englewood, Colorado, 2006).

(8) G. Walsh, Biopharmaceuticals, Biochemistry and Biotechnology, Second Edition (John Wiley & Sons, Ltd., West Sussex, UK, 2003), Chapter 10.

(9) J. Geigert, The Challenge of CMC Regulatory Compliance for Biopharmaceuticals (Kluwer Academic/Plenum Publishers, New York, 2004), Chapters 1-2, Table 2, Chapter 1.


(11) T.E. Creighton, Proteins: Structures and Molecular Properties, Second Edition (W.H. Freeman and Company, New York, 1984).

(12) Techniques in Glycobiology, R.R. Townsend and A.T. Hotchkiss, Jr., Eds. (Marcel Dekker, New York, 1997).

(13) A Laboratory Guide to Glycoconjugate Analysis, P. Jackson and J.T. Gallagher, Eds. (Birkhauser Verlag, Basel, CH, 1997).

(14) K. Waddell and R. Gudihal, Current Trends in Mass Spectrometry, 15 (March, 2010),

(15) N. Tang, P. Goodley, and J. Michnowicz, American Biotechnology Laboratory 25(3), 10 (2007).

(16) J. Wen, Y. Jiang, and L. Narhi, American Pharmaceutical Review 10(6), 10 (2007).

(17) D. Goldberg, M. Bern, S. Parry, M. Sutton-Smith, M. Panico, H.R. Morris, and A. Dell, J. Proteome Res. 6, 3995 (2007).

(18) R. Sasisekharan, R. Raman, and V. Prabhakar, Annu. Rev. Biomed. Eng. 8, 181 (2006).

(19) P. Hongsachart, R. Huang-Liu, S. Sinchaikul, F-M. Pan, S. Phutrakul, Y-M. Chuang, C-J. Yu, and S-T. Chen, Electrophoresis 30, 1206 (2009).

(20) Z. Dai, J. Zhou, S.-J. Qiu, Y-K. Liu, and J. Fan, Electrophoresis 30, 2957 (2009).

(21) R.V. Cordoba-Rodriguez, BioPharm Intl., 18 (November 2008).

(22) S.A. Berkowitz, H. Zhong, M. Berardino, Z. Sosic, J. Siemiatkoski, I.S. Krull, and R. Mhatre, J. Chromatogr., A 1079, 254 (2005).

(23) T. Barth, M. da S. Sangoi, L.M. da Silva, R.M. Ferretto, and S.L. Dalmora, JLC&RR 30, 1277 (2007).

(24) O. Salas-Salerno, B. Tomlinson, S. Du, M. Parker, A. Strahan, and S. Ma, Anal. Chem. 78, 6583 (2006).

(25) The Boston Globe, Saturday, November 14, 2009 edition,

(26) Protalix, Ltd., Carmiel, Israel,


(28) G. Walsh, Nature Biotech. 24, 769 (2006).

(29) Biopharmaceuticals: Current Market Dynamics and Future Outlook.

(30) B.H. Clowers, E.D. Dodds, R.R. Seipert, and C.B. Lebrilla, J. Proteome Res. 6, 4032 (2007).

(31) X. Han, M. Jin, K. Breuker, and F.W. McLafferty, Science 314, 109 (2006).

(32) M. Macht, Bioanalysis 1(6), 1131 (2009).

(33) H. Xie, M. Gilar, and J.C. Gebler, Anal. Chem. 81, 5699 (2009).

(34) R.H. Perry, R.G. Cooks, and R.J. Noll, Mass Spectrometry Reviews 27, 661 (2008).

(35) Application Note #LCMS-52, Bruker Daltonics Corporation, Billerica, Massachusetts; Application Note #MT-96, Bruker Daltonics Corporation, Billerica, Massachusetts.

(36) Application Note #TN-31, Bruker Daltonics Corporation, Billerica, Massachusetts.

(37) L.C. Santora, I.S. Krull, and K. Grant, Anal. Biochem. 275, 98 (1999).

(38) L.C. Santora, I.S. Krull, and K. Grant, Spectroscopy 17(5), 50 (2002).

(39) Multidimensional Liquid Chromatography: Theory and Applications in Industrial Chemistry and the Life Sciences, S.A. Cohen and M.R. Schure, Eds. (J. Wiley & Sons, Hoboken, New Jersey, 2008).

(40) K. Millea, I.J. Kass, I.S. Krull, J.C. Gebler, and S.J. Berger, J. Chromatogr., A 1079, 287 (2005).

(41) S.J. Berger, K.M. Millea, I.S. Krull, and S.A. Cohen, "Middle-Out Proteomics: Incorporating Multidimensional Protein Fractionation and Intact Protein Mass Analysis as Elements of a Proteomic Analysis Workflow," in Separations in Proteomics, G. Smejkal and A. Lazarev, Eds. (Taylor and Francis, Publishers, New York, 2006), Chapter 21.

(42) X.L. Zu, A. Schneider, M. Lubeck, C. Stacey, A. Ingendoh, C. Baessmann, and A. Imhof, ABRF 2008, Poster V57-T, Bruker Daltonics Corporation, Billerica, Massachusetts.

(43) Application Note #MT-90, Bruker Daltonics Corporation, Billerica, Massachusetts.

(44) D. Suckau and A. Resemann, Anal. Chem. 75, 5817 (2003).

(45) P. Olivova, W. Chen, A.B. Chakraborty, and J.C. Gebler, Rapid Commun. Mass Spectrom. 22, 29–40 (2008).

(46) W. Chen, P. Olivova, C.E. Doneanu, and J.C. Gebler, American Society for Mass Spectrometry National Meeting, 2007, Session: IMS, Applications Poster 402.

(47) T. Zhang, R. Viner, Z. Hao, and V. Zabrouskov, Application Note: 463, Thermo Fisher Corporation, San Jose, California, 2009.

(48) W.R. Alley, Jr., Y. Mechref, and M.V. Novotny, Rapid Commun. Mass Spectrom. 23, 161 (2009).

(49) C.W.N. Damen, W. Chen, A.B. Chakraborty, M. van Oosterhut, J.R. Mazzeo, J.C. Gebler, J.H.M. Schellens, H. Rosing, and J.H. Beijnen, JASMS 20, 2021 (2009).

(50) A.B. Chakraborty, W. Chen, and J. Mazzeo, "Unraveling protein modifications via top-down fragmentation and ion-mobility time-of-flight mass spectrometry," Poster presented at 2010 ASMS Meeting.

(51) S. Berger, personal communication, June 2010.

(52) G. Shi, A. Resemann, D. Wunderlich, J. Fuchser, and D. Suckau, Poster MP03-065, ASMS Meeting, 2010, Bruker Daltonics Corporation, Billerica, Massachusetts.

(53) A. Resemann, W. Evers, D. Suckau, T.T. Razunguzwa, and G.R. Asbury, Poster MP03-066, ASMS Meeting, 2010, Bruker Daltonics Corporation, Billerica, Massachusetts.

(54) G. Walsh, Post-translational Modification of Protein Biopharmaceuticals (Wiley-VCH Verlag GmbH, Weinheim, Germany, 2009).

(55) Experimental Glycoscience: Glycochemistry, N. Taniguchi, A. Suzuki, Y. Ito, H. Narimatsu, T. Kawasaki, and S. Hase, Eds. (Springer, Berlin, Germany, 2008).

(56) Pharmacopeial Forum 36(2) (March-April, 2010), USP, Rockville, Maryland (draft paper).

(57) Proteomics, Methods and Protocols, J. Reinders and A. Sickmann, Eds. (Humana Press, Totowa, New Jersey, 2009).

(58) R. Westermeier, T. Naven, and H.-R. Hopker, Proteomics in Practice: A Guide to Successful Experimental Design, Second Edition (Wiley-VCH Verlag GmbH, Weinheim, Germany, 2008).

(59) I.S. Krull and A.S. Rathore, LCGC North America 28(6), 454 (2010).

(60) B.S. Kendrick, G. Chrimes, S.L. Cockrill, J.P. Gabrielson, K.K. Arthur, B.D. Prater, Q. Qin, B. Zhang, and A.S. Rathore, BioPharm Intl., 32–44 (August 2009).

(61) A.S. Rathore, Trends in Biotechnology 27, 698 (2009).

(62) Wikipedia contributors. Glycoprotein. Wikipedia, The Free Encyclopedia. April 27, 2010, 20:54 UTC. Available at: Accessed June 20, 2010. Content is subject to the Creative Commons Attribution-ShareAlike 3.0 Unported License.

(63) Essentials of Glycobiology, Second Edition, A. Varki, R.D. Cummings, J.D. Esko, H.H. Freeze, P. Stanley, C.R. Bertozzi, G.W. Hart, and M.E. Etzler, Eds. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2009).

(64) M.E. Taylor and K. Drickamer, Introduction to Glycobiology, Second Edition (Oxford University Press, Oxford, UK, 2006).