ClipsMS: An Algorithm for Analyzing Internal Fragments Resulting from Top-Down Mass Spectrometry


Top-down mass spectrometry (TD-MS) of peptides and proteins results in product ions that can be correlated to the polypeptide sequence. Joseph Loo and colleagues at the University of California-Los Angeles (UCLA) have developed ClipsMS, an algorithm that assigns both terminal and internal fragments generated by TD-MS fragmentation and can be used to locate various modifications on the protein sequence. Loo, who recently spoke to us about this work, is the 2022 recipient of the ANACHEM Award, presented annually to an outstanding analytical chemist based on activities in teaching, research, administration or other activity which has advanced the art and science of the field. This interview is part of an ongoing series of interviews with the winners of awards that are presented at the annual SciX conference, which will be held this year from October 2 through October 7, in Covington, Kentucky.

Your paper (1) describes top-down mass spectrometry (TD-MS) of peptides and proteins. Please describe this technique for our readers.

Mass spectrometry (MS) has emerged as an important technique to characterize proteins in a rapid and sensitive manner. The polypeptide sequence of proteins can be derived by two different MS-based strategies. Proteins can be digested chemically or enzymatically into smaller peptides, and the resulting peptides are further fragmented in the mass spectrometer; the resulting tandem mass spectra are interpreted to yield the sequence and identity of the proteins. This is the bottom-up MS (or proteomics) strategy. Alternatively, using the top-down MS route, the initial digestion step is bypassed and the intact whole proteins can be directly fragmented in the mass spectrometer to generate sequence-informative product ions. Conceptually, TD-MS is simpler than bottom-up MS, but experimentally it is currently more challenging: fragmenting larger molecules (proteins) is less efficient than fragmenting smaller peptides, which results in lower sequence coverage, and MS sensitivity is lower for large proteins than it is for peptides.

However, TD-MS is among the most efficient means to detect all protein post-translational modifications (PTMs) in a given proteome. As we described in a recent publication (2), proteins from even a single gene can vary widely in their amino acid sequence and PTMs, giving rise to a variety of proteoforms. TD proteomics analyzes the entire intact proteoform and is the most powerful proteoform-level analysis technology in existence.

In your paper, you introduce ClipsMS, an algorithm that can assign both terminal and internal fragments generated by TD-MS fragmentation. How is this algorithm unique and why was there a need for such an algorithm? What are the benefits of using this algorithm over previous approaches?

I can’t say that there was a need for a program to assign internal fragments detected in tandem mass spectrometry of polypeptides. It always bugged me that a protein top-down mass spectrum would have lots of peaks that were unassigned after running it through the programs that are commonly used in TD-MS. We as a community publish papers that report the information derived from the assigned peaks in mass spectra, yet the unassigned peaks are typically ignored. That’s the dirty little secret in protein mass spectrometry—not all peaks in tandem mass spectra can be easily assigned to a structure. To me, that’s like throwing data away. I wanted to see if we can fix this situation somehow, at least for TD-MS.

Actually, the inspiration for this project started several years ago when a former postdoc, Dr. Huilin Li, who is now a professor at Sun Yat-sen University in Guangzhou (China), and a former graduate student, Dr. Hong Hanh Nguyen, discovered that TD-MS of some large protein complexes yielded fragments that could only be assigned as two-bond cleavage product ions (3). We started to think about how to construct a program to assign potential internal fragments found in TD mass spectra. Terminal fragments result from a single bond cleavage along the polypeptide backbone, resulting in a product ion that contains either the N- or C-terminus of the protein. Internal fragments are generated by two bond cleavages on the backbone, usually from an additional dissociation event after a terminal fragment is created (we believe), and do not contain either the N- or C-terminus. The number of potential internal fragments increases rapidly, roughly as the square of the number of amino acids in the sequence, and so the large number of possibilities for typical proteins and matching these to the raw data can be challenging.
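The distinction between terminal and internal fragments can be made concrete with a short enumeration. The following is only an illustrative sketch, not the ClipsMS code: the function names and the 1-based residue numbering are assumptions chosen for this example, and ion types and masses are ignored entirely.

```python
def terminal_fragments(sequence):
    """Yield fragments from a single backbone cleavage (hypothetical helper)."""
    n = len(sequence)
    for cut in range(1, n):
        yield ("N-terminal", sequence[:cut])   # keeps residue 1
        yield ("C-terminal", sequence[cut:])   # keeps residue n


def internal_fragments(sequence, min_length=1):
    """Yield (first, last, subsequence) for fragments from two cleavages.

    first and last are 1-based residue numbers. The fragment never includes
    residue 1 or residue n, so it contains neither terminus.
    """
    n = len(sequence)
    for first in range(2, n):
        for last in range(first + min_length - 1, n):
            yield (first, last, sequence[first - 1:last])


if __name__ == "__main__":
    seq = "PEPTIDE"
    print(len(list(terminal_fragments(seq))), "terminal fragments")
    print(len(list(internal_fragments(seq))), "internal fragments")
```

Because every internal fragment is defined by a pair of cleavage sites, the number of candidates in this enumeration grows roughly with the square of the sequence length, while the number of terminal fragments grows only linearly.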

Fortunately, many of my students, especially the co-authors of our paper, took computer programming classes offered at UCLA. This is such a wonderful skill to have in any science discipline, and I recommend this for all science students (the last computer class I took was Fortran 77, and so I’m considered to be a dinosaur these days; thank goodness for my students). The students took up the challenge to use their new skills here.

The other event that accelerated this project, sad to say, was the Covid pandemic. We had just started to explore whether internal fragments are indeed present in our TD mass spectra. The group debated on ways to confirm their identities and whether the assignments were statistically valid. Then the world shut down in March 2020, and we were no longer able to come to campus and the labs because of Covid. Those were uncertain times for everyone, especially for the students. No one knew exactly when we could return to the labs and offices. Like many science groups worldwide, we functioned by meeting online and wrote papers. And at the same time, the students worked on the program (coded in Python) and developed a GUI to input data into the program. And the rest is history, as they say. The program continues to develop as we include more options for including post-translational modifications, ligand binding, and more; and for displaying the results (that will be described in future publications).

The program is unique in the sense that there aren’t many computational tools to calculate internal fragments and match the mass spectra to the huge list of theoretical fragments. If one knows the identity and the sequence of the protein that is being measured, like in the case found in many biopharma labs or protein expression groups, then it’s entirely feasible to use the program to assign as many peaks in the spectrum as possible to increase the sequence coverage of a TD mass spectrum. But if you’re analyzing an unknown protein, like in the proteomics world, and you need to search sequences in an entire proteome, including all theoretical internal fragments might be computationally too intensive. For this reason (and many others), we are not advising the use of internal fragments for identifying proteins, but it would not surprise me that the developers of programs used in TD proteomics are considering including internal fragments (they’re much smarter than me, and so I’m sure they will come up with a way to do it).

Briefly summarize how this algorithm functions and is able to deliver on its promise of fragment assignments.

Sorry, but I think I will only briefly summarize my answer to this question because a detailed response would require me to regurgitate the entire paper and subsequent papers to answer it. The algorithm isn’t particularly elegant, almost a brute force method of calculating all possibilities and searching the data for nearest matches. However, it seems to work in our tests. We can compare the data for a given protein and the same protein that has been modified at a single site. All of the appropriate terminal and internal fragments shift in mass according to the modification. Other labs that have tried the program have been giving us mostly positive feedback and some suggestions to improve it (and we thank them for this). And we have found that internal fragments are generated by using nearly all forms of dissociation methods employed by TD-MS, such as collision-, electron- and photon-based techniques (4).
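The brute-force idea described above (calculate every candidate, then search the data for the nearest match) might be sketched as follows. This is a simplified illustration under stated assumptions, not the ClipsMS implementation: fragment masses are taken as plain sums of monoisotopic residue masses, with ion-type offsets and charge states left out; only a handful of residues are tabulated; and the function names, the `mods` dictionary (residue position to mass shift), and the default tolerance are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values; partial table for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259,
}


def theoretical_internal_fragments(sequence, min_length=4, mods=None):
    """Return (first, last, mass) for every internal fragment.

    mods maps a 1-based residue position to a mass shift in Da (assumption).
    Mass is a simplified sum of residue masses; ion-type offsets are omitted.
    """
    mods = mods or {}
    n = len(sequence)
    fragments = []
    for first in range(2, n):                           # never residue 1
        for last in range(first + min_length - 1, n):   # never residue n
            mass = sum(RESIDUE_MASS[sequence[i - 1]] + mods.get(i, 0.0)
                       for i in range(first, last + 1))
            fragments.append((first, last, mass))
    return fragments


def match_peaks(observed_masses, fragments, tol_ppm=1.0):
    """Pair each observed mass with its nearest candidate if within tol_ppm."""
    if not fragments:
        return []
    matches = []
    for peak in observed_masses:
        best = min(fragments, key=lambda frag: abs(frag[2] - peak))
        error_ppm = (peak - best[2]) / best[2] * 1e6
        if abs(error_ppm) <= tol_ppm:
            matches.append((peak, best, error_ppm))
    return matches
```

Passing a `mods` entry for a single position shifts only the fragments that span that residue, which mirrors the check described above of comparing a protein with the same protein modified at one site.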

Are there any challenges or limitations involved with utilizing ClipsMS? Are there any other applications where this algorithm might be useful?

Certainly, mass measurement accuracy (and resolving power) needs to be as high as possible to reduce misassignment of internal fragments. For a 600-residue protein, the number of possible internal fragments with a minimum length of 4 residues approaches 2 million (more or less). Therefore, for a given peak in a tandem mass spectrum, the possibility of falsely matching to a theoretical internal fragment mass is high if the mass accuracy used is relatively low (5-10 parts per million, ppm). We try to maintain a mass accuracy of less than 3 ppm or even less than 1 ppm to be as conservative as possible when assigning internal fragments.
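A rough back-of-the-envelope calculation shows where a number of this size can come from and what a ppm tolerance means in absolute terms. The sketch below is illustrative only: the function names are hypothetical, and the factor of nine fragment-type combinations (a, b, or c on one side paired with x, y, or z on the other) is an assumption made for this example rather than a figure stated in the interview.

```python
def count_internal_fragments(n_residues, min_length=4, ion_type_pairs=9):
    """Count candidate internal fragments for a protein of n_residues.

    ion_type_pairs=9 is an assumption (a/b/c paired with x/y/z).
    """
    interior = n_residues - 2              # residues 1 and n are excluded
    windows = interior - min_length + 1    # allowed start positions for the shortest fragment
    if windows <= 0:
        return 0
    return ion_type_pairs * windows * (windows + 1) // 2


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def mass_tolerance_da(mass, tol_ppm):
    """Absolute mass window in Da that corresponds to tol_ppm at a given mass."""
    return mass * tol_ppm / 1e6


if __name__ == "__main__":
    print(count_internal_fragments(600))
    print(mass_tolerance_da(10000.0, 1.0))
```

Under these assumptions a 600-residue protein gives about 1.6 million candidates, the same order of magnitude as the figure quoted above, and a 1 ppm tolerance on a 10,000 Da fragment corresponds to a window of only 0.01 Da.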

We’ve found that assigning internal fragments allows TD-MS to access regions of the protein not covered by terminal fragments. More sites of PTMs can be located in this fashion. And we’re thinking that disulfide bonds can be more easily mapped using TD-MS. Disulfide bonds are important because they help to preserve tertiary structure. As monoclonal antibodies (mAbs) and antibody drug conjugates (ADCs) become more popular as therapeutics, rapid methods for mapping S-S linkages and PTMs become more important. There could be well over 16 disulfide bonds in a given mAb. Our preliminary data using internal fragments and TD-MS is quite promising and suggests a TD-MS strategy to map the S-S bonds from proteins as complex as a mAb.

What sort of response has your paper gotten?

The response from the mass spectrometry community has been fairly positive so far (or at least those who hate it haven’t told me yet). We make the program freely available on GitHub, and labs from around the world have contacted us for the latest versions. People who have questions on how to use it will contact me, and then I refer them to my students (because, after all, I only know how to program in Fortran, and I haven’t done any real programming in over 30 years). The concept of internal fragments in peptide and protein tandem mass spectra is certainly not new by a long shot. The few previous reports on the topic certainly encouraged us to put some concerted effort into seeing if internal fragments can be exploited for TD-MS. The tool was developed for everybody to use. We developed it to make it easier to analyze our raw TD-MS data. I hope others use it and find ways to improve upon it, and I hope it encourages more labs to consider using TD-MS to characterize proteins.

The ANACHEM Award is presented annually to an outstanding analytical chemist based on activities in teaching, research, administration, or other activity which has advanced the art and science of the field. Please comment on the meaning of being 2022’s recipient.

When I first received notification that I was selected to receive the ANACHEM Award, of course I was humbled and honored. And then when I read the list of past recipients of the award, I was sure that this was a big mistake. Most of the giants of analytical chemistry are on the list. The first recipient was Hobart Willard (1953); I still consult his textbook, “Instrumental Methods of Analysis,” which sits on my bookshelf, when I teach undergraduate instrumental analysis. There’s I. M. Kolthoff (1961) and Rosalyn Yalow (1973) on the list. That will be a great trivia question: “What do I. M. Kolthoff, Nobelist Rosalyn Yalow (for the development of radioimmunoassay) and Joe Loo have in common?” And then the name that really sticks out for me is Fred McLafferty (1985), my Ph.D. mentor at Cornell University. Unfortunately, Fred passed away at the age of 98 this past December (2021). Fred is the reason why I became an analytical chemist. When I entered graduate school, I thought I wanted to be a biochemist. Fred showed me that analytical chemistry can impact nearly all areas of science, including biochemistry. He taught me the joys of staring at tandem mass spectra (the more peaks, the better) to reveal how the decomposition of gas phase ions can lead to interesting chemistry and can ultimately be used to deduce molecular structures. Broadly speaking, Fred taught me that analytical chemistry is fun! It’s like solving puzzles all of the time! I hope that I have been able to convey to my own students in my research labs and the students in my classes the joys of analytical chemistry and mass spectrometry. And with regard to my own work, I hope that my contributions have moved the “art and science” of mass spectrometry forward, even if only a tiny bit, so that others in the future can build upon it.

I am receiving this award on behalf of all of the past and current group members throughout my career, all of the people I have collaborated with, and the entire analytical chemistry community who have accepted me into their fold.


(1) C. Lantz, M.A. Zenaidee, B. Wei, Z. Hemminger, R.R. Ogorzalek Loo, and J.A. Loo, J. Proteome Res. 20, 1928–1935 (2021).

(2) L.M. Smith, J.N. Agar, J. Chamot-Rooke, P.O. Danis, Y. Ge, J.A. Loo, L. Paša-Tolić, Y.O. Tsybin, N.L. Kelleher, and The Consortium for Top-Down Proteomics, Sci. Adv. 7, eabk0734 (2021).

(3) H. Li, H.H. Nguyen, R.R. Ogorzalek Loo, I.D.G. Campuzano, and J.A. Loo, Nature Chem. 10, 139–148 (2018).

(4) M.A. Zenaidee, B. Wei, C. Lantz, H.T. Wu, T.R. Lambeth, J.K. Diedrich, R.R. Ogorzalek Loo, R.R. Julian, and J.A. Loo, J. Am. Soc. Mass Spectrom. 32, 1752–1758 (2021).

Joseph Loo


Joseph A. Loo is Professor of Chemistry and Biochemistry, and Biological Chemistry (David Geffen School of Medicine at UCLA) at the University of California, Los Angeles. He is also a member of UCLA/Department of Energy Institute for Genomics and Proteomics and the UCLA Molecular Biology Institute. He received his B.S. in chemistry from Clarkson University and his Ph.D. in analytical chemistry working with Professor Fred McLafferty at Cornell University. He was a Postdoctoral Fellow and Senior Research Scientist at Pacific Northwest National Laboratory with Dr. Richard Smith. Prior to joining UCLA in 2001, Dr. Loo was an Associate Research Fellow at Parke-Davis Pharmaceutical Research/Pfizer.

His group uses and develops new mass spectrometry (MS) and proteomics strategies, including top-down MS (TDMS), native MS, ion mobility MS (IM-MS), and label-free quantification methods, to characterize proteins and protein complexes (and their proteoforms) and for the elucidation of protein biomarkers to aid human health studies. The major current areas of emphasis include elucidating the importance of post-translational modifications, such as lysine acylation, to regulate enzymes and metabolic processes within microbial consortia. Native MS, TDMS, and IM-MS are being used to identify protein-protein and protein-ligand interactions related to neurodegenerative diseases, such as Alzheimer’s and Parkinson’s diseases. Using high resolution MS with various forms of dissociation methods, Loo’s group has extended TDMS to address larger proteins and protein complexes, including membrane proteins, to aid structural biology studies.

Dr. Loo has published over 340 papers and book chapters, and is currently on the Editorial Boards of several scientific journals, including Mass Spectrometry Reviews and Clinical Proteomics, and he is the Editor-in-Chief of the Journal of the American Society for Mass Spectrometry (JASMS). In July 2022, he will step down from his JASMS position to take a position on the Board of Directors of the American Society for Mass Spectrometry (ASMS) as the Vice-President for Programs. He has held leadership and advisory positions with scientific organizations, including ASMS and the US Human Proteome Organization (US HUPO), and his research has been supported by the US National Institutes of Health (NIH), the US National Science Foundation (NSF), the US Department of Energy (DOE), and the US Department of Defense (DOD).
