I arrived at Michigan State University in August 2011 with a vision, and a long-term research goal, to design, engineer, and build the biomolecular components of life de novo. This overarching goal not only addresses one of the grand challenges of biology (“is our understanding of living matter sufficient to redesign cellular life?”) but also has implications for several National Academy of Engineering grand challenges for the 21st century, including engineering better medicines and engineering the tools of scientific discovery. As a near-term research goal, my group is building on this vision of engineering biology by developing cutting-edge methods to map protein sequence to function for medical and industrial biotechnology.

Proteins are central to a diverse set of fundamental activities necessary for cellular life. This same versatility gives proteins tremendous potential as designable agents in medicine (e.g., monoclonal antibodies like Rituximab that can target and destroy non-Hodgkin lymphomas) and in industry (e.g., enzymes that can deconstruct cellulosic biomass to simple carbohydrates). Yet natural proteins are not always optimal for such applications, and current methods of improving proteins can be expensive and laborious. My medium-term goal (10-20 years) is to design or engineer proteins for new or enhanced functions at will. My short-term goals (3-8 years) are to develop methods in support of our medium- and long-term goals.

My group’s current research interests lie in three overlapping yet distinct avenues: (I.) design and engineering of proteins as diagnostics, therapeutics, and vaccines; (II.) design and engineering of enzymes to deconstruct renewable biomass to value-added fuels/chemicals; and (III.) new methods development to enable large-scale and comprehensive analysis of protein libraries.

(I.) Design and engineering of proteins as diagnostics, therapeutics, and vaccines.

Proteins that bind other proteins (like antibodies) can be used as diagnostic probes, therapeutics, or prophylactics. While I was a postdoc, I developed small proteins that broadly neutralize Influenza group I viruses by targeting a conserved stem region of hemagglutinin (HA). By coupling deep sequencing to fluorescence-activated cell sorting (FACS), I was able to determine the binding and stability of nearly every single point mutant of these yeast-displayed binding proteins. I integrated these sequence-function maps with computational protein design to improve the affinity, specificity, and function of these proteins[1].

Members of my group realized that the same strategy of comprehensively mapping a protein sequence to its function could be used to identify conformational epitopes for antibody-antigen interactions. Our recent papers demonstrate that this conformational epitope mapping strategy, confirmed using antibody panels against diverse antigens, is much more powerful than competing methods[2]. We have also used deep sequencing to quantitatively determine the effect of mutation on binding affinity for nearly every possible single point mutant in two different protein sequences[3]. Combined, these methods allow the routine determination of affinity and specificity for all single point mutants of a given protein-protein binder in a massively parallel fashion, thus enabling more efficient engineering of proteins as therapies and vaccines. As one striking supporting example, a recent report detailed the use of our methodological advances to design and engineer an immunogen for a potential HIV vaccine[4].

We are currently applying these methods to develop specific diagnostic reagents for Zika and related Flaviviruses, to map neutralizing antibodies for Zika and Dengue in order to create pan-Flavivirus prophylactics and vaccines, and to map epitopes for therapeutic antibodies for multiple indications. NIH, NSF, and industrial sponsors fund this work.

(II.) Enzyme engineering by deep sequencing and computational design.

We have extended high-resolution sequence-function mapping to evaluate the comprehensive sequence determinants of pathway flux in a pyrolysis oil catabolic pathway in bacteria[5]. We identified the function of over 8,000 single point mutants in a single experiment and found that most mutations that improved pathway productivity worked by improving the active concentration of enzyme in vivo. Nevertheless, of the hundreds of mutations that improved function, over 50 increased the catalytic efficiency of a key catabolic enzyme. We then integrated deep sequencing with computational design to support a 15-fold improvement in growth rate for bacteria grown on a pyrolysis oil anhydrosugar as the sole carbon source. We have also used this deep sequencing pipeline to understand the comprehensive determinants of substrate specificity for an enzyme, revealing that globally beneficial mutations are rare and that enzyme specificity is globally encoded in protein sequence and structure space[6].

We remain interested in applying computational design to uncover rational design rules for biomolecular engineering. As one example, lignocellulosic biomass can be converted to biofuels by enzymatic deconstruction of polysaccharides to fermentable sugars, followed by microbial fermentation. However, the enzymes (cellulases) needed in this process are expensive, in part because they bind non-productively to lignin and are inactivated in its presence. To understand the basis of this lignin-induced inactivation, we used computation to design proteins with a wide range of hydrophobicity, net charge, and charge density. We found that negative net charge is the single largest determinant of protein-lignin binding; we used that information to redesign more stable, active, and cheaper cellulases[7]. As another example, we computationally redesigned bacterial outer membrane proteins to reveal thermodynamic design principles behind membrane protein assembly and folding[8].

(III.) Improving deep sequencing methods for protein engineering and biotechnology.

Methodological advances in deep sequencing developed by my laboratory, along with other labs worldwide, facilitated the above examples. With commercial next-generation (deep) sequencers it is now possible to read sequences of a million DNA nucleotides for pennies, enabling one to sequence entire populations of similar biomolecules before and after a selection for function. The frequency change of each member of a population can be converted to a relative function. However, a number of technical challenges had to be overcome before this technology could be integrated with protein engineering. To that end, my group invented new methods that make the technique easier to perform on full-length proteins (on the order of 400 amino acids), improved several steps to decrease costs, and developed rigorous analytical normalization equations to correlate sequencing counts to binding dissociation constants (for protein binders) or relative growth rates (for enzymes)[9]. Additionally, a robust and accessible method for the construction of high-quality, user-defined mutational libraries was lacking. Commonly used mutagenesis methods such as error-prone PCR suffer from limited codon sampling and imprecise control over the number of mutations introduced. To solve this challenge, we invented a new mutagenesis method that produces user-defined, comprehensive mutagenesis libraries from routinely prepped plasmid dsDNA in a single day and a single pot[10].
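To make the conversion from frequency change to relative function concrete, the following is a minimal sketch of a generic log2 enrichment ratio, scored relative to wild type. It is not the normalization developed in [9], which relates counts to dissociation constants or growth rates; the function names, the `WT` label, the example mutant names, and the pseudocount are all assumptions chosen for illustration.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """Score each variant by log2(post-selection frequency / pre-selection frequency).

    pre_counts and post_counts map a variant label to its read count.
    The pseudocount (an illustrative choice) keeps variants that drop
    out after selection from producing log(0).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    scores = {}
    for variant, pre in pre_counts.items():
        post = post_counts.get(variant, 0)
        f_pre = (pre + pseudocount) / pre_total
        f_post = (post + pseudocount) / post_total
        scores[variant] = math.log2(f_post / f_pre)
    return scores


def relative_to_wild_type(scores, wild_type="WT"):
    """Shift scores so that the wild-type sequence sits at zero."""
    reference = scores[wild_type]
    return {variant: s - reference for variant, s in scores.items()}


# Hypothetical read counts before and after a selection for function.
pre = {"WT": 1000, "A23G": 1000, "L45P": 1000}
post = {"WT": 1000, "A23G": 4000, "L45P": 250}
relative = relative_to_wild_type(log2_enrichment(pre, post))
for variant, score in sorted(relative.items()):
    print(f"{variant}\t{score:+.2f}")
```

In this sketch a positive score means the variant was enriched by the selection relative to wild type, and a negative score means it was depleted. Turning such scores into physical quantities requires experiment-specific normalization of the kind described above.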

Finally, we overcame a technical challenge in counting the frequency of library members in a population. Our applications require the ability to identify sequence variants accurately. However, the short-read assembly paradigm currently dominates genomics, and the loss of linkage information during the generation of short reads limits their utility. We developed a library preparation method that enables long “synthetic” reads up to 11.6 kilobases in length to be constructed from conventional short (150-bp) reads, providing a general platform for synthetic read generation from a wide range of input nucleic acid types. We demonstrated that this method could resolve multiple splice junctions of individual RNA molecules, differentiate between distinct HIV Env variants, and improve the genome assembly of different organisms[11].

The Future is Bright.

My research group is applying our core technologies to a number of pressing applications. For example, we are beginning work on the structure-based design of effective vaccines for human and livestock infectious diseases (like Dengue), as immunogen design is at heart a protein engineering challenge. We are also excited about possibilities in remodeling protein-protein interactions in areas like T-cell immunotherapy and rational modification of the gut microbiome. In the area of industrial biotechnology, we anticipate extending our techniques to optimize entire synthetic metabolic pathways concurrently. For example, we are working on general approaches to simultaneously construct highly efficient, active enzymes for in vivo biomanufacturing. In a new vein, we are collaborating with plant biologists to develop new tools to enable synthetic biology of plants.

[1] Whitehead TA et al. (2012) Nature Biotechnology; Whitehead TA* et al. (2013) Methods in Enzymology. US Patents 8,756,686 (2014); 9,181,300 (2015).

[2] Kowalsky CA, …, Whitehead TA* (2015) Journal of Biological Chemistry; Wang X, …, Whitehead TA, Maynard J (2017) Biochemistry

[3] Kowalsky CA, Whitehead TA* (2016) Proteins

[4] Jardine J et al. (2016) Science. I was thanked in the acknowledgements.

[5] Bacik JP, Klesmith JR, Whitehead TA, … (2015) Journal of Biological Chemistry; Klesmith JR, … Whitehead TA* (2015) ACS Synthetic Biology; Klesmith JR, Whitehead TA* (2016) Technology. Funded by USDA and NSF.

[6] Wrenbeck EE, …, Whitehead TA* (2016) under revision; Bienick MS, …, Whitehead TA* (2014) PLoS ONE

[7] Gao D, …, Whitehead TA, … (2014) Biotechnology for Biofuels; Haarmeyer C, …, Whitehead TA* (2016) Biotechnology and Bioengineering; Whitehead TA* … under revision (2016). I am the lead PI on this NSF-funded project.

[8] Stapleton JA, Whitehead TA, … (2015) PNAS. This was an NIH-funded project.

[9] Kowalsky CA, …, Whitehead TA* (2015) “High-resolution sequence-function mapping of full proteins”, PLoS ONE

[10] Wrenbeck EE, …, Whitehead TA* (2016) Nature Methods

[11] Stapleton JA, …, Whitehead TA* (2016) PLoS ONE; US patent application 14947988 (2015)