<p><em>Jacob C. Kimmel — personal website (<a href="http://jck.bio/feed.xml">jck.bio/feed.xml</a>)</em></p>
<h1>2022 Best Books</h1>
<p><em>2023-02-06 — <a href="http://jck.bio/best-books_2022">jck.bio/best-books_2022</a></em></p>
<p><img src="http://jck.bio/assets/images/books/best_books_2022.png" alt="Covers of the best books I read in 2022." /></p>
<ul>
<li>The Power Law — Sebastian Mallaby</li>
<li>Working Backwards — Colin Bryar, Bill Carr</li>
<li>Guns, Germs, and Steel — Jared Diamond</li>
<li>Invention of Nature — Andrea Wulf</li>
<li>A Shot to Save the World — Gregory Zuckerman</li>
</ul>
<p>Much delayed, I’m happy to recommend the books below as the best I read in 2022. Last year, I moved into a new role to help start <a href="https://newlimit.com">NewLimit</a>. My literary diet shifted along with the contents of my workday, and I enjoyed exploring different organizational designs and funding structures for technological enterprises. I found both <em>The Power Law</em> and <em>Working Backwards</em> below through that focused search and learned a great deal from both. The remainder of my reading hours were spent indulging in a series of science fiction novels, classics I somehow hadn’t had a chance to read, and tales from the annals of science history that left me inspired to press against the boundary of human knowledge.</p>
<p>My top five favorites from the year are outlined below.</p>
<p>If these books seem interesting to you or you’d like to trade notes, please feel free to shoot me an email!</p>
<h2 id="the-power-law--sebastian-mallaby"><em>The Power Law —</em> Sebastian Mallaby</h2>
<p>The most impactful businesses of the past half-century have a nearly invariant commonality in their origin stories. Whether the business began in a garage, loft, dorm room, or basement laboratory, each was nurtured into existence by Venture Capital. Alongside those businesses, impactful technologies that shape our world blossomed — from Intel’s silicon chips to Genentech’s biologic medicines.</p>
<p>Living in San Francisco for my whole adult life, venture <em>feels</em> like a storied, eternal institution — old as the Sequoias. In reality, the modern structure of a venture firm is scarcely older than some of the technology companies most associated with the asset class. In <em>The Power Law</em>, Mallaby tells the story of venture’s inception as “Adventure Capital,” growing out of family offices and a public holding company into the private partnerships that dominate the industry today. Mallaby reprises his formula from <em>More Money than God,</em> using a cast of the industry’s innovative characters to explain the origin of each feature in a modern firm.</p>
<p>While I don’t endorse every opinion it contains, <em>The Power Law</em> taught me a tremendous amount about an asset class with a larger impact per dollar than any other. I can’t recommend it highly enough to anyone interested in technology or finance.</p>
<h2 id="working-backwards--colin-bryar-bill-carr"><strong>Working Backwards — Colin Bryar, Bill Carr</strong></h2>
<p>The nearest grocery store and doctor’s office are both owned by the same company that made my television and the device I read this book on. Amazon is one of the most fascinating businesses in the world, somewhere between a high-technology firm, an old-school conglomerate, and a Sam Walton-style discounter.</p>
<p>It seems borderline impossible that each of these diverse business lines can run on the same corporate operating system. And yet. As Bryar and Carr describe in <em>Working Backwards</em>, the entire Amazon empire operates using a shared set of principles and communication mechanisms, even as they differ in nearly every other aspect of their isolated businesses.</p>
<p>The Amazon Way is both a set of abstract leadership principles (including both Customer Obsession and Be Right, A Lot) and concrete management mechanisms (Narratives over slide decks, Press Releases as product plans, Single-threaded decision making). There is no one right way to run a business, and I disagree with some Amazonian principles or mechanisms, but on the whole I find the Amazon operating system incredibly compelling as a baseline for an efficient organization. Bryar and Carr are likely to become canonical references in the school of management, alongside Grove and Horowitz.</p>
<h2 id="guns-germs-and-steel--jeremy-diamond"><strong>Guns, Germs, and Steel — Jared Diamond</strong></h2>
<p>See full review: <a href="https://www.notion.so/Guns-Germs-and-Steel-e2a203ede7b34103bbab9498012f3e71">Guns, Germs, and Steel</a></p>
<p><em>Guns</em> is a classic that was first recommended to me more than 10 (!) years ago. It is a testament to either (1) the growth rate of my book list or (2) my sorting algorithm that I only now got around to reading a book I loved.</p>
<p><em>Guns</em> asks perhaps the biggest question in contemporary world history — how did a set of societies from a relatively small geographic area in Europe and the Mediterranean come to have such an outsized influence? Diamond reduces this complexity down to a set of highly plausible, if non-falsifiable hypotheses that emphasize the particular influence of geography on human flourishing and the outsized advantages enjoyed by Europe and Asia Minor during the nascent epochs of human development. There are few books that offer such a clarifying lens upon such a large question — a good explanation in the Deutsch-ian sense.</p>
<h2 id="invention-of-nature--andrea-wulf"><strong>Invention of Nature — Andrea Wulf</strong></h2>
<p>Throughout my life, I’ve noticed parks, municipalities, and awards named Humboldt. Never once did I imagine that each was an allusion to one visionary scientist, rather than a collection of references to a common German surname. Such has the star of Alexander von Humboldt faded in the North American consciousness. <em>Invention</em> touches a small spark to the kindling of Humboldt’s work and hopes to reawaken the memory.</p>
<p>Humboldt was among the last of the old generation of scientists — passionate hobbyists who financed their endeavors with independent wealth or patronage, rather than professionals in an institution funded by government or corporate coffers. He pioneered our modern understanding of ecology, wrote naturalist travelogues that inspired the likes of Charles Darwin and John Muir, kept up correspondence with Thomas Jefferson and the leaders of several European nations — a list so long it is amazing that it fit into a life.</p>
<p>Most striking to me was that his career was built upon a single five-year journey through Latin America, climbing the Andes and cataloging one of the world’s most biodiverse regions. These years were the spark of ideas and relationships that he spent the rest of his life expanding, akin to an <em>annus mirabilis</em> on a grander scale. <em>Invention</em> offers not only the pleasure of following that journey, but an inspiration to venture further along arduous routes, so long as they end in alpine views.</p>
<h2 id="a-shot-to-save-the-world--gregory-zuckerman"><strong>A Shot to Save The World — Gregory Zuckerman</strong></h2>
<p>In January of 2020, I began reading news of a flu-like illness spreading in southern China. Until April of 2021, I lived with some degree of anxiety that the flu-like illness would harm me and my loved ones.</p>
<p><em>Shot</em> offers an explanation for the relatively shocking proximity of those two dates. Prior to the SARS-CoV-2 pandemic, the record for the most rapid development of a vaccine stood at four years (see: <a href="https://en.wikipedia.org/wiki/Mumps_vaccine">mumps</a>). <em>Shot</em> recounts how the biopharmaceutical industry beat that record by nearly four-fold in 2020. It’s a story of emerging biotechnologies (see: mRNA, the molecule), young companies turned industry titans (see: MRNA, BioNTech), and countless individuals who worked interminably to render the horse of pestilence quiescent once more.</p>
<p>This is one of the most inspirational stories of technological progress, an Apollo Program for our era. I couldn’t help but swell with pride to know that our species is capable of such feats.</p>
<h1>Designing reprogramming therapies</h1>
<p><em>2022-08-12 — <a href="http://jck.bio/designing-reprogramming-therapies">jck.bio/designing-reprogramming-therapies</a></em></p>
<p><em>This is a cross-post from the <a href="https://blog.newlimit.com/p/developing-reprogramming-therapies">NewLimit Blog</a></em></p>
<p>We all experience a decline in health with age. Many common diseases of aging — immune dysfunction, muscle atrophy, and systemic fibrosis among others — have been so recalcitrant that we consider them inevitable.</p>
<p>At <a href="https://newlimit.com">NewLimit</a>, we’re developing medicines to treat age-related disease through a new therapeutic approach. While the tissues that make up our bodies age in different ways, we believe that therapies designed to reprogram the epigenome may unlock treatments for multiple diseases and increase the number of healthy years in each of our lives.</p>
<blockquote>
<p>See: <a href="https://blog.newlimit.com/p/announcing-newlimit-a-company-built?utm_source=%2Finbox&utm_medium=reader2">NewLimit — A company built to extend human healthspan</a></p>
</blockquote>
<p>How might these therapies work?</p>
<p>Your body is composed of a constellation of cell types that perform specialized functions, yet each of your cells contains the same DNA. The emergence of these diverse functions from a common genetic code is mediated by the epigenome, a set of modifications to DNA and associated proteins that control which genes are turned “on” and “off” in each cell.</p>
<p>Genes known as transcription factors coordinate the machinery that sets and remodels these epigenetic marks. Transcription factors have evolved to control genetic programs by binding specific sites in the genome and recruiting other protein machines to make changes to the epigenome, giving rise to distinct cell types and functions. The epigenome can be broadly remodeled by manipulating just a <em>small number</em> of transcription factors, enabling us to reprogram cells to adopt different identities and perform new functions.</p>
<p>We believe that these developmental programs can be repurposed as a new class of medicines.</p>
<h1 id="restoring-cell-function-by-partial-reprogramming">Restoring cell function by partial reprogramming</h1>
<p>What evidence is our belief based on?</p>
<p>A series of experiments have begun to demonstrate that epigenetic reprogramming may be employed to address age-related diseases. Even old cells can be reprogrammed back to a pluripotent, embryonic state, then developed into healthy young animals by activating only four transcription factors. Researchers have found that after reprogramming, some cellular features of aging are reversed <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Complete pluripotent reprogramming erases the identity and function of adult cells and is not a plausible therapy, but recent experiments suggest this biology may be harnessed by other means to address disease.</p>
<p>It has recently been shown that even transient activation of pluripotent reprogramming factors can reverse molecular and functional features of aging. Researchers have shown that this “partial reprogramming” process can restore healthy gene expression and cell phenotypes in old cells without permanently abolishing adult cell identity and function. Experiments in old and diseased animals have also shown that partial reprogramming can restore regenerative potential and provide therapeutic benefit in models of <a href="https://pubmed.ncbi.nlm.nih.gov/27984723/">metabolic disease</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/34035273/">muscle injury</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/34554778/">heart attacks</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/33268865/">glaucoma</a>, <a href="https://www.nature.com/articles/s43587-022-00183-2#Sec28">fibrosis</a>, and <a href="https://www.cell.com/cell-reports/fulltext/S2211-1247(22)00491-0">liver disease</a>.</p>
<p>While promising, the reprogramming methods used in these experiments are not readily translatable into therapies for humans. Partial reprogramming with pluripotency factors can induce neoplastic teratomas — tumor-like growths that are often lethal. Beneficial and dangerous doses of these pluripotent reprogramming interventions are often only 2-fold different.</p>
<p>Is there a way we can capture the benefits of partial reprogramming, while reducing the risks? Several groups have shown that alternative epigenetic programs can likewise restore youthful phenotypes in old cells, while reducing undesirable effects. Even reprogramming strategies that completely avoid risky pluripotency factors can provide benefit <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</p>
<p>At NewLimit, we’re building a discovery platform to engineer new epigenetic programs that can similarly restore youthful regenerative potential to address age-related disease, while minimizing risks.</p>
<h1 id="how-can-we-design-reprogramming-therapies">How can we design reprogramming therapies?</h1>
<p>Reprogramming interventions are traditionally designed by selecting a set of transcription factors using intuition, then testing to see if these factors can induce a small set of “markers” that correlate with a desired cell phenotype. These approaches have enabled the design of many reprogramming methods that convert between distinct cell types <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. Nonetheless, this traditional approach is limited by the use of coarse marker gene read-outs, the small experimental scales employed, and the heuristic nature of hypothesis generation.</p>
<p>NewLimit is building a technology platform that combines advances in single cell genomics, pooled perturbation screening, and machine learning to overcome these challenges. Each of these technologies has emerged only within the last decade, enabling a new approach to design reprogramming therapies.</p>
<ol>
<li><strong>Measuring reprogramming outcomes with single cell genomics:</strong> Nuanced changes in epigenetic state — like the difference between diseased and healthy cells of the same type — are rarely captured by a handful of marker genes. By using single cell genomics to measure reprogramming outcomes, we can move beyond marker genes, use rich measurements of cell state to evaluate interventions, <em>and</em> perform more experiments than was traditionally possible.</li>
<li><strong>Pooled reprogramming screens:</strong> Pooled screening allows us to perform hundreds to thousands of experiments in the same population of cells, including combinations of reprogramming factors without burdensome molecular biology processes. Using these techniques, we can increase the number of reprogramming hypotheses we explore by orders of magnitude.</li>
<li><strong>Guiding epigenetic program design with machine learning:</strong> Even with advances in single cell genomics and pooled screening, there are far more possible reprogramming strategies than we can ever test experimentally <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Machine learning methods predict the outcomes of new experiments and allow us to search the experimental space intelligently, using data from past experiments to inform the selection of future experiments in a rigorous process.</li>
</ol>
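<p>To make the scale in point (3) concrete: footnote 4 notes that even a pool of 50 candidate factors yields more than 10,000,000 combinations of six or fewer factors. A quick check of that count (the pool size of 50 comes from the footnote; the code itself is purely illustrative):</p>

```python
import math

# Count combinations of k = 1..6 transcription factors drawn from a
# candidate pool of 50 (the pool size used in footnote 4).
n_factors = 50
n_combos = sum(math.comb(n_factors, k) for k in range(1, 7))
print(f"{n_combos:,} combinations")  # 18,260,635 — far beyond experimental reach
```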
<p>Taking inspiration from the “Design-Build-Test-Learn” framework common to engineering disciplines, we’re focused on increasing the number of reprogramming hypotheses we can test, improving how much we learn from each, and integrating information across historical experiments so that each experiment informs the design of those to come.</p>
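<p>As a toy sketch of that closed loop (the factor names, the additive surrogate model, and the simulated screen below are all invented for illustration and imply nothing about NewLimit’s actual platform):</p>

```python
import random

# Toy greedy design-test-learn loop over hypothetical factor combinations.
# Hidden per-factor effects stand in for unknown biology; the surrogate
# model is a crude additive estimate updated from each round's observations.
random.seed(0)
FACTORS = [f"TF{i}" for i in range(50)]
hidden_effect = {f: random.gauss(0, 1) for f in FACTORS}  # unknown ground truth

def run_screen(combo):
    """Stand-in for a pooled screen: noisy readout of a combination's effect."""
    return sum(hidden_effect[f] for f in combo) + random.gauss(0, 0.1)

def predict(model, combo):
    """Additive surrogate: sum of current per-factor effect estimates."""
    return sum(model.get(f, 0.0) for f in combo)

model, tested = {}, {}
for _ in range(3):  # three design-test-learn rounds
    # Design: propose candidate combinations, rank them with the surrogate.
    candidates = [tuple(sorted(random.sample(FACTORS, 4))) for _ in range(200)]
    batch = sorted(candidates, key=lambda c: predict(model, c), reverse=True)[:20]
    # Test the most promising designs, then Learn: update per-factor estimates.
    for combo in batch:
        y = run_screen(combo)
        tested[combo] = y
        for f in combo:
            model[f] = model.get(f, 0.0) + 0.25 * (y / len(combo) - model.get(f, 0.0))

best = max(tested, key=tested.get)  # best combination observed so far
```

<p>A real platform would replace each toy piece: the simulated screen with pooled single cell readouts, the additive surrogate with learned models, and the candidate ranking with a principled acquisition strategy.</p>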
<p>We believe that this technology platform will transform the design of epigenetic programs from an artistic endeavor into an engineering discipline, enabling reprogramming discovery campaigns analogous to the small molecule and antibody campaigns that drive drug discovery today.</p>
<h1 id="ambitious-missions-require-excellent-teams">Ambitious missions require excellent teams</h1>
<p>The technologies that comprise our platform are necessary but not sufficient to realize our mission. The most critical component of the platform is the talented scientists and engineers who build and deploy it to discover new medicines. Our success depends upon these people more than any other variable.</p>
<p>NewLimit is now recruiting broadly across diverse fields of science, including single cell and functional genomics, immunology, computational biology, and machine learning. If this mission excites you, please reach out, even if none of our open roles are an exact fit for your talents.</p>
<p><strong>Apply now to build the future with us:</strong> <a href="https://www.newlimit.com/careers">newlimit.com/careers</a></p>
<hr />
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Beginning in the 1950s, John Gurdon performed a series of remarkable experiments where he transplanted the nuclei of mature frog cells into enucleated frog eggs (<a href="https://royalsocietypublishing.org/doi/epdf/10.1098/rspb.1970.0050">Gurdon 1970</a>). The egg cytoplasm contained signals that were sufficient to reprogram the adult nucleus back to an embryonic state, and these reprogrammed eggs gave rise to young frogs. Shinya Yamanaka’s group later showed this process could be achieved by activating just four genes in 2006 (<a href="https://pubmed.ncbi.nlm.nih.gov/16904174/">Takahashi & Yamanaka, 2006</a>). Gurdon and Yamanaka were jointly awarded the Nobel Prize for pluripotent reprogramming in 2012. Several researchers later found that somatic cells of different ages became highly similar after reprogramming back to a pluripotent state using Yamanaka’s method (<a href="https://pubmed.ncbi.nlm.nih.gov/22056670/">Lapasset et al. 2011</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/26456686/">Mertens et al. 2015</a>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Researchers have found that smaller, less risky sets of pluripotency factors (<a href="https://pubmed.ncbi.nlm.nih.gov/33268865/">Lu et al. 2020</a>, <a href="https://www.nature.com/articles/s43587-021-00109-4">Neumann et al. 2021</a>, <a href="https://www.cell.com/cell-systems/fulltext/S2405-4712(22)00223-X?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS240547122200223X%3Fshowall%3Dtrue">Roux et al. 2022</a>) and alternative partial reprogramming factors can also provide benefit (<a href="https://www.nature.com/articles/s43587-022-00209-9">Ribeiro et al. 2022</a>, <a href="https://www.cell.com/cell-systems/fulltext/S2405-4712(22)00223-X?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS240547122200223X%3Fshowall%3Dtrue">Roux et al. 2022</a>). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Hal Weintraub’s laboratory first discovered that epigenetic reprogramming could convert skin <a href="https://pubmed.ncbi.nlm.nih.gov/3690668/">fibroblasts into muscle cells all the way back in 1987</a>. Researchers have since found routes to convert fibroblasts into <a href="https://pubmed.ncbi.nlm.nih.gov/22522929/">cardiomyocytes</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/30530727/">immune dendritic cells</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/21562492/">hepatocytes</a>, <a href="https://www.notion.so/f7ae9568b319416eb8fb9b05126e8bbb">renal tubule cells</a>, <a href="https://www.notion.so/3e0dcc80ccfb46d2a8f06a9665659cae">neurons</a>, and many other cell types. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Even with a small set of 50 possible reprogramming factors, there are >10,000,000 possible combinations of six or fewer factors to test! <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>2021 Best Books</h1>
<p><em>2021-12-30 — <a href="http://jck.bio/best_books_2021">jck.bio/best_books_2021</a></em></p>
<p>In 2020, I learned the most from reading historical accounts of scientific progress and funding, particularly in my field of biotechnology.
For 2021, I set a goal to cover a broader swath of the history of biomedical research paired with some longer-form non-fiction in business and economics.
As always, I also kept up a steady intake of science fiction.</p>
<p>I’ve summarized a few favorites I can strongly recommend below.
If these sound interesting to you, I’d be happy to hear any related recommendations <a href="mailto:jacob@jck.bio">by email!</a></p>
<!-- Finding high-quality, non-academic literature in the first category was surprisingly hard!
Ultimately, I read through a few different biographies of life scientists ([*Hood*](https://www.notion.so/jacobkimmel/Hood-Visionary-of-the-Genomics-Age-842daa952f97488384f69e50dd39843c), [*Time, Love, Memory*]()), a few biotech industry stories (*Breath from Salt*, *The Great American Drug Deal*), and histories of academic molecular biology and genomics ([*The Eighth Day of Creation*](https://www.notion.so/jacobkimmel/The-Eighth-Day-of-Creation-787948ef203141a5a21be1620fcfee31), [*Drawing the Map of Life*](https://www.notion.so/jacobkimmel/Drawing-the-Map-of-Life-d284d0aef1b24703a18d7cf3c879182f)). -->
<h2 id="breath-from-salt">Breath from Salt</h2>
<p><a href="https://www.greenapplebooks.com/book/9781948836371"><strong>Green Apple Books</strong></a></p>
<p>As I’ve <a href="http://jck.bio/her2/">opined before</a>, I think there are too few accessible accounts of how medicines are invented.
To my delight, <em>Breath from Salt</em> is one more entry in the small canon of drug development stories that I can recommend widely.</p>
<p><em>Breath</em> covers the first diagnosis of cystic fibrosis as a disease, the discovery of its molecular basis, and the various efforts to develop medicines that eventually resulted in <a href="https://en.wikipedia.org/wiki/Lumacaftor/ivacaftor">Vertex’s remarkably effective drugs.</a>
Trivedi seamlessly integrates the stories of diverse CF families, highly-technical biomedical science, and drug R&D to take readers on a complete journey from patient to medicine and back again.</p>
<p>The drug development story in particular is quite striking.
The Cystic Fibrosis Foundation proved pivotal as a source of <em>differentiated</em> funding for CF research and treatment development.
In particular, they used a unique model where the Foundation provided early stage, high risk capital for research and development of new therapeutics in exchange for a portion of the ensuing royalties.
They successfully deployed this model to first develop a series of symptomatic treatments, and later to fund a high risk small molecule screening campaign at Roger Tsien’s <a href="https://en.wikipedia.org/wiki/Aurora_Biosciences">Aurora Biosciences</a>.</p>
<p>This campaign was the first attempt to search for a “corrector” drug that rescued the ability of mutant protein to fold properly, rather than to inhibit protein activity like most small molecule therapies.
Given the absurdity of the task, Vertex nearly killed the program when it acquired Aurora; only early positive results, obtained with CF Foundation funding, allowed the program to continue.
Those efforts yielded the drugs that improved hundreds of thousands of lives, eventually helping the majority of CF patients and rescuing Vertex as a business when their <a href="https://en.wikipedia.org/wiki/Telaprevir">HCV drug</a> was disrupted by <a href="https://www.fiercepharma.com/sales-and-marketing/sovaldi-forces-incivek-off-hep-c-market-as-vertex-calls-it-quits">superior therapeutics</a>.</p>
<p>It’s a remarkable story that highlights just how narrow the pathway to success can be even for some of the most successful medicines.</p>
<h2 id="the-eighth-day-of-creation">The Eighth Day of Creation</h2>
<p><strong>Review:</strong> <a href="https://www.notion.so/jacobkimmel/The-Eighth-Day-of-Creation-787948ef203141a5a21be1620fcfee31">The Eighth Day of Creation</a><br />
<strong>Related Reflections:</strong> <a href="http://jck.bio/learning-representations-of-life/">Learning representations of life</a></p>
<p><em>Eighth Day</em> is perhaps the most complete historical account of molecular biology’s founding experiments and personalities.
Despite working in the field for more than a decade, I found myself consistently surprised to learn of motivations, models, and ideas lost in the usual retelling of molecular biology’s triumphs.
Horace Freeland Judson has a talent for communicating not just what we know about the molecules of life, not just how we came to know it, but the <em>intellectual evolution</em> or sequence of ideas that led to the key experiments at the basis of modern understanding.
Highly recommended for any fans of the history of science, progress, or biotechnology.</p>
<h2 id="exhalation">Exhalation</h2>
<p><a href="https://www.greenapplebooks.com/book/9781101972083"><strong>Green Apple Books</strong></a></p>
<p>In his second collection of stories, Ted Chiang cements his place as one of the twenty-first century’s most interesting science fiction writers.
Chiang’s stories act as the seed for a crystal of an idea, such that the most interesting developments occur not on the page but within your own reflections, days later, beneath a eucalyptus tree.
My favorites from this collection are the eponymous “Exhalation”, “Anxiety Is the Dizziness of Freedom”, and “Omphalos.”</p>
<h2 id="klara-and-the-sun">Klara and the Sun</h2>
<p><strong>Review:</strong> <a href="https://www.notion.so/jacobkimmel/Klara-and-the-Sun-b2b13a4d0cba4204825dbc31adee890e">Klara and the Sun</a></p>
<p>I love all of Ishiguro’s work, and <em>Klara and the Sun</em> is no exception.
In his trademark empathetic science fiction style, Ishiguro imagines a near-future world where artificial general intelligence (AGI) has been achieved and serves, at least in part, to remedy the emotional ills of humans in that fractured world.
The setting is somehow visceral and believable because of how little is revealed in direct exposition.
We glimpse the world only in the shadows it casts upon the characters, one of whom may be the first AGI protagonist in popular literary fiction.</p>
<h2 id="seeing-like-a-state">Seeing Like a State</h2>
<p><strong>Minimum-viable-summary:</strong> <a href="https://www.notion.so/jacobkimmel/Seeing-like-a-State-cda01ab06f3f49d5957cdf1e81accc85">Seeing Like a State</a></p>
<p>An admission: I’ve had James C. Scott’s <em>Seeing Like a State</em> on my reading list for <strong>years</strong> based on the overwhelming number of times it’s been recommended to me.
I finally got around to reading it, and all of my friends were right!</p>
<p><em>Seeing Like a State</em> dissects how the perceptions of large organizations (here, namely nation-states) are lossy representations of the real world and how these flawed perceptions can come to dictate the nature of reality.
There’s an old adage that a truly accurate map of a kingdom would be the exact same size and scale as a kingdom itself, therefore rendering it unusable.
Scott builds from this point and highlights in several distinct examples that large organizations <em>require</em> approximations, compressions of the real state of their circumstances to make useful operational decisions.
In this frame, the <em>legibility</em> of different aspects of the real world – how easy it is for the larger organization to notice, accurately measure, and persistently record a given fact – becomes a central determinant of whether that quality is subject to optimization, taxation, exploitation, or investment.
Many actions of large organizations can then be viewed as attempts to render legible the tacit aspects of the world, and those very attempts to record and assess the state of reality have shaped our modern world quite profoundly, from our names to the shape of our domiciles.</p>
<p>Internally, I approximate the central lesson of <em>Seeing Like a State</em> as “Heisenberg’s principle for society” – by the very act of measuring a community, a culture, or an organization, you shape it in both subtle and dramatic ways.</p>
<h2 id="time-love-memory">Time, Love, Memory</h2>
<p><a href="https://www.greenapplebooks.com/book/9780679763901"><strong>Green Apple Books</strong></a></p>
<p>Early molecular biology explained the mechanistic basis for macroscopic phenotypes like cell growth, metabolism, and gross morphological traits.
Alas, the complexities of animal behavior – even in flies, to say nothing of humans! – remained out of reach for the earliest pioneers of the discipline.
Late in his career, after building a successful program as a phage geneticist, Seymour Benzer pivoted his laboratory to focus on explaining the molecular basis of animal behavior.</p>
<p>This goal was audacious, but critically important!
Behavior, personality, emotion – notions of time, love, and memory – remained perhaps the last bastions of vitalism, the last remnants of a belief that perhaps human life cannot be explained using the same principles of physics and chemistry that govern the rest of the known universe.
Benzer’s lab began their investigations by leaning into their skill as engineers, building novel apparatuses to measure behavioral traits in genetically-tractable fruit flies.
Through a series of ingenious screens, they proceeded to uncover the genetic determinants that allow flies to tell night from day, to learn from experience, and to find mates.
While flies are far from humans in a phylogenetic sense, these results were nonetheless powerful examples that the basic principles of molecular biology could explain even the most complex features of life.</p>
<p>Jonathan Weiner recounts the story of these discoveries in beautiful prose and helps imbue each with the personality of the investigator responsible.</p>
<h2 id="honorable-mentions">Honorable Mentions</h2>
<p><em>Crashed</em> by Adam Tooze <a href="https://www.notion.so/jacobkimmel/Crashed-cf8f3ac053b74528a449f6747e707c23">(Review)</a> – Tooze provides a definitive account of the Great Financial Crisis at a level of technical sophistication that is rarely achieved even within the discipline of economics, to say nothing of financial history. <em>Crashed</em> is just shy of making it onto my “Best Books” list because the subject matter is challenging to ingest as a linear narrative. This is not a fault of Tooze, and I’m a huge fan of <a href="https://adamtooze.substack.com">his other work</a>. Rather, the GFC is such a technically complex subject that it cries out for hypertext, mouse-over reminders of key events, interactive tables, charts, and graphs, rather than a 700+ page continuous description. Tooze does a remarkable job at condensing this information given the presentation constraints of a traditional book, but nonetheless, I found myself grasping for understanding of events off-screen and cross-comparisons between different time periods in the chronology, preventing an immersive reading experience.</p>
<p><em>Hard Landing</em> by Thomas Petzinger <a href="https://www.amazon.com/Hard-Landing-Contest-Profits-Airlines/dp/0812928350/ref=asc_df_0812928350/?tag=hyprod-20&linkCode=df0&hvadid=312025907421&hvpos=&hvnetw=g&hvrand=4840200429673352252&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9031948&hvtargid=pla-330750653987&psc=1">(Link)</a> – <em>Hard Landing</em> is ostensibly the tale of America’s commercial aviation industry, but the description doesn’t quite do justice to the book. Rather, it’s a story that captures the rise and fall of corporate cultures under different external conditions during the transition from a heavily-regulated to free-market industry. Petzinger in particular has a talent for capturing the colorful characters of the industry’s early days. This makes for great fun as a reader and highlights the impact just a few operators can have on large organizations under the right circumstances. Recommended for fans of <em>Business Adventures</em> by John Brooks or <em>Liar’s Poker</em> by Michael Lewis.</p>
<!-- ## Honorary Blogs & Audioblogs
As in most years, some of the most interesting ideas I encountered in 2021 came not from books, but blog posts and podcasts.
A few posts I particularly enjoyed:
1. [Nintil](https://nintil.com/) -- Sometimes I think José Ricon's blog is secretely ghost-written by a generative model trained to maximize the Jacob-interest-level reward function. Then I remember that José's arguments often stimulate my interests in the first place -- I'm not that clever! -- and realize he must indeed be real. A few great posts this year that shaped my thinking are: [New Models for Funding and Organizing Science](https://nintil.com/new-science-models), [US Science leadership, the rise of China, and French mathematics](https://nintil.com/us-science-leadership), and [Wildfires in California](https://nintil.com/wildfires-california).
2. [Idea Machines](https://ideamachinespodcast.com) -- [Ben Reinhardt](https://benjaminreinhardt.com/about/) hosts a great podcast exploring mechanisms of science and technology development. I think of it as a "meta-technology" conversation series. I only discovered it this year, but it's been running for a while. I particularly enjoyed episodes with Adam Marblestone, Eli Dourado, and Anna Goldstsein.
3. [Seemay Chou on Arcadia](https://medium.com/@seemaychou/why-i-am-building-arcadia-6582f3dfe4a0) -- Seemay Chou is building an exciting new form of research & development organization at Arcadia Science. I found her personal account of the motivations behind this ambitious endeavour inspiring.
4. -->

In 2020, I learned the most from reading historical accounts of scientific progress and funding, particularly in my field of biotechnology. For 2021, I set a goal to cover a broader swath of the history of biomedical research paired with some longer-form non-fiction in business and economics. As always, I also kept up a steady intake of science fiction.

Learning representations of life2021-12-06T00:00:00+00:002021-12-06T00:00:00+00:00http://jck.bio/learning-representations-of-life

<!--
Modeling the boundaries of life: Pauling used physical models of atomic structure to “hard code” known physical parameters into a hypothesis testing regime. His models were a form of physical computer — the dimensions of the objects matched realistic values, so that by simply testing of a configuration fit in 3D space, he was able to determine if a particular structure was consistent with known chemistry. As a simple example, all the bond lengths obeyed known values and double bond structure properly forced a fixed rotation angle between two bonded atoms. Encoding the “flatness” of a double bond enabled pauling to come up with the proper model for an alpha helix, vesting a rival group led by Max Perutz that incorrectly proposed several models where double bonds were able to rotate freely on their axes. Watson and crick adopted a similar strategy for their construction of DNA models. In fact, one of their first model proposals that was dismissed by R Franklin was effectively a “software” failure — they failed to account for the necessity of water near highly polar phosphate groups, which they originally proposed would be on the internal side of a DNA helix.
ML: We might imagine ML models as simply the natural evolution of Paulings simple physical models. Much of biology is best modeled as a complex system, difficult to reconstruct from a small set of rules. This makes it much harder to build useful Pauling style first principles models — the number of constants and their relationships explodes combinatorially as the complexity of the system increases. ML models offer and end run around this difficult. Given sufficient empirical data, we can learn a degenerate model of a system, if not a system that matches the rules used by biology. From these empirical models, we can rapidly test hypotheses that might otherwise be laborious to evaluate.
As simple examples: combinatorial drug screening prediction, molecular docking predictions, protein folding, DNA sequence mutations, protein sequence mutations, gene regulatory network perturbations, lineage commitment perturbation predictions (PRESCIENT). ML models are a tool just like any other in biology, closer to classical theoretical biology than many practitioners or opponents of these methods realize.
Golden era of molecular biology
* The life sciences live in the shadows of molecular biology's giants.
* Everything from our cognitive toolkit to the physical methods we employ and our definitions of success emerge from this era.
* As a disipline, we have largely adopted that classical molecular biologist's view that living systems like physical systems can largely be reduced to singular functions of individual parts.
* We define success as the assignment of a specific molecule to a function -- "mechanism," in biology largely means a molecule that can be shown as necessary and sufficient for a phenomenon.
* We run individual expeirments guided by a single hypothesis to break systems into their components so that we may name them and assign a characture of their role.
* These methods have proved incredibly powerful, revealing to us the molecular basis of heredity, the physical mechanisms of cellular replication, the basis of many diseases, and empowering us to re-write the code of life at-will.
* The logic is not infalliable though -- we often come to oversimplified conclusions and fail to appreciate the broader rules of biological systems revealed by our individual experiments.
* Arjun Raj beautifully summarizes some of these errors in his infamous cartoon of a biologist investigating the molecular biology of airplanes: Cite Arjun's cartoon
History lesson
* The life sciences did not always proceed on so narrow an intellectual path.
* Early in the history of molecular biology itself, theory, first principles reasoning, and *empirical modeling* played key roles in
Enter ML
* Machine learning has demonstrated remarkable advances in biology in the past 10 years
* Models have enabled researchers to ask biological questions at a scale that never before been possible
* A degree of tension has emerged between the ML and biological communities. These discussions aren't often public, but by traveling in these communities you gain a sense of the perspectives. Experimental biologists feel that the results of ML models are over-hyped and quickly move to defend the essential role of experimentation. ML practicioners lament the slow, one-off, irrepreducable nature of biological science, and wish that the experimental community could embrace the ML practicioners as biology's new leaders.
history lesson
* This tension reminds me of the clash-of-cultures that occured when Max Delbruck led a wave of physicists into biology following the second world war.
* Famously, Delbruck transitioned into biology after working with Lisa Meitner and Otto Hahn on nuclear physics in the 1930's. He expressed a series of concrete questions and hypotheses about the nature of living systems from a physicts perspective, made famous by Edwin Schroedinger in his pamphlet "What is life?".
*
-->
<p><em>I’m frequently asked how I think machine learning tools will change our approach to molecular and cell biology. This post is in part my answer and in part a reflection on Horace Freeland Judson’s history of early molecular biology – <a href="https://jacobkimmel.notion.site/The-Eighth-Day-of-Creation-787948ef203141a5a21be1620fcfee31">The Eighth Day of Creation.</a></em></p>
<!-- Introduction -->
<p>Machine learning approaches are now an important component of the life scientist’s toolkit.
From just a cursory review of the evidence, it’s clear that ML tools have enabled us to solve once intractable problems like genetic variant effect prediction<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, protein folding<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and unknown perturbation inference<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.
As this new class of models enters more and more branches of life science, a natural tension has arisen between the empirical mode of inquiry enabled by ML and the traditional, analytical and heuristic approach of molecular biology.
This tension is visible in the back-and-forth discourse over the role of ML in biology, with ML practitioners sometimes overstating the capabilities that models provide, and experimental biologists emphasizing the failure modes of ML models while often overlooking their strengths.</p>
<p>Reflecting on the history of molecular biology, it strikes me that the recent rise of ML tools is more a return to form than the dramatic divergence from biological tradition that some discourse implies.</p>
<p>Molecular biology emerged from the convergence of physics and classical genetics, birthing a discipline that modeled complex biological phenomena from first principles where possible, and experimentally tested reductionist hypotheses where analytical exploration failed.
Over time, our questions began to veer into the realm of complex systems that are less amenable to analytical modeling, and molecular biology became more and more of an experimental science.</p>
<p>Machine learning tools are only now enabling us to regain the model-driven mode of inquiry we lost during that inflection of complexity.
Framed in the proper historical context, the ongoing convergence of computational and life sciences is a reprise of biology’s foundational epistemic tools, rather than the fall-from-grace too often proclaimed within our discipline.</p>
<h1 id="physicists--toy-computers">Physicists & toy computers</h1>
<blockquote>
<p>Do your own homework. To truly use first principles, don’t rely on experts or previous work. Approach new problems with the mindset of a novice – Richard Feynman</p>
</blockquote>
<p>When Linus Pauling began working to resolve the three-dimensional structures of peptides, he built physical models of the proposed atomic configurations.
Most young biology students have seen photos of Pauling beside his models, but their significance is rarely conveyed properly.</p>
<p><img src="http://scarc.library.oregonstate.edu/coll/pauling/catalogue/09/1954i.38-600w.jpg" width="400" /></p>
<p>Pauling’s models were not merely a visualization tool to help him build intuitions for the molecular configurations of peptides.
Rather, his models were precisely machined <strong>analog computers</strong> that allowed him to empirically evaluate hypotheses at high speed.
The dimensions of the model components – bond lengths and angles – matched experimentally determined constants, so that by simply testing if a configuration fit in 3D space, he was able to determine if a particular structure was consistent with known chemistry.</p>
<p>These models “hard coded” known experimental data into a hypothesis testing framework, allowing Pauling to explore hypothesis space while implicitly obeying not only each individual experimental data point, but the emergent properties of their interactions.
Famously, encoding the planarity – i.e. “flatness” – of the peptide bond, a consequence of its partial double-bond character, into his model enabled Pauling to discover the proper structure for the <a href="https://en.wikipedia.org/wiki/Alpha_helix">alpha-helix</a>, while Max Perutz’s rival group incorrectly proposed alternative structures because their model hardware failed to account for this rule.</p>
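<p>Pauling’s analog computer can be caricatured in a few lines of software: hard-code experimentally measured constants, then accept or reject a proposed configuration by checking it against them. A minimal sketch – the bond lengths below are rounded textbook values and the tolerance is invented for illustration, not Pauling’s actual machining precision:</p>

```python
# Illustrative software analog of Pauling's physical model: hard-code measured
# bond lengths, then accept a proposed structure only if every bond matches.
# These constants are rounded, illustrative values, not a real force field.
KNOWN_BOND_LENGTHS = {      # angstroms
    ("C", "N"): 1.32,       # peptide C-N bond, shortened by double-bond character
    ("C", "O"): 1.24,       # carbonyl C=O
    ("C", "C"): 1.53,       # single C-C bond
}
TOLERANCE = 0.05            # allowed deviation from the known value, angstroms

def consistent_with_chemistry(proposed_bonds):
    """Check a proposed structure, given as (atom_a, atom_b, length) bonds."""
    for a, b, length in proposed_bonds:
        known = KNOWN_BOND_LENGTHS.get((a, b)) or KNOWN_BOND_LENGTHS.get((b, a))
        if known is None:
            return False    # a bond the encoded chemistry doesn't recognize
        if abs(length - known) > TOLERANCE:
            return False    # geometrically inconsistent with measurement
    return True

# A plausible peptide fragment fits; a stretched C-N bond is rejected.
plausible = [("C", "N", 1.33), ("C", "O", 1.23)]
implausible = [("C", "N", 1.50), ("C", "O", 1.23)]
```

<p>Just as with the physical model, a rejection doesn’t prove a structure wrong in nature – it only flags an inconsistency with the rules we chose to encode.</p>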
<p>Following Pauling’s lead, Watson and Crick’s models of DNA structure adopted the same empirical hypothesis testing strategy.
It’s usually omitted from textbooks that Watson and Crick proposed multiple alternative structures before settling on the double-helix.
In their first such proposal, Rosalind Franklin highlighted something akin to a software error – the modelers had failed to account for the water bound to DNA’s highly charged phosphate backbone and proposed an impossible structure, with the phosphates tucked inside the helix, as a result.</p>
<p>Their discovery of the base pairing relationships emerged directly from empirical exploration with their physical model.
Watson was originally convinced that bases should form homotypic pairs – A to A, T to T, etc. – across the two strands.
Only when they built the model and found that the resulting “bulges” were incompatible with chemical rules did Watson and Crick realize that heterotypic pairs – our well-known friends A to T, C to G – not only worked structurally, but confirmed Erwin Chargaff’s experimental ratios<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
<p float="middle">
<img src="https://www.sciencehistory.org/sites/default/files/styles/rte_full_width/public/watson-crick-dna-model.jpg?itok=Qa7645Jc" height="250" />
<img src="https://www.sciencehistory.org/sites/default/files/styles/rte_full_width/public/historical_profile/rosalind-franklin.jpg?itok=sEqCmXUm" height="250" />
</p>
<p>These essential foundations of molecular biology were laid by empirical exploration of evidence-based models, but such models are rarely found in our modern practice.
Rather, we largely develop individual hypotheses based on intuitions and heuristics, then test those hypotheses directly in cumbersome experimental systems.</p>
<p><em>Where did the models go?</em></p>
<h1 id="emergent-complexity-in-the-golden-era">Emergent complexity in The Golden Era</h1>
<p>The modern life sciences live in the shadow of The Golden Era of molecular biology.
The Golden Era’s beginning is perhaps demarcated by Schroedinger’s publication of Max Delbrück’s questions and hypotheses on the nature of living systems in a lecture and pamphlet entitled <a href="https://en.wikipedia.org/wiki/What_Is_Life%3F"><em>What is Life?</em></a>.
The end is less clearly defined, but I’ll argue that the latter bookend might be set by the contemporaneous development of <a href="https://en.wikipedia.org/wiki/Recombinant_DNA">recombinant DNA</a> technology by Boyer & Cohen in California <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup> [1972] and <a href="https://en.wikipedia.org/wiki/Sanger_sequencing">DNA sequencing technology</a> by Frederick Sanger in the United Kingdom [1977].</p>
<p>In Francis Crick’s words<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>, The Golden Era was</p>
<blockquote>
<p>concerned with the very large, long-chain biological molecules – the nucleic acids and proteins and their synthesis. Biologically, this means genes and their replication and expression, genes and the gene products.</p>
</blockquote>
<p>Building on the classical biology of genetics, Golden Era biologists investigated biological questions through a reductionist framework.
The inductive bias guiding most experiments was that high-level biological phenomena – heredity, differentiation, development, cell division – could be explained by the action of a relatively small number of molecules.
From this inductive bias, the gold standard for “mechanism” in the life sciences was defined as a molecule that is necessary and sufficient to cause a biological phenomenon<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>
<p>Though molecular biology emerged from a model building past, the processes under investigation during the Golden Era were often too complex to model quantitatively with the tools of the day.
While Pauling could build a useful, analog computer from first principles to interrogate structural hypotheses, most questions involving more than a single molecular species eluded this form of analytical attack.</p>
<p>The search to discover how genes are turned on and off in a cell offers a compact example of this complexity.
Following the revelation of DNA structure and the DNA basis of heredity, François Jacob and Jacques Monod formulated a hypothesis that the levels of enzymes in individual cells were regulated by how much messenger RNA was produced from corresponding genes.
Interrogating a hypothesis of this complexity was intractable through simple analog computers of the Pauling style.
How would one even begin to ask which molecular species governed transcription, which DNA sequences conferred regulatory activity, and which products were produced in response to which stimuli using 1960’s methods?</p>
<p>Rather, Jacob and Monod turned to the classical toolkit of molecular biology.
They proposed a hypothesis that specific DNA elements controlled the expression of genes in response to stimuli, then directly tested that hypothesis using a complex experimental system<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.
Modeling the underlying biology was so intractable that it was simply more efficient to test hypotheses in the real system than to explore in a simplified version.</p>
<p><strong>The questions posed by molecular biology outpaced the measurement and computational technologies in complexity, beginning a long winter in the era of empirical models.</strong></p>
<h1 id="learning-the-rules-of-life">Learning the rules of life</h1>
<blockquote>
<p>John von Neumann […] asked, How does one state a theory of pattern vision? And he said, maybe the thing is that you can’t give a theory of pattern vision – but all you can do is to give a prescription for making a device that will see patterns!</p>
<p>In other words, where a science like physics works in terms of laws, or a science like molecular biology, to now, is stated in terms of mechanisms, maybe now what one has to begin to think of is algorithms. Recipes. Procedures. – Sydney Brenner<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></p>
</blockquote>
<p>Biology’s first models followed from the physical science tradition, building “up” from first principles to predict the behavior of more complex systems.
As molecular biology entered The Golden Era, the systems of interest crossed a threshold of complexity and were no longer amenable to this form of bottom-up modeling.
This intractability to analysis is the hallmark feature of <a href="https://en.wikipedia.org/wiki/Complex_system"><strong>complex systems</strong></a>.</p>
<p>There’s no general solution to modeling complex systems, but the computational sciences offer a tractable alternative to the analytical approach.
Rather than beginning with a set of rules and attempting to predict emergent behavior, we can observe the emergent properties of a complex system and build models that capture the underlying rules.
We might imagine this as a “top-down” approach to modeling, in contrast to the “bottom-up” approach of the physical tradition.</p>
<p>Whereas analytical modelers working on early structures had only a few experimental measurements to contend with – often just a few X-ray diffraction images – cellular and tissue systems within a complex organism might require orders of magnitude more data to properly describe.
If we want to model how transcriptional regulators define cell types, we might need gene expression profiles of many distinct cell types in an organism.
If we want to predict how a given genetic change might affect the morphology of a cell, we might similarly require images of cells with diverse genetic backgrounds.
It’s simply not tractable for human-scale heuristics to reason through this sort of large-scale data and extract useful, quantitative rules of the system.</p>
<p>Machine learning tools address just this problem.
By completing some task using these large datasets, we can distill relevant rules of the system into a compact collection of model parameters.
These tasks might involve supervision, like predicting the genotype from our cell images above, or be purely unsupervised, like training an autoencoder to compress and decompress the gene expression profiles we mentioned.
Given a trained model, machine learning tools then offer us a host of natural approaches for both <a href="https://en.wikipedia.org/wiki/Statistical_inference">inference</a> and prediction.</p>
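<p>The unsupervised case above can be sketched concretely. Below, a linear autoencoder fit in closed form via the SVD – which is exactly PCA – stands in for the deeper nonlinear models used in practice; the synthetic expression matrix and the 4-dimensional code size are invented for illustration:</p>

```python
import numpy as np

# A linear "autoencoder" fit in closed form via SVD (i.e. PCA), standing in
# for the deeper nonlinear models used in practice. Rows are cells, columns
# are genes; the learned parameters compress each profile into a small code.
rng = np.random.default_rng(0)

n_cells, n_genes, n_latent = 200, 50, 4
# Synthetic expression data with low-rank structure plus measurement noise.
programs = rng.normal(size=(n_latent, n_genes))   # hidden "gene programs"
activity = rng.normal(size=(n_cells, n_latent))   # per-cell program usage
X = activity @ programs + 0.1 * rng.normal(size=(n_cells, n_genes))

Xc = X - X.mean(axis=0)                 # center each gene
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
encoder = Vt[:n_latent].T               # genes -> code
codes = Xc @ encoder                    # compressed representation
reconstruction = codes @ encoder.T      # code -> genes

# The 4-dimensional code retains most of the variance of the 50-gene profiles.
explained = 1 - np.var(Xc - reconstruction) / np.var(Xc)
```

<p>The handful of numbers in each code summarize a full expression profile – the fitted parameters are the “distilled rules” of this toy system.</p>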
<p>Most of the groundbreaking work at the intersection of ML and biology has taken advantage of a category of methods known as <a href="https://arxiv.org/abs/1206.5538">representation learning</a>.
Representation learning methods fit parameters to transform raw measurements like images or expression profiles into a new, numeric representation that captures useful properties of the inputs.
By exploring these representations and model behaviors, we can extract insights similar to those gained from testing atomic configurations with a carefully machined structure.
This is a fairly abstract statement, but it becomes clear with a few concrete examples.</p>
<p>If we wish to train a model to predict cell types from gene expression profiles, a representation learning approach to the problem might first reduce the raw expression profiles into a compressed code – say, a 16-dimensional vector of numbers on the real line – that is nonetheless sufficient to distinguish one cell type from another<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.
One beautiful aspect of this approach is that the learned representations often reveal relationships between the observations that aren’t explicitly called for during training.
For instance, our cell type classifier might naturally learn to group similar cell types near one another, revealing something akin to their lineage structure.</p>
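<p>A toy version of this classifier makes the idea tangible. The cell types, their expression prototypes, and the 8-dimensional code below are all invented, and nearest-centroid classification in a PCA space stands in for a trained neural network:</p>

```python
import numpy as np

# Sketch: classify cell types from a compressed representation, then check
# that related types sit near one another in the learned space. The data and
# the 8-dimensional code size are invented for illustration.
rng = np.random.default_rng(1)
n_genes, n_per_type = 60, 100

base = rng.normal(size=n_genes)
prototypes = {
    "stem":       base,
    "progenitor": base + 0.5 * rng.normal(size=n_genes),  # close to stem
    "neuron":     rng.normal(size=n_genes) * 2.0,         # distinct lineage
}
X, y = [], []
for label, proto in prototypes.items():
    X.append(proto + 0.3 * rng.normal(size=(n_per_type, n_genes)))
    y += [label] * n_per_type
X = np.vstack(X)

# "Representation learning" stand-in: project onto the top 8 principal axes.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
codes = Xc @ Vt[:8].T

# Nearest-centroid classification in the learned code space.
centroids = {lbl: codes[[i for i, t in enumerate(y) if t == lbl]].mean(axis=0)
             for lbl in prototypes}
pred = [min(centroids, key=lambda l: np.linalg.norm(c - centroids[l]))
        for c in codes]
accuracy = np.mean([p == t for p, t in zip(pred, y)])

# Related types are neighbors in the representation, echoing lineage structure.
d_stem_prog = np.linalg.norm(centroids["stem"] - centroids["progenitor"])
d_stem_neuron = np.linalg.norm(centroids["stem"] - centroids["neuron"])
```

<p>Nothing in the objective asks for lineage structure, yet the “stem” and “progenitor” centroids land near one another in the learned space while the unrelated type sits far away.</p>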
<p>At first blush, learned representations are quite intellectually distant from Pauling’s first principles models of molecular structure.
The implementation details and means of specifying the rules couldn’t be more distinct!
Yet, the tasks these two classes of models enable are actually quite similar.</p>
<p>If we continue to explore the learned representation of our cell type classifier, we can use it to test hypotheses in much the same way Pauling, Crick, and countless others tested structural hypotheses with mechanical tools.</p>
<p>We might hypothesize that the gene expression program controlled by <em>TF X</em> helps define the identity of cell type A.
To investigate this hypothesis, we might synthetically increase or decrease the expression of <em>TF X</em> and its target genes in real cell profiles, then ask how this perturbation changes our model’s prediction.
If we find that the cell type prediction score for cell type A is correlated with <em>TF X’s</em> program more so than say, a background set of other TF programs, we might consider it a suggestive piece of evidence for our hypothesis.</p>
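<p>As a sketch of that experiment – with an invented dataset, hypothetical “TF X” program genes, and a least-squares linear score standing in for a trained classifier:</p>

```python
import numpy as np

# Sketch of the in silico perturbation described above. "TF X" and its target
# genes are hypothetical; a linear cell-type-A score stands in for a trained
# classifier. We nudge the TF program up in real profiles and ask whether the
# model's type-A score moves more than it does for a background gene set.
rng = np.random.default_rng(2)
n_cells, n_genes = 300, 40
program = np.arange(5)          # genes in the (hypothetical) TF X program

X = rng.normal(size=(n_cells, n_genes))
is_type_a = rng.random(n_cells) < 0.5
X[np.ix_(is_type_a, program)] += 2.0    # type A cells express the program

# Fit a linear type-A score by least squares.
w, *_ = np.linalg.lstsq(X, is_type_a.astype(float), rcond=None)

def score_shift(gene_set):
    """Change in the mean type-A score after adding +1 to the given genes."""
    X_pert = X.copy()
    X_pert[:, gene_set] += 1.0
    return float(np.mean(X_pert @ w) - np.mean(X @ w))

shift_program = score_shift(program)            # perturb the TF X program
shift_background = score_shift(np.arange(20, 25))  # perturb background genes
```

<p>The score shifts far more when we nudge the program genes than when we nudge a background set – the kind of suggestive evidence described above, and no more than that.</p>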
<p>This hypothesis exploration strategy is not so dissimilar from Pauling’s first principles models.
Both have similar failure modes – if the rules encoded within the model are wrong, then the model might lend support to erroneous hypotheses.</p>
<p>In the analytical models of old, these failures most often arose from erroneous experimental data.
ML models can fall prey to erroneous experimental evidence too, but also to spurious relationships within the data.
A learned representation might assume that an observed relationship between variables always holds true, implicitly connecting the variables in a causal graph, when in reality the variables just happened to correlate in the observations.</p>
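<p>A minimal, fully synthetic example of this failure mode: a “batch” feature happens to track the label in the training data, the model leans on it, and predictions collapse when new data breaks the coincidence. All names and numbers here are invented for illustration:</p>

```python
import numpy as np

# Sketch of a model latching onto a spurious correlation. In the training
# data a "batch" feature happens to track the label; in new data that
# coincidence is broken, and the model's shortcut stops working.
rng = np.random.default_rng(3)
n = 500

def make_data(spurious_aligned):
    # Binary label plus two features: a weak causal signal and a "batch"
    # feature that is only incidentally correlated with the label.
    y = (rng.random(n) < 0.5).astype(float)
    causal = y + 0.9 * rng.normal(size=n)    # noisy but real signal
    batch = y if spurious_aligned else (rng.random(n) < 0.5).astype(float)
    batch = batch + 0.1 * rng.normal(size=n)  # clean but incidental
    return np.column_stack([causal, batch]), y

X_train, y_train = make_data(spurious_aligned=True)
X_test, y_test = make_data(spurious_aligned=False)

# Least-squares linear classifier: it leans on the cleaner spurious feature.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def accuracy(X, y):
    return float(np.mean((X @ w > 0.5) == (y > 0.5)))

train_acc = accuracy(X_train, y_train)   # near perfect on training data
test_acc = accuracy(X_test, y_test)      # collapses when the correlation breaks
```

<p>The model’s implicit rule – “batch predicts type” – was consistent with every training observation, but it was never causal.</p>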
<p>Regardless of how incorrect rules find their way into either type of model, the remedy is the same.
Models are tools for hypothesis exploration and generation, and real-world experiments are still required for validation.</p>
<h1 id="old-is-new">Old is new</h1>
<p>Implementation details aside, ML models are not so distinct from the analog models of old.
They enable researchers to rapidly test biological hypotheses to see if they obey the “rules” of the underlying system.
The main distinction is how those rules are encoded.</p>
<p>In the classical, analytical models, rules emerged from individual experiments, were pruned heuristically by researchers, and then a larger working model was built-up from their aggregate.
By contrast, machine learning models derive less explicit rules that are consistent with a large amount of experimental data.
In both cases, these rules are not necessarily correct, and researchers need to be wary of leading themselves astray based on faulty models.
You need to be no more and no less cautious, no matter which modeling tool you choose to wield.</p>
<p>This distinction of how rules are derived is then rather small in the grand scheme.
Incorporating machine learning models to answer a biological question is not a departure from the intellectual tradition that transformed biology from an observational practice into an explanatory and engineering discipline.
Rather, applications of ML to biology are a return to the formal approaches that allowed molecular biology to blossom from the fields that came before it.</p>
<!-- # Endless forms most beautiful
> From so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved -- Charles Darwin -->
<!--
grails attained:
1 - classical genetics - models of DNA regulatory sequence
2 - classical biochemistry - models of protein structure from sequence and conservation
3 - classical development - models of cell type from expression and chromatin accessibility (scNym, scArches, multiVI)
grails unfinished:
1 - classical cell biology - integrative models of cellular structure. Integrated Cell reference.
2 - classical development - models of gene program interactions, predicting perturbations. Combinatorial perturbation autoencoder.
3 - physiology - models of how organ functions influence another. If cell type A in the liver changes in a particular direction, how will this effect the adipose tissue?
4 - biochemistry -- protein protein interactions. allosteric interactions with small molecules. evolution on molecular docking.
5 - virology - influence of mutations on viral fitness, virulence, pathogenicity
6 - immunology - sequence to antigen specificity conversion, predict sequences for target antigens
-->
<!-- Given this analog between molecular biology's origin and the present confluence with ML, we might naturally wonder: What questions can this new class of models help us answer? What forms will these models take?
Analytical models helped reveal the structures of biological molecules, the kinetics of enzymatic reactions[^11], and the stochastic basis of genetic mutations[^12].
Already, machine learing models have enabled us to push beyond the limits of analytical approaches in some of the same biological domains.
A comprehensive discussion of the ML / biology intersection would span hundreds of pages[^13], but considering just a few examples provides many of the necessary intuitions to predict where the field is moving.
## Decoding the genome
How does a change in DNA sequence of a genome influence the phenotype of the organism?
This classic question sits at the heart of modern molecular biology and remains the primary challenge in tasks ranging from comparative biology to human genetics.
Using analytical approaches, Gamow, Crick, Brenner, and the other members of the ["RNA Tie Club,"](https://www.wikiwand.com/en/RNA_Tie_Club) broke the cypher that maps DNA to protein sequences[^14].
While the cypher explains a core piece of the Central Dogma, the analytical approach can only explain the effect of a small number of mutations[^15].
<!-- **Current advances** -->
<!--
Many researchers have developed ML models to close the gap between this limited set of analytical predictions and our empirical observation that mutations outside coding sequences matter a great deal.
These models can predict the effects of DNA sequence changes anywhere in a genome, most notably employing convolutional neural networks and multi-headed attention architectures.
As one illustrative example, [Basset](https://github.com/davek44/Basset) is a convolutional neural network developed by my colleague [David R. Kelley](http://www.davidrkelley.com/info) that predicts many functional genomics experimental results from DNA sequence alone.
Said differently, the model learns to translate DNA sequence into likely functional outcomes, including gene expression activity, chromatin accessibility, and transcription factor binding.
Approaches descended from Basset like [Basenji](https://github.com/calico/basenji), [BPNet](https://github.com/kundajelab/bpnet), and [Enformer](https://github.com/deepmind/deepmind-research/tree/master/enformer) build on a similar set of ideas but incorporate more sophisticated neural networks to achieve superior performance.
Using these tools, a researcher can exhaustively search the space of hypotheses about the effect of DNA mutations on functional genomic features.
By feeding the model mutated versions of the real, reference genome, researchers can obtain the predicted effect of each individual mutation.
These "*in silico* mutagenesis" experiments might be analogized to Pauling flipping the components of his molecular model into different orientations, querying the rules encoded in the model to see if they support a hypothesis.
With such remarkable results, are these models the "solution" to genetics?
Far from it.
DNA sequence models have achieved remarkable performance, but there are still fundamental questions that remain unsolved.
As a holy grail of molecular genetics, we might wish to translate the sequence of a genome into a list of phenotypes describing the organism that will develop, how it will live, and how might perish.
After all -- all of this information is reflected in the sequence alone.
Our current generation of DNA sequence models are far from this lofty goal.
We're currently limited to predicting only the first-order effects of DNA sequence changes, rather than considering the higher-order effects that might result.
For instance, if we *in silico* mutagenize the promoter of a transcription factor, current models might correctly predict a decrease in transcription factor gene expression, but can't extend that logic to predict that accessibility at the transcription factor's binding motifs might decreased.
This first-order prediction limitation is a common feature of many current ML models applied to biological problems, and represents a key challenge for future work modeling complex biological systems filled with feedback loops.
It's important to keep these limitations front of mind as we consider the very material benefits these modeling approaches provide, lest we fool ourselves into underestimating the complexity of biological problems. -->
<!--
**Unsolved problems**
While remarkable in their capability, existing regulatory sequence analysis models have important limitations:
* Current models predict only first-order impacts -- ablating the promoter sequence of a transcription factor doesn't change accessibility predictions for it's target genes in the same genome
* The effect of large scale gene sequence changes, like large insertion or deletions, or chromosomal rearrangements is difficult to model
* Most models map local sequence features to local sequence phenotypes. Few are capable of predicting higher-order phenotypes from large sets of sequences. e.g. We're not yet at the point of predicting the influence of a genetic variant on animal development. -->
<!-- ## Biochemistry
How does a simple, linear amino acid sequence give rise to a complex, three-dimensional structure?
How do these structures confer the essential biological functions that dictate cell geometry, reproduction, and metabolism?
Analagous to molecular biology's obsession with information, these are a few of the eminent questions in biochemistry.
**Current advances**
Multiple groups have now constructed machine learning models that can fairly accurately predict the structure of a protein from an amino acid sequence[^11].
This is a truly remarkable feat, solving one of biology's widely recognized Grand Challenges.
By
* Molecular docking review
**Unsolved problems**
* Mutation prediction
* Small molecule design by discrimination -->
<!-- ## Development
How does a metazoan cell decide which genes to express and which cell type to become?
This is another classic question, the first glimmers of an answer emerging in [Hilde Mangold's experiments](https://www.wikiwand.com/en/Spemann-Mangold_organizer) that showed diffusible signals conferred anatomical identity.
Through an analytical approach, embryologists proposed models for how these signals might propagate and interact to determine the myriad identities in a complex organism, but they too were outmatched by complexity at the cellular level[^16].
While analytical models can explain how distinct cell types might receive unique signals at some point in development, the molecular mechanisms that confer unique cell identities based on those signals are too complex to unravel from first principles.
Here too, researchers have taken advantage of machine learning approaches t
* -->
<!-- ## Immunology
## Virology -->
<h1 id="footnotes">Footnotes</h1>
<!--
[11]: [Michaelis–Menten kinetics](https://en.wikipedia.org/wiki/Michaelis–Menten_kinetics)
[12]: Salvador Luria and Max Delbrück performed a classic experiment -- ["the Luria-Delbrück experiment"](https://en.wikipedia.org/wiki/Luria–Delbrück_experiment) -- that showed bacterial mutations conferring phage resistance were acquired spontaneously, rather than induced by the presence of phage. Their interpretation of the results relied upon a complex analytical model for how many resistant colonies were expected across a number of culture plates if resistance mutations were stochastic.
[13]: See this excellent, open-access review from an amazing collection of leaders in the field -- [Opportunities and obstacles for deep learning in biology](https://greenelab.github.io/deep-review/)
[14]: Crick noted many years after the solution was found that "cypher," is a more appropriate term than "code," since a cypher describes a mapping at the level of individual letters, while a code describes a mapping at the level of words.
[15]: Non-sense mutations in gene coding sequences can be predicted to abolish the function of the gene product based purely on first principles. The effects of almost all other mutations are very difficult to interpret using purely analytical tools.
[16]: For instance, researchers proposed the [French Flag model](https://www.wikiwand.com/en/French_flag_model) of morphogen interaction to explain how just two signaling molecules can specify diverse cellular fates.
[16]: [DeepMind's AlphaFold](https://www.nature.com/articles/s41586-021-03819-2), [David Baker lab's three-track model](https://science.sciencemag.org/content/early/2021/07/14/science.abj8754?adobe_mc=MCMID%3D55247908165515510124239564654459857138%7CMCORGID%3D242B6472541199F70A4C98A6%2540AdobeOrg%7CTS%3D1638513014&_ga=2.32296894.1688072684.1638513014-313706132.1636862856) -->
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Researchers have built a series of ML models to interpret the effects of DNA sequence changes, most notably employing convolutional neural networks and multi-headed attention architectures. As one illustrative example, <a href="https://github.com/davek44/basenji">Basenji</a> is a convolutional neural network developed by my colleague <a href="http://www.davidrkelley.com/info">David R. Kelley</a> that predicts many functional genomics experimental results from DNA sequence alone. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Both <a href="https://www.nature.com/articles/s41586-021-03819-2">DeepMind’s AlphaFold</a> and <a href="https://science.sciencemag.org/content/early/2021/07/14/science.abj8754?adobe_mc=MCMID%3D55247908165515510124239564654459857138%7CMCORGID%3D242B6472541199F70A4C98A6%2540AdobeOrg%7CTS%3D1638513014&_ga=2.32296894.1688072684.1638513014-313706132.1636862856">David Baker lab’s three-track model</a> can predict the 3D-structure of a protein from an amino acid sequence well enough that the community considers the problem “solved.” <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>If we’ve observed the effect of perturbation <em>X</em> in cell type <em>A</em>, can we predict the effect in cell type <em>B</em>? If we’ve seen the effect of perturbations <em>X</em> and <em>Y</em> alone, can we predict the effect of <em>X + Y</em> together? A flurry of work in this field has emerged in the past couple years, summarized wonderfully by Yuge Ji in a <a href="https://www.cell.com/cell-systems/pdf/S2405-4712(21)00202-7.pdf">recent review.</a> As a few quick examples, <a href="https://github.com/theislab/scgen">conditional variational autoencoders</a> can be used to predict known perturbations in new cell types, and <a href="https://www.science.org/doi/10.1126/science.aax4438">recommender systems can be adapted to predict perturbation interactions.</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Watson and Crick both knew Chargaff, but didn’t appreciate the relevance of his experimentally measured nucleotide ratios until guided toward that structure by their modeling work. Chargaff famously did not hold Watson and Crick in high regard. Upon learning of Watson and Crick’s structure, he quipped – “That such giant shadows are cast by such [small men] only shows how late in the day it has become.” <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>The history of recombinant DNA technology is beautifully described in <a href="https://jacobkimmel.notion.site/Invisible-Frontiers-The-Race-to-Synthesize-a-Human-Gene-9dc341fcc1c24723a38e9545c98417d9"><em>Invisible Frontiers</em> by Stephen Hall.</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Judson, Horace Freeland. The Eighth Day of Creation: Makers of the Revolution in Biology (p. 309). <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>As a single example, Oswald Avery’s classic experiment demonstrating that DNA was the genetic macromolecule proved both points. He demonstrated DNA was necessary to transform bacterial cells, and that DNA alone was sufficient. An elegant, open-and-shut case. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>The classical experiment revealed that mutations in the <em>lac</em> operon could control <em>expression</em> of the beta-galactosidase genes, connecting DNA sequence to regulatory activity for the first time. <a href="https://life.ibs.re.kr/courses/landmark/PaJaMo1959.pdf">“The Genetic Control and Cytoplasmic Expression of Inducibility in the Synthesis of beta-galactosidase by E. Coli”.</a> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>Judson, Horace Freeland. The Eighth Day of Creation: Makers of the Revolution in Biology (p. 334). <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>This is just one of many problems at the ML : biology interface, but <a href="http://jck.bio/scnym/">it’s one I happen to have an affinity for.</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Weekly scientific article highlights2021-10-18T00:00:00+00:002021-10-18T00:00:00+00:00http://jck.bio/oct_2021_paper_highlights<p>I tend to write a set of short paper summaries every week to help crystallize my interpretation of the scientific literature.
Others have found these useful in the past, so I’ll start sharing them at a more regular cadence here to highlight work I found interesting.</p>
<p>Papers will appear without any particular ordering.</p>
<h3 id="confronting-false-discoveries-in-single-cell-differential-expression"><a href="https://www.nature.com/articles/s41467-021-25960-2">Confronting false discoveries in single-cell differential expression</a></h3>
<p><a href="https://www.nature.com/articles/s41467-021-25960-2">Link</a></p>
<p>This is a sober take on some Simpson’s Paradox flavored issues that can arise in single cell differential expression. The authors begin by assembling a compendium of 18 datasets where both bulk and single cell data were collected from control and perturbed cells of a single cell type. They then proceed to evaluate many different DE methods to recover the bulk DE results from single cells, treating bulk DE as ground truth.</p>
<p>The essential conclusion is that most common single cell DE methods don’t account for biological replicates. Instead, they consider cells as replicates.
In the event that biological replicates exhibit lots of variation, this can confound DE results and lead to lots of false discoveries.
Surprisingly, the authors report that a simple pseudo-bulk procedure – i.e. generating pseudobulk profiles for each biological replicate – followed by classic bulk RNA-seq methods [DESeq2, edgeR] is superior to any single cell DE method for recovering DE genes from bulk samples.</p>
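<p>The pseudobulk procedure is simple enough to sketch in a few lines. This is a toy illustration with made-up counts and replicate labels, not the benchmark’s actual pipeline; the resulting replicate-by-gene matrix is what you would hand to DESeq2 or edgeR.</p>

```python
import pandas as pd

def pseudobulk(counts: pd.DataFrame, replicates: pd.Series) -> pd.DataFrame:
    """Sum single-cell counts within each biological replicate.

    counts: cells x genes raw count matrix.
    replicates: biological replicate label for each cell (same index as counts).
    Returns a replicates x genes matrix suitable for bulk DE tools.
    """
    return counts.groupby(replicates).sum()

# Toy data: four cells from two mice, two genes.
counts = pd.DataFrame(
    [[3, 0], [1, 2], [0, 5], [2, 2]],
    index=["c1", "c2", "c3", "c4"],
    columns=["geneA", "geneB"],
)
replicates = pd.Series(
    ["mouse1", "mouse1", "mouse2", "mouse2"], index=counts.index
)
pb = pseudobulk(counts, replicates)  # one row per mouse; geneA in mouse1 sums to 4
```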
<p>There’s one caveat the authors don’t touch on here, which is that their “ground truth” is a bit circular.
One can imagine genes where expression is driven by a small number of cells in the tissue, such that variability is better assessed by single cell methods than bulk RNA-seq.
By treating bulk data as ground truth, these cases where single cell DE methods might perform better are masked.</p>
<p>Overall, I think this work highlights the importance of explicitly considering biological replicate information.
Despite the authors’ critique of mixed-model GLMs at the end, I think that’s likely the way forward.</p>
<p>I hope to one day get back around to building out a Bayesian mixed GLM implementation to enable this sort of analysis with denoised single cell data.</p>
<h3 id="vega-is-an-interpretable-generative-model-for-inferring-biological-network-activity-in-single-cell-transcriptomics"><a href="https://www.nature.com/articles/s41467-021-26017-0">VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics</a></h3>
<p><a href="https://www.nature.com/articles/s41467-021-26017-0">Link</a></p>
<p>Here, the authors present a modified single cell VAE that incorporates a sparse set of weights in the decoder reflecting prior knowledge. The basic idea is similar to <a href="https://pubmed.ncbi.nlm.nih.gov/31249421/">PLIER</a> in that the authors mask weights of a one-layer linear decoder so that each latent variable can contribute only to a pre-specified set of genes.
They generate these gene masks based on several different databases [MSigDB, Reactome, TF GRNs] and show that their models can recover enrichment of immune cell signaling gene sets upon stimulation and cell type specific enrichment of TF GRNs.</p>
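<p>The masking idea can be illustrated without the full VAE machinery. In the sketch below the gene sets and dimensions are invented for illustration; the point is just that elementwise-multiplying the decoder weights by a binary mask guarantees each latent variable reconstructs only its annotated genes.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_genes = 3, 6

# Hypothetical prior-knowledge mask: row i marks the genes that
# latent variable i is allowed to reconstruct.
mask = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])

W = rng.normal(size=(n_latent, n_genes))   # dense decoder weights
W_masked = W * mask                        # zero out weights outside each gene set

z = rng.normal(size=(1, n_latent))         # latent code for one cell
x_hat = z @ W_masked                       # reconstruction through the sparse decoder
```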
<p>There aren’t any benchmarking exercises comparing the VEGA approach to simple gene set scoring baselines [e.g. gene set enrichment analysis on differentially expressed genes].
It’s therefore unclear if the VEGA approach is superior to more traditional bioinformatics tools as implemented here.
Nonetheless, this is a clever idea and it’s interesting to see that even straightforward regularization approaches can yield more interpretable latent spaces.</p>
<p>Congratulations to the authors on a clean, well-executed algorithm!</p>
<h3 id="simba-single-cell-embedding-along-with-features"><a href="https://www.biorxiv.org/content/10.1101/2021.10.17.464750v1">SIMBA: SIngle-cell eMBedding Along with features</a></h3>
<p><a href="https://www.biorxiv.org/content/10.1101/2021.10.17.464750v1">Link</a></p>
<p>This is an interesting new paper applying knowledge graph embedding methods to single cell genomics problems. The authors only benchmark a subset of the problems the method addresses, so it’s unclear exactly where a user should employ this approach, but the idea is quite distinct from the common approaches for several single cell genomics analysis tasks.</p>
<p>The authors formulate the problem of interpreting single cell experiments as a knowledge graph interpretation task. Given several biological entities [cells, genes, peaks, TF motifs], we want to learn relationships between them. They propose using knowledge graph embeddings to learn a representation where related entities co-embed.
In practice, they treat cells and genes [or cells, peaks, TF motifs etc] as nodes in the graph, and draw edges between them when a given gene is expressed in a cell, or a given peak is open.
They then apply a standard contrastive loss that maximizes the similarity of nodes that share edges in the embedding relative to a random set of edges that are constructed as negative examples.
Framing the problem as a knowledge graph embedding is reminiscent of my colleague Han Yuan’s work to embed transcription factor binding motifs with <a href="https://www.nature.com/articles/s41592-019-0511-y">BindSpace.</a></p>
<p>Formally, the method constructs a graph $G = (V, E)$ defined by vertices $v_i \in V$ and edges $e = (v_i, v_j) \in E$.
The graph construction differs depending on the analysis task, but in the simplest single cell RNA-seq case, the vertices are both cells <strong>and</strong> genes, while edges are drawn between cells and the genes they express.
Each of these edges is given a coarse grained weight based on the expression level of the gene in each cell.</p>
<p>The method then fits an embedding $\theta$ by minimizing a contrastive loss with stochastic gradient descent.
The loss is defined to maximize the “score” of true edges in the graph relative to simulated edges generated between semi-random pairs of nodes.
The score $s_e$ is defined as the simple dot product of two nodes in the embedding $s_e = \theta_{v_i} \cdot \theta_{v_j}$.
Intuitively, an edge will get a high score if the nodes are close together in the embedding, and a low score if they are far apart.</p>
<p>This brings us to our loss:</p>
<p>$L = - \log \frac{\exp(s)}{\sum_{s' \in N} \exp(s')} + \lVert \theta \rVert_2^2$</p>
<p>where the right hand term is a simple $\ell_2$ regularization penalty on the embedding $\theta$.</p>
<p>The real trick to fitting the embedding is contained in the denominator of the loss term $\sum_{s' \in N} \exp(s')$.
Here, $N$ is a set of scores for “negative” edges that are created by randomizing the real set of edges.
The basic idea is that the authors start with real edges, then replace either the source or target node with a random node of the same type.
For instance, we might start with a real edge between the node <code class="language-plaintext highlighter-rouge">B cell</code> and the node <code class="language-plaintext highlighter-rouge">Pax5</code> and randomize it to create a “negative” edge <code class="language-plaintext highlighter-rouge">(B cell, Myod1)</code>.
Our embedding then learns to give higher scores – and thus more similar embedding coordinates – to nodes that are associated in the knowledge graph relative to a random background set.
The authors note that a few different tricks are required to make this work [such as mixing true randomization with stratified sampling].</p>
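<p>The scoring scheme and loss above can be sketched as follows. The node indices, the single-edge loss, and the hand-picked negatives are illustrative simplifications, not the authors’ implementation, which batches edges and uses the sampling tricks mentioned above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(6, 4))  # embedding: 6 nodes (cells + genes), 4 dims

def score(i: int, j: int) -> float:
    # s_e = theta_i . theta_j -- high when the two nodes co-embed
    return float(theta[i] @ theta[j])

def edge_loss(pos_edge, neg_edges, l2: float = 1e-3) -> float:
    """Contrastive loss for one true edge vs. corrupted negative edges."""
    s_pos = score(*pos_edge)
    s_neg = np.array([score(i, j) for i, j in neg_edges])
    # push the true edge's score above the sampled negatives
    nll = -(s_pos - np.log(np.exp(s_neg).sum()))
    return nll + l2 * float(np.sum(theta ** 2))  # plus the l2 penalty on theta

# True edge (cell 0, gene 3); negatives swap in other genes for the target node.
loss = edge_loss((0, 3), [(0, 4), (0, 5)])
```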
<p>Once fit, they have an embedding of not only cells but also genes and peaks in the same space. Kinda weird to think about at first.
The authors find that they can identify marker genes based on coembedding with different cell types, find TF motifs that are enriched in different cell populations, and correct for batch effects across datasets.
For cell cluster recovery, cell marker discovery, and batch effect correction, the authors report superior performance to some existing baseline approaches. I found this convincing evidence that the embeddings recover more than just pretty nearest neighbor graphs.</p>
<p>For future work, I’m particularly interested in the application of knowledge graph embeddings to recovering gene regulatory networks [e.g. $\mathrm{TF} \rightarrow \mathrm{Target\ Gene}$ graphs].
The authors highlight some qualitative results in this work, but it will be exciting to see comparisons to baseline approaches [covariance, sequence motifs, combination methods like SCENIC, scBasset et al.] in the future.</p>Rejuvenation By Reprogramming2021-05-26T00:00:00+00:002021-05-26T00:00:00+00:00http://jck.bio/rejuvenation-by-reprogramming<ul>
<li><strong>Paper:</strong> <a href="https://doi.org/10.1016/j.cels.2022.05.002">https://doi.org/10.1016/j.cels.2022.05.002</a></li>
<li><strong>Paper PDF</strong>: <a href="http://jck.bio/assets/../../../assets/files/2022_roux_cell_systems.pdf">PDF Download</a></li>
<li><strong>Supplement PDF</strong>: <a href="http://jck.bio/assets/../../../assets/files/2022_roux_cell_systems_supp.pdf">PDF Download</a></li>
<li><strong>Research Website:</strong> <a href="https://reprog.research.calicolabs.com">reprog.research.calicolabs.com</a></li>
</ul>
<p>Mammalian aging dramatically remodels gene expression in diverse cell identities, as revealed by aging cell cartography studies (<a href="https://mca.research.calicolabs.com/">Calico Murine Aging Cell Atlas</a>, <a href="https://tabula-muris-senis.ds.czbiohub.org/"><em>Tabula Muris Senis</em></a>).
Germline ontogeny is the only process known to reverse features of aging in individual cells, such that adult cells can give rise to young animals (<a href="https://pubmed.ncbi.nlm.nih.gov/13903027/">Gurdon 1963</a>).
Reprogramming cell identity to a pluripotent state with the canonical pluripotency transcription factors (Yamanaka factors <em>Sox2, Oct4, Klf4, Myc</em>) has also been reported to erase many features of aging (<a href="https://pubmed.ncbi.nlm.nih.gov/26456686/">Mertens et al. 2015</a>).</p>
<p>Recent reports have suggested that even short, transient activation of the Yamanaka factors is sufficient to reverse some aspects of cellular aging (<a href="https://pubmed.ncbi.nlm.nih.gov/27984723/">Ocampo et al. 2016</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/32210226/">Sarkar et al. 2020</a>, <a href="https://pubmed.ncbi.nlm.nih.gov/33268865/">Lu et al. 2020</a>, <a href="https://www.biorxiv.org/content/10.1101/2021.01.15.426786v1">Gill et al.</a>).
These exciting results prompt several questions: What features of aging are reversed? Does partial reprogramming exert similar effects across different cell types? Which aspects of the pluripotency program are required for rejuvenation?</p>
<p>Here, we interrogated these questions by mapping trajectories of partial reprogramming in multiple cell types using single cell genomics.
We further measured the effect of partial reprogramming with all possible combinations of the Yamanaka factor set using pooled screening approaches.
Inspired by limb regeneration in amphibians, we also explored whether partial multipotent reprogramming could restore youthful expression in myogenic cells.</p>
<h2 id="partial-reprogramming-restores-youthful-expression-and-suppresses-cell-identity">Partial reprogramming restores youthful expression and suppresses cell identity</h2>
<p>We performed partial reprogramming with SOKM in young and aged adipogenic and mesenchymal stem cells.
By measuring gene expression across single cells, we captured cells in diverse states across the trajectory of partial reprogramming.</p>
<p><img src="/assets/images/reprog/web_yf_poly_trajectories.png" width="600" /></p>
<p>Single cell expression profiles in both adipogenic cells and MSCs revealed a continuous trajectory of cell states induced by partial reprogramming.
We also profiled control cells that were not reprogrammed, allowing us to compare the effects of aging and reprogramming in a common measurement space.</p>
<p>We first wondered if partial reprogramming reversed some features of aging.
To investigate, we used maximum mean discrepancy (MMD) comparisons between young and aged cells before and after treatment, considering features across the transcriptome.
Remarkably, we found that adipogenic cells were more similar to young controls after treatment, with youthful expression levels restored in thousands of genes.
In MSCs, we found that fibrotic gene sets and an aging signature derived from bulk RNA-seq were similarly reduced.</p>
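<p>For intuition, a minimal version of the MMD statistic is sketched below on toy Gaussian data standing in for expression profiles; the kernel choice, bandwidth, and sample sizes here are arbitrary illustrations rather than the settings used in the paper.</p>

```python
import numpy as np

def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    # Gaussian kernel matrix between the rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimator of squared maximum mean discrepancy.

    Near zero when X and Y are drawn from similar distributions,
    larger as the two samples diverge.
    """
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
young = rng.normal(0.0, 1.0, size=(100, 5))
aged = rng.normal(0.8, 1.0, size=(100, 5))    # shifted "aged" distribution
treated = rng.normal(0.1, 1.0, size=(100, 5)) # partially restored toward young

# A smaller MMD to young controls indicates a more youthful profile.
restored = mmd2(treated, young) < mmd2(aged, young)
```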
<p><img src="/assets/images/reprog/web_poly_youthful_expr.png" width="600" /></p>
<h3 id="somatic-cell-identities-are-transiently-suppressed-by-partial-reprogramming">Somatic cell identities are transiently suppressed by partial reprogramming</h3>
<p>Reprogramming induced unique cell states, unseen in control conditions in both cell types.
These unique states suggested to us that reprogramming might be suppressing somatic cell identity programs, despite some prior reports to the contrary.
We performed pseudotime analysis to map each cell to a continuous coordinate system spanning the length of the reprogramming trajectories we observed.</p>
<p><img src="/assets/images/reprog/web_yf_pst_velo.png" width="600" /></p>
<p>We found that somatic cell identity programs were suppressed and pluripotency identity programs were activated in the most reprogrammed cells along these trajectories.
In particular, we observed activation of the <em>Nanog</em> transcription factor, previously reported to be a gate-keeper to the induction of full pluripotency.</p>
<p>Pluripotent cells are characteristically neoplastic, forming teratomas <em>in vivo</em>.
Our observation that <em>Nanog</em> is activated in a subset of partially reprogrammed cells suggests that even transient activation of pluripotency programs poses a neoplastic risk.
Given that we observed only a small <em>Nanog+</em> cell population, it seems likely that previous reports using bulk measurements were not able to detect this rare cell state.</p>
<p>We next wondered if partially reprogrammed cells would re-acquire their original somatic identities, as suggested by MEF to iPSC reprogramming systems (<a href="https://pubmed.ncbi.nlm.nih.gov/20621051/">Samavarchi-Tehrani et al. 2010</a>).<br />
We turned to RNA velocity analysis to infer changes in cell state and found that most reprogrammed cells in both populations were re-acquiring their original somatic identities.</p>
<h2 id="pluripotency-submodules-are-sufficient-to-restore-youthful-expression">Pluripotency submodules are sufficient to restore youthful expression</h2>
<p>Are all four Yamanaka factors required to restore youthful expression? Are there any sufficient subsets?</p>
<p><img src="/assets/images/reprog/web_yf_screen.png" width="600" /></p>
<p>We next wondered if alternative reprogramming strategies could also restore youthful expression.
The neoplastic risk posed by oncogenes in the Yamanaka Factor set (<em>Klf4, Myc</em>) motivates a search for alternative approaches.
We also wondered if the suppression of cell identity we observed was intimately connected to rejuvenation, or if these two phenomena could be decoupled.</p>
<p>To investigate these questions, we developed a screening system that allowed us to perform partial reprogramming interventions in a pooled format with single cell RNA-seq as a read-out.
Our approach was inspired by the CellTag lineage-tracing system (<a href="https://pubmed.ncbi.nlm.nih.gov/30518857/">Biddy et. al. 2018</a>), taking advantage of expressed barcodes in the 3’ UTR of a constituitive reporter.
We used this system to test partial reprogramming in young and aged MSCs with all possible combinations of the Yamanaka factors.</p>
<p><img src="/assets/images/reprog/web_yf_screen_results.png" width="600" /></p>
<p>We found that the transcriptional effects of partial reprogramming scaled with the number of unique factors delivered, consistent with known biology for the Yamanaka factors.
To determine which combinations had unique effects, we trained a cell identity classification model (<a href="https://scnym.research.calicolabs.com">scNym</a>) to discriminate different combinations based on transcriptional profiles.
We found that effects from combinations of three factors were highly similar to the full Yamanaka factor set, suggesting no single factor is required for rejuvenation.</p>
<h3 id="rejuvenation-and-identity-suppression-are-not-closely-entangled">Rejuvenation and identity suppression are not closely entangled</h3>
<p>We also scored the expression of an aging gene signature and derived mesenchymal cell identity program scores using a cell classifier trained on a mouse cell atlas (<a href="https://tabula-muris.ds.czbiohub.org/"><em>Tabula Muris</em></a>).
We found that almost all combinations significantly reduced the expression of the aging signature, and all significantly suppressed mesenchymal identity.
However, the degree of rejuvenation and identity suppression were not significantly correlated, suggesting these effects can be decoupled.
The results of our screen suggest that the activation of the full pluripotency program is not required to suppress some features of aging.</p>
<h2 id="multipotent-reprogramming-interventions-restore-myogenic-gene-expression">Multipotent reprogramming interventions restore myogenic gene expression</h2>
<p>Can partial multipotent reprogramming reverse features of aging?</p>
<p><img src="/assets/images/reprog/web_yf_myo.png" width="512" /></p>
<p>Urodele amphibians have the remarkable ability to regenerate limbs through an endogenous dedifferentiation process.
One key player in this process is the mesodermal transcription factor <em>Msx1</em>.
Previous work has shown that <em>Msx1</em> is sufficient to dedifferentiate syncytial myotubes back into proliferating mononuclear progenitor cells, without inducing pluripotency.</p>
<p>We wondered if transient activation of this multipotency factor might also reverse features of aging in myogenic cells, similar to the Yamanaka factors (<a href="https://pubmed.ncbi.nlm.nih.gov/32210226/">Sarkar et al. 2020</a>).
We performed a pulse/chase of <em>Msx1</em> followed by single cell RNA-seq in aged myogenic cells, similar to our other experiments.
It has been reported that myogenic differentiation is impaired in aged myogenic cells, and here we found that transient <em>Msx1</em> treatment improved myogenic gene expression in two independent experiments.
This result suggests that transient activation of progenitor factors outside the core pluripotency program may also restore youthful gene expression, similar to the canonical Yamanaka factors.</p>2020 Best Books2020-12-19T00:00:00+00:002020-12-19T00:00:00+00:00http://jck.bio/best_books_2020<p>As a kid, I used to dream about a room filled with books where time was dilated.
You could go into this room and read for as long as you wanted, then wander back out to find that hardly a moment had passed outside.
Equipped with this retreat, the stacks at the library wouldn’t feel so daunting.</p>
<p>2020 has been anything but a bastion, but time has seemed to pass outside the normal course of events.
Part of my head is still wandering through early March as I walk about my neighborhood, as if we’ll all wake up tomorrow and make summer plans.
This strange progression of days has allowed me to indulge in my childhood dream in some small way, spending more time with books than opportunity costs would usually merit.</p>
<p>A few favorites from my reading these last few months are outlined below.</p>
<h2 id="invisible-frontiers">Invisible Frontiers</h2>
<p><strong>Review:</strong> <a href="https://www.notion.so/Invisible-Frontiers-The-Race-to-Synthesize-a-Human-Gene-9dc341fcc1c24723a38e9545c98417d9">Invisible Frontiers: The Race to Synthesize a Human Gene</a></p>
<p>Molecular biology has shaped the modern world, but the industrial and medical nature of the ensuing advances has led to a low salience for these technologies in the culture. <em>Invisible Frontiers</em> is old and out of print, but it’s one of the few stories to capture the wonder felt by many life scientists when they first encounter our newfound powers to manipulate the code of life. Following the story of the first molecular cloning experiments to the first marketed products from Genentech, Hall provides a fly-on-the-wall perspective to some of the foundational moments in the modern life sciences. I can’t recommend it highly enough.</p>
<h2 id="dancing-in-the-glory-of-monsters">Dancing in the Glory of Monsters</h2>
<p><strong>Review</strong>: <a href="https://www.notion.so/Dancing-in-the-Glory-of-Monsters-The-Collapse-of-the-Congo-and-the-Great-War-of-Africa-5dff49daa21a4c2fb1baf335a6e2a904">Dancing in the Glory of Monsters: The Collapse of the Congo and the Great War of Africa</a></p>
<p>Dancing in the Glory of Monsters is an amazing mental model for the frailty of political and sociological systems.</p>
<p>I found myself thinking about this book more than any other this year.</p>
<h2 id="how-asia-works">How Asia Works</h2>
<p><strong>Review:</strong> <a href="https://www.notion.so/How-Asia-Works-18c89d9dd9b74814b8eaa1935f3b2db8">How Asia Works</a></p>
<p>How did some east Asian economies dance to the frontier of technology after World War II, while others stagnated? Studwell dissects this question with lucidity and narrative in a remarkably readable work of developmental economics.</p>
<p>I want to read 100 books like this.</p>
<h2 id="inventing-the-nih">Inventing the NIH</h2>
<p><strong>Review</strong>: <a href="https://jkimmel.net/inventing_the_nih">Inventing the NIH</a></p>
<p>The NIH is one of the most important institutions in the history of biomedicine. How and why was it created? Harden provides one of the few detailed accounts of the institute’s genesis.</p>
<h2 id="cadillac-desert">Cadillac Desert</h2>
<p>Cadillac Desert 🏜 is the story of water in the American West. It has a municipal espionage agency, federal appropriations for airplanes classified as dams, empirical evidence for the inertia of policy, the origins of aerospace in the PNW, & so much more</p>
<p><a href="https://www.greenapplebooks.com/book/9781553656777">https://www.greenapplebooks.com/book/9781553656777</a></p>
<h2 id="hoover-an-extraordinary-life">Hoover: An Extraordinary Life</h2>
<p><strong>Review</strong>: <a href="https://www.notion.so/Hoover-An-Extraordinary-Life-in-Extraordinary-Times-9ecb67f0561541cdb8c15bceaf0cd57a">Hoover: An Extraordinary Life in Extraordinary Times</a></p>
<p>Hoover is infinitely more interesting than the typical one-dimensional character portrayed in US history classes. He somehow managed to be present for a non-trivial portion of world events in the early twentieth century, such that his personal story allows for a human recounting of a rapidly changing world.</p>
<h2 id="her-2-the-making-of-herceptin">Her-2: The Making of Herceptin</h2>
<p><strong>Review</strong>: <a href="http://jkimmel.net/her2/">Her-2 - The Making of Herceptin</a></p>
<p>Biotech has improved the lives of countless families, but there are few accessible books on how medicines are made.</p>
<p>Bazell’s Her-2 is an exception. He captures the development of Herceptin & offers a template for understanding drug development.</p>Her-2 – The Making of Herceptin2020-10-03T00:00:00+00:002020-10-03T00:00:00+00:00http://jck.bio/her2<h2 id="how-are-new-medicines-invented">How are new medicines invented?</h2>
<p>There are surprisingly few books that tell the story of world-changing medicines. Biotechnology has improved the lives of countless patients in the past half-century, but you can’t appreciate that fact scanning the spines at your favorite bookstore. There are books about the early development of most popular websites (e.g. <a href="https://www.notion.so/Facebook-1b9b1a5a0d2e4be1a3e37f1858487c8d">Facebook</a>, <a href="https://www.amazon.com/Hatching-Twitter-Story-Friendship-Betrayal/dp/1591847087/ref=sr_1_1?dchild=1&keywords=hatching+twitter&qid=1601758277&s=books&sr=1-1">Hatching Twitter</a>, <a href="https://www.amazon.com/No-Filter-Inside-Story-Instagram/dp/1982126809/ref=sr_1_1?dchild=1&keywords=no+filter&qid=1601758264&s=books&sr=1-1">No Filter</a>, <a href="https://www.amazon.com/Plex-Google-Thinks-Works-Shapes/dp/1416596585">In The Plex</a>, <a href="https://www.amazon.com/Everything-Store-Jeff-Bezos-Amazon/dp/0552167835/ref=tmm_pap_swatch_0?_encoding=UTF8&qid=1601758314&sr=1-3">The Everything Store</a>, etc.), and yet medicines that have <a href="https://en.wikipedia.org/wiki/Sofosbuvir">cured intolerable diseases outright</a> receive comparatively less attention in our popular canon <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>It’s worth celebrating, then, the few stories of drug development that have been committed to text. <a href="https://www.notion.so/The-Billion-Dollar-Molecule-d745084540a040c6bc1b7965068cfd2a">The Billion Dollar Molecule</a> and <a href="https://www.notion.so/The-Antidote-a011bfdd8dd045d08e8b5a53a1237cd4">The Antidote</a> by Barry Werth have long been my go-to examples for how to tell these stories well. I’m pleased to add Robert Bazell’s <em>Her-2: The Making of Herceptin</em> to the list.</p>
<hr />
<p><em>Her-2</em> recounts the development of Herceptin by Genentech and their academic partners. Herceptin was among the first “targeted” cancer therapies, which function by specifically inhibiting cancer cell growth rather than inhibiting the growth of all cells in the body like traditional chemotherapeutics. It’s difficult to overstate the impact Herceptin has had on patient lives and the oncology drug development sphere writ large. Whereas it was once commonly accepted that “targeted antibody therapies don’t work for cancer,” monoclonal antibodies and targeted small molecules have now been developed for several cancer indications against a diverse set of targets <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. The success of Herceptin was a catalyzing event for this change in focus for the industry.</p>
<h2 id="how-does-herceptin-work">How does Herceptin work?</h2>
<p>The mechanism-of-action that allows Herceptin to inhibit cancer growth is fairly easy to write on a napkin. Cells in the body proliferate in response to growth factor signals — often hormones or proteins circulating in the blood or permeating tissues. These factors are essential to allow for growth of the body during development. Genentech’s first marketed product was, ironically, human growth hormone <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
<p>In some breast and ovarian cancer cells, a receptor for epidermal growth factor encoded by the <em>HER2</em> gene is expressed at much higher levels than in normal cells (“overexpressed,” in the language of molecular biology). These cells get extra growth signals as a result of this abundant receptor, leading them to proliferate aberrantly. Herceptin is an antibody — a special protein produced by B cells of the immune system to bind specific targets — that binds specifically to the <em>HER2</em> receptor. By blocking these extra growth signals, Herceptin can limit the growth of some cancers, shrinking tumors and extending patients’ lives.</p>
<p><img src="http://jck.bio/assets/images/her2/herceptin_cartoon.png" alt="Her2 mechanism cartoon" /></p>
<p>Bazell does a remarkably good job of conveying this mechanism to a lay audience. Too often in biotechnology reporting, technical details are either overbearing or irresponsibly elided. Bazell manages to strike the proper balance, leaving a reader educated without being bored.</p>
<h2 id="how-did-we-find-this-target">How did we find this target?</h2>
<p>The beginning of every drug development story is the identification of a <strong>target</strong>. “Target” is a special word in drug development, denoting the molecular process you need to modify to treat a disease. For the majority of drugs, targets are specific proteins, and the modification is an inhibition of that protein’s activity.</p>
<p>Bazell begins his story at this crucial stage — too often left out of historical accounts, even in the biotechnology industry press. The story is filled with classical scientific serendipity. The ortholog of <em>HER2</em> was originally discovered as an oncogene (a gene that causes cancer when mutated in some way) in chickens and named <em>erb-B</em>.</p>
<p><a href="https://biology.mit.edu/profile/robert-a-weinberg/">Robert Weinberg’s team</a> at MIT later discovered a rat ortholog through transfection experiments. His team induced neuro/glioblastomas in rat embryos by injecting a mutagen during development, then transferred DNA from the resulting tumors into a set of non-cancerous cells to find genes that might be inducing cancerous growth. Across four separate tumors, they homed in on a single oncogene, which they named <em>neu</em>, that converted otherwise normal cells to cancerous growth. Based on a hunch, the team performed hybridization (base-pair level binding assays) against <em>erb-B</em> and observed that the rat and chicken genes had some level of shared sequence, known as homology. Perhaps the chicken oncogene had a mammalian ortholog! The original paper describing these experiments is worth a read <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p>
<p>Axel Ullrich had been the first person to clone epidermal growth factor (EGF) itself. Famous for his role in cloning human insulin (see <a href="https://www.notion.so/Invisible-Frontiers-The-Race-to-Synthesize-a-Human-Gene-9dc341fcc1c24723a38e9545c98417d9">Invisible Frontiers: The Race to Synthesize a Human Gene</a>), Ullrich was one of the first scientists at Genentech. Mike Waterfield had a hunch that <em>erb-B</em> was identical to the human EGF receptor gene. Since <em>erb-B</em> was known to cause cancer in chickens, this suggested that EGF receptors in humans might do the same thing!</p>
<p>Waterfield called Ullrich, a known master cloner, for help cloning the human EGF receptor gene to investigate this hypothesis. The collaboration was a profound success, resulting in the first clear connection between growth factor signaling and human cancers. That paper is also worth a read <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>.</p>
<p>Ullrich used the sequence of the human EGF receptor to search for similar genes, and he pulled out <em>HER2</em>. He was able to show that <em>HER2</em> was homologous to the <em>neu</em> gene named by Weinberg’s team. By sheer coincidence, Ullrich bumped into Dennis Slamon, an oncologist with an extensive collection of human tumor samples, in the Denver airport. The two struck up an agreement to search for Ullrich’s <em>HER2</em> in Slamon’s samples to see if <em>HER2</em> was driving human cancers. They struck upon samples with 30-fold upregulation of <em>HER2</em> relative to normal cells — a clear hit <sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup>.</p>
<h3 id="validating-the-target">Validating the target</h3>
<p>These overexpression experiments certainly suggested <em>HER2</em> might play a role in cancers, but how can we know for sure? In a set of follow-up experiments, Ullrich and Slamon showed that <em>HER2</em> overexpression was sufficient to induce cancers, and that blocking <em>HER2</em> with an antibody in mice could shrink tumors. In drug development, these critical experiments are known as <strong>target validation</strong> — drawing the causal graph between nodes previously connected only by correlational edges.</p>
<h2 id="from-target-to-therapy">From target to therapy</h2>
<p>At this stage, Ullrich’s role at Genentech seems to present a natural path toward translating this discovery into a real medicine. Unfortunately, Genentech had recently made some ill-advised investments in using recombinant interferons as cancer treatments, and wanted to exit oncology altogether after the high-profile failure of those programs. Within the company, the <em>HER2</em> program struggled for resources. Ullrich eventually quit out of frustration.</p>
<p>Despite positive data using a monoclonal antibody to treat human tumors transplanted into mice, there was strong skepticism among senior Genentech management that antibodies would ever be successful for cancer treatment. The thinking at the time was that any protein targeted on a cancer cell would simply be downregulated — the cancer cells would mutate to avoid expressing the targeted protein. While not far-fetched, this thinking failed to appreciate the phenomenon of <strong>oncogene addiction,</strong> where tumor growth is dependent on a particular mutated gene. Some <em>HER2</em> driven cancers can’t mutate away from their <em>HER2+</em> state without severely reducing growth — exactly the reaction you want.</p>
<p>David Botstein and Art Levinson were able to see the promise in <em>HER2</em> therapy when others were skeptical. Through their leadership, laboratory research continued on <em>HER2</em> therapies, and additional executives were eventually convinced of the therapeutic potential for monoclonal antibodies in oncology. Their bet proved prescient <sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</p>
<p>In order for the mouse antibody to be used in humans, Genentech needed to remove as much of the mouse protein sequence as possible and replace it with human counterparts. The mouse sequence is recognized as foreign by the human body, and attacked by the immune system. The process of swapping out sections of the mouse antibody gene for human counterparts is known as “humanizing” an antibody, and is now standard practice. In the early days of Herceptin though, this was unproven territory and a risky bet. To his credit, Paul Carter at Genentech accomplished this task in only 10 months (!).</p>
<h2 id="how-do-we-know-if-the-therapy-works">How do we know if the therapy works?</h2>
<p>After the anti-HER2 antibody (“anti-TargetName” is a convention for antibody naming) was humanized, it needed to be tested in actual patients. Drug development proceeds through three stages in the US FDA system:</p>
<ol>
<li>Phase I trials establish the safety of a drug, but don’t test efficacy. Outside oncology, these trials are in healthy volunteers.</li>
<li>Phase II trials test escalating doses in patients. Increasingly, Phase II trials are used for early efficacy read-outs to dubious effect.</li>
<li>Phase III trials are the gold-standard test of effectiveness — large cohorts of patients receive the treatment or an alternative in a randomized controlled trial.</li>
</ol>
<p>Phase I and II studies are conducted a bit differently in oncology, where drugs are often too toxic to be tested in healthy volunteers. Instead, cancer patients with no alternative treatment options receive experimental therapies as a last resort treatment. Genentech launched their trials in breast cancer patients based on the unmet medical need and high <em>HER2</em> prevalence.</p>
<p>In the Phase I and II studies, some conducted by Ullrich’s early collaborator Slamon, a handful of patients saw remarkable responses to the drug. These patients had cancers that were recalcitrant to traditional chemotherapy, but nonetheless some with high <em>HER2</em> expression saw drastic reductions in tumor size and became cancer-free for long periods of time. Unlike traditional chemotherapies, Herceptin was largely free of notable side-effects when used alone, so these successful treatments could continue for months to years on end.</p>
<p>Despite these promising early results, the Phase III trials proved incredibly difficult. A Phase III trial is easily 10-100X the size of a Phase II in patient numbers, with a commensurate increase in logistical burden and cost. Genentech originally planned a placebo control arm for their Phase III, which discouraged many patients from participating. Why sit through hours-long infusions of “antibody” if it might just be saline?</p>
<p>The trial struggled to enroll the necessary number of patients for almost a year. In that time, Art Levinson took over as CEO, and Genentech leadership took the risky-but-necessary step of dropping the placebo control arm to increase patient enrollment in the study. After this expensive near-death experience, the trial enrolled on schedule and eventually treated over 450 women. Despite the delays, the trial unexpectedly finished <strong>early</strong>. This was due to the unfortunate discovery that <em>HER2+</em> breast cancers progress more rapidly than breast cancers in general, so the effects of the treatment were visible earlier than expected.</p>
<p>Those effects were overwhelmingly positive. Even in an arm of the trial that specifically treated patients with the worst, least treatable cancers, more than 10% of patients saw their tumors shrink by >50%. Another 30% of these patients saw their aggressive cancers halve their growth, providing them with an average of 9 additional months with their loved ones.</p>
<p>In the larger trial of patients with less severe disease, results were similarly positive. 49% of women saw their tumors decrease by >50% in size, while only 39% of women saw the same effect on standard chemotherapy alone. Added to the then-new microtubule-inhibiting agent Taxol, Herceptin increased response rates from 16% to 40%.</p>
<h2 id="lessons">Lessons</h2>
<p>Herceptin ignited the era of targeted cancer therapy, but it encountered strong headwinds along the way.</p>
<p>A few take-aways:</p>
<ul>
<li>The results of previous <code class="language-plaintext highlighter-rouge">Modality::Indication</code> combinations shouldn’t be overly generalized. The real relationship of interest is the full <code class="language-plaintext highlighter-rouge">Modality::Target::Indication</code> triple. Previous antibody-based cancer treatments failed because they used the wrong target, not because all antibodies are ineffective for treating all cancers. A similar lesson is currently being relearned in the gene therapy field.</li>
<li>Internal champions are essential for the progression of drug development programs, even when early results are positive. The hypothesis space for therapeutics is so large that even candidates with positive pre-clinical data don’t always receive investment. Without David Botstein and Art Levinson, Herceptin may have been canceled before reaching Phase III trials.</li>
<li>More targeted therapies address more targeted populations. Herceptin is so effective and well tolerated because of its highly specific mechanism. That same specificity means only a subset (~10-30%) of patients see a benefit.</li>
</ul>
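<p>The first lesson above can be made concrete with a toy sketch. All of the data below are hypothetical placeholders, not real trial results; the point is only that a lookup keyed on <code class="language-plaintext highlighter-rouge">Modality::Indication</code> collapses distinctions that the full triple preserves:</p>

```python
# Toy illustration (hypothetical data): keying outcomes on (modality, indication)
# mixes Herceptin's success with earlier failed antibodies; keying on the full
# (modality, target, indication) triple keeps them distinct.

trials = [
    # (modality, target, indication, succeeded) -- illustrative values only
    ("antibody", "poorly-chosen-target", "breast cancer", False),
    ("antibody", "HER2", "breast cancer", True),
]

def by_modality_indication(trials):
    """Over-generalized view: 'do antibodies work in breast cancer?'"""
    outcomes = {}
    for modality, _target, indication, ok in trials:
        outcomes.setdefault((modality, indication), []).append(ok)
    return outcomes

def by_modality_target_indication(trials):
    """The relationship of interest keeps the target in the key."""
    return {(m, t, i): ok for m, t, i, ok in trials}

# The coarse key lumps a failure and a success under one label...
print(by_modality_indication(trials)[("antibody", "breast cancer")])
# ...while the full triple separates them.
print(by_modality_target_indication(trials)[("antibody", "HER2", "breast cancer")])
```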
<p>Other drug development programs likely have complementary yet independent lessons to be extracted <sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>. I wish we had more of these stories to learn from.</p>
<p>Do you know of others captured in similar form? Send me an email — <a href="mailto:jacob@jkimmel.net">jacob@jkimmel.net</a> or a Tweet <a href="http://twitter.com/jacobkimmel">@jacobkimmel</a>.</p>
<h2 id="footnotes">Footnotes</h2>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Why is this true? Is the technical background required for good science writing too high? Is there no market for these stories? <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><a href="https://www.nature.com/articles/nrd3186">https://www.nature.com/articles/nrd3186</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Recombinant human insulin (trade name <a href="https://www.humulin.com/insulin-options">Humulin</a>) was the first drug developed at Genentech, but was marketed in partnership with Eli Lilly. See <a href="https://www.notion.so/Genentech-The-Beginnings-of-Biotech-662a707433224d7b8030a22528bad89e">Genentech: The Beginnings of Biotech</a>, <a href="https://www.notion.so/Invisible-Frontiers-The-Race-to-Synthesize-a-Human-Gene-9dc341fcc1c24723a38e9545c98417d9">Invisible Frontiers: The Race to Synthesize a Human Gene</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p><a href="https://pubmed.ncbi.nlm.nih.gov/6095109/">https://pubmed.ncbi.nlm.nih.gov/6095109/</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p><a href="https://pubmed.ncbi.nlm.nih.gov/6328312/">https://pubmed.ncbi.nlm.nih.gov/6328312/</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p><a href="https://pubmed.ncbi.nlm.nih.gov/3798106/">https://pubmed.ncbi.nlm.nih.gov/3798106/</a> <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Disclaimer: I work at a <a href="http://calicolabs.com">company</a> founded by David and Art, and I greatly respect them both. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>This is a joke for NIH grant nerds. A classic line in an NIH grant is that the objectives are “complementary but independent,” because you need to accomplish the insane task of planning 3+ years of research where no action depends on the results of previous actions. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Open Questions2020-09-01T00:00:00+00:002020-09-01T00:00:00+00:00http://jck.bio/lists/open_questions<p>Inspired by <a href="https://patrickcollison.com/questions">Patrick Collison’s list of interesting questions</a>, I’ve collected a similar list of open questions that interest me and books I’d love to read that may not exist.
These two lists are closely intertwined, as I often find that the most satisfying answers to an open question could fill a book.</p>
<h1 id="open-questions">Open Questions</h1>
<h2 id="why-cant-we-accelerate-animal-development-what-sets-the-clock">Why can’t we accelerate animal development? What sets the clock?</h2>
<p>There are many mutants in biological model organisms that experience delayed development (Mouse, <em>Mus musculus</em>: ; Fruit fly, <em>Drosophila melanogaster</em>: ; Eutelic nematode <em>C. elegans</em>: ).
However, there are few reports of perturbations that accelerate animal development.
Faster development would obviously be desirable from a fitness perspective, so we might suspect that there are some good reasons that development requires a set amount of time in each organism.
We also know that body size and gestation time are associated, suggesting that there is a roughly logarithmic relationship between the total amount of animal that must be built and the total</p>
<p>What are the rate limiting steps in development that prohibit acceleration?
The first-order guess might be that cell cycle doubling times are the limiting factor, but some quick Fermi-style estimates <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> suggest that a mouse could develop much faster if only cell cycle rates were limiting.
What fundamental limits account for the differences?
A few candidate guesses: rates of diffusion for signaling molecules, rates of transcription and protein synthesis for differentiation into complex cell types (e.g. muscles need sarcomeres), rates of chromatin remodeling for cell differentiation, and rates of cell motility for the developmental migration.
In the ideal case, I’d love to have a Gantt-style chart identifying the limiting rate at each stage in development.</p>
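<p>The Fermi estimate referenced above can be sketched in a few lines, using the same numbers as the footnote (~1 ng per cell, a ~1 g newborn pup, and a ~5 hour doubling time for mouse embryonic stem cells):</p>

```python
import math

# Fermi estimate: how fast could a mouse develop if cell division were the only limit?
cell_mass_g = 1e-9        # a single cell weighs ~1 ng
pup_mass_g = 1.0          # a newborn (P1) C57Bl/6 pup weighs ~1 g
doubling_time_h = 5.0     # mouse embryonic stem cells double every ~4-5 hours

n_cells = pup_mass_g / cell_mass_g              # ~1e9 cells in a P1 pup
n_divisions = math.log2(n_cells)                # ~30 doublings from a single cell
min_days = n_divisions * doubling_time_h / 24   # division-limited lower bound, in days

print(f"{n_cells:.0e} cells, {n_divisions:.0f} divisions, {min_days:.1f} days minimum")
# Actual mouse gestation is ~21 days, roughly 3x slower than this bound,
# so cell cycle rates alone can't explain developmental timing.
```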
<h1 id="bibliographic-desiderata">Bibliographic Desiderata</h1>
<h2 id="a-history-of-the-national-institutes-of-health">A History of the National Institutes of Health</h2>
<p>The US National Institutes of Health (NIH, FY2019 ~38B USD) and National Science Foundation (NSF, FY2019 ~7.5B USD) are some of the world’s pre-eminent scientific funding agencies.
The funding decisions made by these institutions set the boundaries for most US scientific research and therefore play an outsized role in the rate of human technological progress.
What decisions led to the existing bureaucratic structures within each of these agencies?</p>
<p>For instance, why is the NIH organized into <a href="https://www.nih.gov/institutes-nih/list-nih-institutes-centers-offices">“Institutional Centers,”</a> focused on specific anatomical regions (NHLBI, NIAMS, NEI, NIDCR) or diseases (NCI, NIA, NIAID, NIDDK), rather than say, biological disciplines as in a university (Cell & Molecular Biology, Biochemistry, Computational Biology)?
Is this the most effective organizational structure?
What is the relative return-on-investment for intramural research at the NIH (the IRP in NIH-speak) relative to the extramural grants provided to academics?
Why does the intramural research program largely copy the organizational structure of academic labs, despite having a very different incentive and workforce structure?</p>
<p>I’d love to understand the factors that led to the existing NIH/NSF funding models and read objective assessments of their effectiveness all in one place.</p>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>A single cell <a href="https://bionumbers.hms.harvard.edu/bionumber.aspx?s=n&v=0&id=109717">weighs ~1 ng</a> and a P1 C57Bl/6 pup weighs ~1 g. So, there are <code class="language-plaintext highlighter-rouge">1 g / 1e-9 g = 1e9 cells</code> in a P1 pup. It would take <code class="language-plaintext highlighter-rouge">2**x = 1e9 -> x = log2(1e9) ≈ 30 divisions</code> to generate a P1 pup’s worth of cells. Mouse embryonic stem cells double every 4-5 hours. While other mouse cell types later in development divide more slowly, the embryonic stem cell division rate sets an upper bound on the maximum possible division rate for mouse cells. At this rate, it would only take <code class="language-plaintext highlighter-rouge">30 divisions * 5 hours/division * 1 day/24 hours = 6.25 days</code> to build a P1 mouse! Mouse development <a href="https://embryology.med.unsw.edu.au/embryology/index.php/Mouse_Timeline_Detailed">actually takes ~21 days</a>, so the observed developmental time isn’t even close to the maximum theoretical rate. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Inventing the NIH2020-06-21T00:00:00+00:002020-06-21T00:00:00+00:00http://jck.bio/inventing_the_nih<p><em>This post began as a book review of Inventing the NIH by Victoria Angela Harden, but grew a bit from there.</em></p>
<h2 id="how-did-the-us-create-one-of-the-most-impactful-scientific-institutions-in-history">How did the US create one of the most impactful scientific institutions in history?</h2>
<p>The National Institutes of Health (NIH) is the world’s pre-eminent biomedical research agency. The annual <a href="https://www.niaid.nih.gov/grants-contracts/budget-appropriation-fiscal-year-2020">NIH budget ($40B+)</a> is an order of magnitude larger than peer institutions (<a href="https://www.google.com/search?client=safari&rls=en&q=CIHR+budget&ie=UTF-8&oe=UTF-8">CIHR in Canada</a>, <a href="https://www.google.com/search?client=safari&rls=en&q=UK+MRC+budget&ie=UTF-8&oe=UTF-8">MRC in the UK</a>) in nominal terms, and commensurately the NIH is responsible either directly or indirectly for a <a href="https://nexus.od.nih.gov/all/2016/03/02/nih-publication-impact-a-first-look/">plurality of the world’s impactful biomedical research each year.</a></p>
<p>One reductive but instructive data point on the impact of the NIH is the number of Nobel Prize recipients with NIH funding. NIH-funded scientists have received <a href="https://www.nih.gov/about-nih/what-we-do/nih-almanac/nobel-laureates">>10%</a> of <a href="https://www.nobelprize.org/prizes/lists/all-nobel-prizes">all Nobel prizes in history.</a> If we subset to the Nobel Prizes for <a href="https://en.wikipedia.org/wiki/Nobel_Prize_in_Physiology_or_Medicine">Physiology or Medicine</a> (110 total prizes) or <a href="https://www.google.com/search?client=safari&rls=en&q=number+of+nobel+prizes+in+chemistry&ie=UTF-8&oe=UTF-8">Chemistry</a> (111 prizes), where all but two NIH-funded scientists received their awards, NIH-funded scientists have received a shocking <strong>42%</strong> of all prizes. This is especially notable given that the NIH has only existed since 1930 and the Nobels began in 1901.</p>
<p>In just the 2010-2016 period, NIH funding can be traced to scientific breakthroughs that supported the development of <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5878010/">210 new drugs</a>. (It’s important to note that <a href="https://blogs.sciencemag.org/pipeline/archives/2004/09/09/how_it_really_works">NIH funded basic discovery is but one component</a> of the vexing, arduous path to drug discovery.)</p>
<p>From even a cursory glance, it’s apparent that the NIH is responsible for a non-trivial fraction of human progress in both biology and medicine. I’ve long been fascinated by the NIH as an institution: how did it come to be; how does it prioritize abstract, long-term goals; and how might we improve the funding mechanisms of the NIH to accelerate biological discovery.</p>
<p>Given that the NIH funds such a large portion of discovery in one of the most rapidly advancing scientific fields, it seems that we can learn a great deal about scientific progress by investigating the NIH’s political origins and operational decisions.</p>
<p>It strikes me that the NIH’s mandate is much more radical than most presentations of the institution’s long history suggest. The NIH fundamentally takes taxpayer dollars, bequeathed by all, and uses that revenue to fund exploratory, high risk basic research. In the language of venture capital, the NIH is <a href="http://www.paulgraham.com/swan.html">black swan farming</a>, but rather than risking the funds of wealthy limited partners, the NIH invests with the public purse.</p>
<p>I believe this arrangement has led to almost unquantifiable good for humanity, but nonetheless, it’s a shocking proposition to include in a political speech.</p>
<p>Imagine the pitch:</p>
<p><em>“I would like to take tax dollars, disperse them widely on a number of individuals with interesting but inherently difficult to justify ideas, and then we’ll cross our fingers and hope for the best.”</em></p>
<p>But the pitch worked!<br />
And so did the science!</p>
<p><strong>How did this happen?</strong></p>
<h2 id="inventing-the-nih--a-review">Inventing the NIH — A Review</h2>
<p>Victoria Harden presents a step-by-step account of the NIH’s political origins in <em>Inventing the NIH</em>, with a strong focus on the role of non-governmental organizations and lobbying groups. She eloquently outlines how the NIH blossomed from much smaller beginnings into a high-growth scientific juggernaut. While insightful, Harden’s text is written for the academic historian and a bit difficult to consume for leisure. I’ve tried to extract some of the main insights below in a briefer form.</p>
<h2 id="the-marine-hospital-service-and-the-hygiene-laboratory">The Marine Hospital Service and the Hygiene Laboratory</h2>
<p>The NIH was not created anew from whole-cloth in a single legislative text. Rather, it was built upon existing institutional foundations, created for related but distinct purposes.</p>
<p>The deepest origins of the NIH connect back to the <a href="https://en.wikipedia.org/wiki/Marine_Hospital_Service">Marine Hospital Service</a>, a network of hospitals specifically created to treat ill seamen, funded by a tax on their wages. In a way, we can think of this network as a form of integrated health insurance similar to <a href="https://en.wikipedia.org/wiki/Kaiser_Permanente">Kaiser Permanente</a> in modern California. The Service was originally run within the precursor to the US Coast Guard, but was given independent management after the Civil War within the Department of the Treasury. This change in management led to the development of a distinct class of public health civil servants within the Hospital Service.</p>
<p>Crises and external circumstances began to expand the Hospital Service’s initial mission. Beginning with management of quarantines for incoming ships, Congress and the executive branch began asking the Hospital Service to manage and investigate various other public health problems.</p>
<p>It seems like the logic here was roughly: Who has the personnel to deal with problem X? That weird marine worker insurance program? Sure, give it to them.</p>
<p><a href="https://en.wikipedia.org/wiki/Germ_theory_of_disease">Germ theory</a> developed in the late nineteenth century, representing one of the great conceptual advances in modern biology. As part of the growing list of demands from Congress, the Hospital Service came to employ a few students of this new doctrine, including a direct trainee of Robert Koch himself, <a href="https://en.wikipedia.org/wiki/Joseph_J._Kinyoun">Joseph Kinyoun</a>. Kinyoun was placed in charge of establishing what we would now recognize as a basic research facility, termed the <a href="https://en.wikipedia.org/wiki/National_Institutes_of_Health#History">Hygienic Laboratory</a> in keeping with the nomenclature of the time. This laboratory was fairly small by modern standards (< 200 employees), but it was the first time federal funding was used to support ongoing basic health research.</p>
<h2 id="public-health-reformers-push-the-hospital-service-to-partner-with-some-enterprising-chemists">Public health reformers push the Hospital Service to partner with some enterprising chemists</h2>
<p>Throughout the early 20th century, a number of private organizations lobbied the US government to become more involved in public health. These groups included labor unions, life insurance companies, social workers, and philanthropic foundations. Several of their campaigns boiled down to advocating a reorganization of existing programs from a <a href="https://hbr.org/1968/11/organizational-choice-product-vs-function">divisional organization structure to a functional organization structure.</a> It’s not clear to me this was really a great idea, but it seems like the public health advocates <em>really</em> wanted the government to spend more federal dollars on health overall, and the reorganization demand was a problematic political tactic that allowed them to claim they were seeking efficiencies, while inadvertently alienating the existing civil service.</p>
<p>The Hospital Service was defensive when it came to these possible reforms, as they feared they might be subsumed then eliminated inside some larger department. However, they too wanted some reforms made. It seems they were particularly upset about their poor job security and compensation. These compensation problems stemmed from federal rules that allowed medical doctors to receive federal commissions, but not scientists. To improve their compensation, leaders of the Hospital Service were open to forming political coalitions with reformers, so long as they retained their independence and won pay increases.</p>
<p>These reform campaigns set the stage for Senator Joseph E. Ransdell of Louisiana to partner with members of the American Chemical Society seeking to establish a new research institute for the study of “physiological chemistry.”
The ACS members were interested in establishing an institute modeled on Rockefeller University (then, Rockefeller Institute), providing long-term support for basic research from a private endowment to understand the chemical basis of human disease.</p>
<p>After failing for nearly a decade to raise private funds, the ACS members were convinced by Ransdell that the US Government would be a worthy patron. Together, the ACS members and Ransdell collaborated to develop a proposal for a federal research institute that grew into the National Institute of Health (singular at first!).</p>
<p>In a funny anecdote, it appears Ransdell chose the name at the last minute, crossing out a previous name in the bill text and replacing it with NIH.</p>
<p>Ransdell and the chemists ran through the District of Columbia trying to gather support for their new proposal. After much effort, they received a lukewarm endorsement from the Hospital Service on the grounds that a bill for the new institute also included their desired pay increases. From the personal correspondence of Hospital Service leaders, it doesn’t seem like they were all that favorable to the new institute, but they really needed political help in the Senate.</p>
<p>In particular, the head of the Hygienic Laboratory viewed a new NIH-like institution as a competitor to his own existing efforts, and he believed it would be impossible to scale a research institution beyond the scope of the Hygienic Laboratory.
It strikes me that fear of hypergrowth and a failure to imagine large-scale operation are common failure modes within otherwise productive organizations like the Hygienic Laboratory.</p>
<h2 id="how-did-they-convince-the-public">How did they convince the public?</h2>
<p>It took Ransdell and the ACS members 4 years and 2 US presidents to finally pass their NIH legislation. Harden provides an incredibly detailed account of the process. As expected, opposition to increased federal spending was the primary obstruction to the creation of the NIH, but idiosyncratic outcomes and the fickleness of individual legislative personalities also played a role in the lengthy road to acceptance.</p>
<p>The arguments put forward by Ransdell and his supporters contained many familiar points.
They emphasized the efficiency of preventative treatment for disease and the necessity of basic science for developing new medicines, and they leaned on past successes of federally funded basic research like the creation of a vaccine for Rocky Mountain spotted fever.</p>
<p>In addition to these familiar arguments, I’ll posit that three additional factors contributed to the NIH’s success:</p>
<ol>
<li>Flexibility in the face of political reality</li>
<li>The 1928 influenza pandemic provided political momentum and a demand for action</li>
<li>Biomedical science had just entered a phase of exponential growth — successful exemplars were easy to find</li>
</ol>
<h3 id="downsizing-the-ask">Downsizing the ask</h3>
<p>NIH proponents’ requested appropriation was based on cost estimates produced by the ACS for their envisioned institute — $10M over five years (~$150M inflation adjusted to 2020). Several Senators viewed this appropriation as exorbitant at <a href="https://fred.stlouisfed.org/series/FYONET">a time when the US Government was much smaller in absolute terms than in the modern era</a> (federal net outlays in 1929: $3.1B). After back and forth with Andrew Mellon’s Treasury Department (yes, that <a href="https://www.wikiwand.com/en/Andrew_Mellon">Andrew Mellon</a>), the initial appropriation was dramatically scaled back to $750,000.</p>
<p>While disappointed, Ransdell and supporters were willing to accept a small initial appropriation in exchange for creating the framework for federally funded biomedical research. They felt confident that future legislation would increase the allocation, and that the institution would become a valued part of American life. In these hopes, they proved prescient.</p>
<p>This willingness to scale ambition to political reality seems essential to establishing an inherently high risk endeavor like the NIH. Risk tolerance tends to increase when the downside is well bounded.</p>
<h3 id="a-desire-for-action-in-the-face-of-tragedy">A desire for action in the face of tragedy</h3>
<p>The contemporary epidemiological context no doubt also played a role. The <a href="https://ajph.aphapublications.org/doi/pdf/10.2105/AJPH.20.2.119">winter of 1928-1929 saw the deadliest influenza pandemic</a> since the pandemic of 1918. This crisis was entirely new to me, and seems to have faded from the broad public consciousness.</p>
<p>Both President Calvin Coolidge and members of Congress were more receptive to proposals for increased federal expenditures on healthcare and health research in the wake of the proximal tragedy. Coolidge himself had vetoed an earlier public health reform bill, but his attitude warmed considerably following the pandemic, such that he became one of the stronger political supporters of the legislation.</p>
<p>Coolidge’s reversal stands out to me as a positive example of how local context can influence long-term public policy making. We have an almost perfect counterfactual to consider what would have happened in the absence of the influenza outbreak. The bill had been before Congress just a year prior, and a similar bill had passed only to be vetoed; yet with few if any changes, the NIH was able to win support once an acute event highlighted the importance of such an institution for the long-term health of the public.</p>
<h3 id="exponential-growth-in-biomedical-capability">Exponential growth in biomedical capability</h3>
<p>A key component of Ransdell’s presentation was a set of vignettes highlighting biomedical research advances with everyday impact.</p>
<p>In one portion of the presentation during the hearings, Ransdell showed a short microscopy film of motile cells in a culture dish taken by <a href="http://centennial.rucares.org/index.php?page=Mammalian_Cells">scientists at Rockefeller in Albert H. Ebeling’s group</a>.
These movies are so entrancing that many scientists (<a href="https://jkimmel.net/heteromotility">myself included</a>) still work on understanding the biology on display today.
I found this aspect of the argument endearing, and wanted to dig a little deeper than Harden’s coverage.</p>
<p>I found what appears to be the exact exchange <a href="https://books.google.com/books?id=JwQ7FkdAo7AC&pg=PA20&lpg=PA20&dq=national+institutes+of+health+ransdell+film+cell+culture&source=bl&ots=nSnL8LUe3k&sig=ACfU3U3NSj9xxXUeTAIHWSjmn1PHpJf6Hw&hl=en&sa=X&ved=2ahUKEwjkweCfq5TqAhVYCTQIHcPeC3oQ6AEwCXoECAkQAQ#v=onepage&q=national%20institutes%20of%20health%20ransdell%20film%20cell%20culture&f=false">captured here</a>:</p>
<blockquote>
<p>The cause of such diseases as nephritis, arteriosclerosis, cancer [..] must be discovered. [..] This cannot be brought without great advances in the knowledge of fundamental properties of cells.
…
This is a culture which has been placed in a suitable medium, and is functioning just as it did in a small embryo. […] Now what you are going to see going on before you is a process which in the incubator under the microscope covers a period of 24 hours and you are seeing in 15 or 20 seconds.</p>
</blockquote>
<p>(I tried to find the original Ebeling film to no avail. The Wellcome recently restored <a href="https://www.youtube.com/watch?v=W3pMVTflyy4&list=UUaDJONKhVydHbOii9faPNMA">some microscopy films from the same period</a>, and I imagine the Ebeling film may have looked similar.)</p>
<p>The Rockefeller scientists explained that even though cell culture is artificial and a bit fanciful, these model systems had allowed them to develop a production system for <em><a href="https://en.wikipedia.org/wiki/Vaccinia?oldformat=true">Vaccinia</a></em> vaccines.</p>
<p>This example is almost a perfect encapsulation of modern biomedical research. The scientists began their study by asking simple questions: What do animal cells need to survive? Can we provide everything a cell needs outside the body? They followed these questions through a series of experiments to find the proper culture conditions that allow for <em>ex vivo</em> cell cultures. Though unanticipated at the outset of the research, these culture platforms proved useful in later studies of a virulent infectious disease and the production of treatments.</p>
<p><em>Exploring a fundamental question yielded an unexpected, unpredictable practical benefit.</em></p>
<p>While this is only one such example, Ransdell’s presentation took place in the midst of long-awaited biomedical advances that were making impacts on the lives of everyday Americans. Germ theory provided a framework for understanding and preventing infectious disease, to astonishing effect. <a href="https://jamanetwork.com/journals/jama/fullarticle/768249">Deaths from infectious disease</a> were <strong>cut roughly in half (!)</strong> between 1900 and 1920 (see the figure from <a href="https://cdn.jamanetwork.com/ama/content_public/journal/jama/4590/m_joc80862f1.gif?Expires=2147483647&Signature=T2cIAfko7J-eH-QKTE7IgFXujBG7Ok1sBns7b-pi7mTHTqB9dAMZrhFoOhiK8WEJvInKgVxh8Eamo9doaFFFS2WA6acKRlHIgv2E7~~1b2ctAiqZZljJ8ytMXAETJxc8qPw~jvl5~0rH6ph8XGBWj2cslwgtgqrpCzs6Jac58ix4idQU88ktxQd7~NATxV~2k3QuhftuwSyeJk9GNj9hnBTCQUmJS-Mjo9lgw3B2OR~XM-n7yq-7hXwtGmC7Ff0IQ54z-QYHxacM9R9R5b4Z77VY0KM74nVHcrY0kuJdeP1pe5RANQQHjX-smUkOF3dP7KMZCNzxn0KYWTO6YBsPhg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA">Armstrong et al. 1999</a>, <em>JAMA</em> below).</p>
<p><img src="https://cdn.jamanetwork.com/ama/content_public/journal/jama/4590/m_joc80862f1.gif?Expires=2147483647&Signature=T2cIAfko7J-eH-QKTE7IgFXujBG7Ok1sBns7b-pi7mTHTqB9dAMZrhFoOhiK8WEJvInKgVxh8Eamo9doaFFFS2WA6acKRlHIgv2E7~~1b2ctAiqZZljJ8ytMXAETJxc8qPw~jvl5~0rH6ph8XGBWj2cslwgtgqrpCzs6Jac58ix4idQU88ktxQd7~NATxV~2k3QuhftuwSyeJk9GNj9hnBTCQUmJS-Mjo9lgw3B2OR~XM-n7yq-7hXwtGmC7Ff0IQ54z-QYHxacM9R9R5b4Z77VY0KM74nVHcrY0kuJdeP1pe5RANQQHjX-smUkOF3dP7KMZCNzxn0KYWTO6YBsPhg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA" alt="Curve of infectious disease deaths by year" /></p>
<p>Vaccines had been produced for several previously ravaging diseases. The Hospital Service itself had identified the source of pellagra as a dietary deficiency, and identified common foodstuffs to prevent it. Just months later, <a href="https://en.wikipedia.org/wiki/History_of_penicillin?oldformat=true">penicillin would be discovered</a>. Examples of biomedical success and the benefits of research investment abounded for Ransdell, many of them apparent without being named.</p>
<p>This context of rapid, geometric decay in mortality from disease seems essential to achieving broad public support for a high risk research enterprise.</p>
<h2 id="acceptance-passage-and-divisional-structure">Acceptance, Passage, and Divisional Structure</h2>
<p>Ransdell’s arguments were eventually accepted by both Houses of Congress and signed into law under Herbert Hoover, a president unusually excited about the application of scientific methods to all aspects of the public sphere (see <a href="https://www.notion.so/jacobkimmel/Hoover-An-Extraordinary-Life-in-Extraordinary-Times-9ecb67f0561541cdb8c15bceaf0cd57a">Hoover: An Extraordinary Life</a>).</p>
<p>Initially, the NIH was only a single institute in a small building in the District of Columbia, almost exclusively focused on intramural research. It was not until the passage of the <a href="https://www.wikiwand.com/en/National_Cancer_Institute">National Cancer Institute Act</a> in 1937 that the NIH broadly adopted the practice of extramural research grants. Today, extramural grants to researchers at universities and private institutes make up ~90% of the NIH budget. The NIH also settled into a “divisional” organizational structure in 1937, where divisions were defined by human diseases (a few exceptions exist in more modern times, the functionally focused National Center for Biotechnology Information being the most prominent).</p>
<hr />
<p>Harden’s history ends in 1937, but the history of the NIH only blossoms later on. I look forward to exploring the decisions that led to the current system and possible mechanisms for improvement based on other successful funding agencies, like <a href="https://en.wikipedia.org/wiki/Howard_Hughes_Medical_Institute?oldformat=true">HHMI</a>.</p>
<p>Are there other organizational structures or funding models that could help improve our rate of discovery?</p>
<p>This post began as a book review of <em>Inventing the NIH</em> by Victoria Angela Harden, but grew out a bit from there.</p>