We’ve built models that can write code, paint surreal portraits, and play chess like gods. We’ve folded proteins, generated DNA sequences, and even simulated molecular interactions. And yet, we still don’t truly understand how a single cell really works.
Why? Because biology isn’t just data to be parsed or code to be run. It’s not a closed system like chess or language. It’s alive. Contextual. Emergent. Messy.
And the truth is, we can’t simulate our way to understanding life.
Welcome to another edition of Almost Absurd. (It's about time, I'd say)
Over the past few months, I’ve been neck-deep in the world where AI meets biology: papers, preprints, late-night rabbit holes. What started as a spark has turned into a full-blown obsession.
Yesterday, my friend Yeshwanth asked, “Do people even read blogs anymore?” It made me pause. In a world drowning in noise, I still come back to a handful of blogs that make me think, not scroll.
So here’s my shot at that: something high-signal, a little nerdy, and hopefully worth your time.
Alright, let's get to it!
We’ve Always Wanted to Make Biology Programmable
Since the first genome was sequenced, the dream has been clear: make biology as programmable as software. Design a cell the way you'd design a chip. Upload DNA like code. Predict the output.
CRISPR seemed to bring us close. But while editing became easy, designing remained impossibly hard. We can cut, paste, and rewrite DNA, but we can't reliably predict what that edit will do inside a living organism.
AlphaFold gave us protein structures. ProGen gave us synthetic proteins. ESM gave us generative models over sequences. Yet biology remains opaque, elusive, and shockingly unpredictable. We'll get to why that's the case.
But why even go through this ordeal?
Simple: biology is the most efficient tech stack nature has ever built.
Drugs that work better and faster
Crops that survive heatwaves and floods
Enzymes that eat plastic
Proteins that capture CO2 and spit out fuel
Heck, your brain - a 1.3 kg lump of mostly fat and water - can do things today’s largest AI models can’t. And it runs on roughly the power of a dim lightbulb, compared to the megawatts a data centre draws.
That’s not just impressive. That’s absurdly powerful.
So yeah, programmable biology isn’t a side quest. It’s the main game.
But to program biology, we need to understand it first. And that’s where our current tools fall short.
Enter the Hype: Can LLMs Solve Biology?
There's a certain type of techno-optimist who believes LLMs are the universal solvent. Who believes GPT-5 is going to cure cancer, reverse climate change, and tuck your kids into bed.
But let’s zoom out.
Large Language Models have redefined what we thought was possible with just text. They're amazing at learning statistical correlations, extracting patterns, and generating eerily coherent prose and code. So, it's tempting to think: what if we just scaled them up for biology?
How LLMs work
Here’s the rub: LLMs are brilliant mimics. They don’t invent new laws of biology; they reflect what we already know. We’ve seen this play out in robotics, where researchers quickly realized you can’t get to safe, intelligent machines by guessing the next token. You need world models - representations grounded in physics, geometry, dynamics, and feedback. Models that know what happens when a wheel turns or a gripper closes. And we now have so many startups building in this space!
Biology is even messier.
Cells don’t follow static scripts. A gene doesn’t do the same thing across time, tissues, or temperatures. Feedback loops are everywhere. Two proteins with very similar structures might have totally different amino acid sequences and vice versa.
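To make that context-dependence concrete, here’s a toy sketch (not a real biological model - the circuit, parameters, and equation are all illustrative): a single negatively autoregulated gene, where the protein represses its own production. The "genetic program" is identical in every run, yet the steady-state expression level depends entirely on an environmental signal.

```python
# Toy sketch (illustrative only): a negatively autoregulated gene.
# Same circuit, different environmental signal, different outcome.

def simulate(signal, steps=20000, dt=0.01):
    """Euler-integrate dx/dt = signal * K / (K + x) - d * x."""
    K, d = 1.0, 0.5   # hypothetical repression threshold and decay rate
    x = 0.0           # protein level
    for _ in range(steps):
        production = signal * K / (K + x)  # production repressed by x itself
        x += dt * (production - d * x)
    return x

# Identical "program", three contexts, three different steady states:
for s in (0.5, 2.0, 8.0):
    print(f"signal={s}: steady-state expression ~ {simulate(s):.2f}")
```

A sequence-only model sees the same gene in all three cases; only a model that carries the surrounding dynamics can tell you what the gene will actually do.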
Just like robotics turned to world models, biology needs the same.
We need a world model for biology - one that’s grounded, embodied, and capable of learning the causal logic of living systems.
Why Current AI in Biology Falls Short
Biology isn’t just complex - it’s chaotic, contextual, and deeply interconnected. Traditional AI, trained on clean, labeled datasets with fixed structure, falls apart when faced with this mess.
Still, we’ve tried. We’ve built cutting-edge models such as:
AlphaFold gave us predicted structures for over 200 million known proteins - but structure alone doesn’t explain function, regulation, or interaction.
ESM, ProGen and friends generate impressive sequences - but they’re agnostic to cellular context.
Gene expression models predict transcript levels - but ignore metabolic state, spatial organization, or feedback.
Single-cell embeddings tell us what a cell looks like - but not what it will do next, or how it’ll respond to a perturbation.
Biology doesn’t have fixed inputs, consistent rules, or ground truths. It's dynamic. It's modular. It evolves. We need a unified model that learns across scales and systems. Not just what DNA says, but what it means, does, and becomes. We’ll get there by climbing a ladder, and these models are its rungs.
Can We Simulate Biology the Way We Simulate Games?
Just like we simulate the real world in video games or physics engines, there’s been growing excitement about simulating biology - especially cells. Demis Hassabis of DeepMind has spoken about building a complete model of a cell, one that can predict how it behaves in every scenario.
Well, in theory - yes. In practice, let’s unpack this.
To simulate a single E. coli bacterium at atomic resolution would require tracking ~10¹⁰ atoms, across millions of interactions, every femtosecond (10⁻¹⁵ seconds). Even with the world’s fastest supercomputers, like Frontier or Fugaku, you’d only be able to simulate a few milliseconds of real time in months of compute. Multiply that by trillions for more complex cells, tissues, or systems, and you’re asking for a universe-sized machine to simulate life in full.
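The arithmetic behind that claim is worth seeing. Here’s a back-of-envelope sketch using the rough figures above; the per-atom FLOP cost and the sustained machine throughput are loose assumptions, not measurements.

```python
# Back-of-envelope cost of atomic-resolution cell simulation.
# All four constants are rough assumptions for illustration.

ATOMS = 1e10               # approximate atom count of one E. coli cell
TIMESTEP_S = 1e-15         # one femtosecond per integration step
FLOPS_PER_ATOM_STEP = 1e3  # assumed cost of one force evaluation per atom
MACHINE_FLOPS = 1e18       # sustained throughput of an exascale machine

steps_per_sim_second = 1 / TIMESTEP_S
flops_per_sim_second = ATOMS * FLOPS_PER_ATOM_STEP * steps_per_sim_second

wall_seconds_per_sim_second = flops_per_sim_second / MACHINE_FLOPS
months_per_sim_millisecond = (wall_seconds_per_sim_second * 1e-3) / (3600 * 24 * 30)

print(f"~{wall_seconds_per_sim_second:.0e} wall-clock seconds per simulated second")
print(f"~{months_per_sim_millisecond:.1f} machine-months per simulated millisecond")
```

Even with these generous assumptions, a single simulated millisecond costs months of exascale compute - and that’s one bacterium, before you touch a human cell, a tissue, or an organ.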
It’s like trying to simulate all of human society - every thought, conversation, random coincidence - just to predict whether a single person will decide to go for a run tomorrow.
So no - we don’t need to simulate every atom. We need models that capture the essence. That can predict what matters, not reproduce what doesn’t. What we need isn’t a simulator. What we need is a world model for biology - one that understands enough of life’s rules to reason, predict, and design within it.
Biology Is Bigger Than You Think
Biology isn’t just healthcare.
It’s agriculture - with climate-resilient crops and precision breeding.
It’s energy - with engineered microbes producing clean fuels.
It’s industry - with custom enzymes replacing toxic chemicals.
It’s materials - with bio-based plastics, textiles, and composites.
It’s defense - with biosecurity, rapid vaccine design, and pathogen surveillance.
Some numbers:
Global bioeconomy: $1.55T today, projected to surpass $6T by 2040
Biomanufacturing: already a $300B+ market
Synthetic biology: growing at 20% CAGR, expected to reach $80B by 2030
Governments are pouring billions into sovereign bio foundries and genome infrastructure
If LLMs are the new oil, then LBMs - Large Biology Models - are the next electricity grid, powering the engines of the biological future.
Enough Drumroll: What Does a Large Biology Model Even Mean?
To be honest, we don’t know that yet.
It might consist of entity-level models - say, a protein-scale model trained on all known proteins across life, capturing not just sequences and structures but folding dynamics, chemical reactivity, binding energetics, and context-specific functions. A kind of biochemical orchestra that integrates the central dogma with the physics of amino acid chains. I can draw a lot of parallels to multimodal LLMs.
Or it might be an organism-scale model - one that reasons about an entire cell or tissue as a living, adaptive system. A model that takes gene edits, environmental cues, and signaling pathways as input - and tells you how the cell will behave, grow, or respond.
What we do know is this:
A true Large Biology Model won’t just simulate atoms.
It will learn what matters - across molecules, cells, and systems - and help us predict, design, and intervene in the living world.
Why Now?
Because everything is converging.
Massive biological datasets - single-cell, spatial, imaging, multi-omics
Breakthroughs in AI architecture - transformers, MoEs, physics-informed models
Wet-lab automation, cloud biofoundries, and active learning pipelines
And strategic investments from the US, China, EU, and India in scaling bio-industrial infrastructure
Add to that the climate imperative, the fragility of pharma pipelines, and the geopolitical urgency of bio-sovereignty.
Biology is becoming the next geo-industrial complex. And AI is the key to scaling it.
I’ve been reading and thinking deeply about this - especially what it would take to build a meaningful Large Protein Model: one that can reason about structure and function, not just predict them.
A model that helps us design novel gene editors, enzymes, therapeutics, and more - not by brute force, but by understanding the underlying logic of life.
Imagine just asking: “Design me a CRISPR nuclease that targets this DNA sequence in this crop, avoids off-targets, and works with Agrobacterium-mediated delivery” - and getting a fully functional protein that you can validate!
If this is something that excites you too - or you’re building in this space - drop me a note. I’d love to jam!
Almost absurdly, Tanay