AI in Science: How Evo 2 Is Redesigning Life Itself

In March 2026, a team from Arc Institute, NVIDIA, Stanford, UC Berkeley, and UCSF published a paper in Nature that changes what AI means for science. They built a model called Evo 2 — 40 billion parameters trained on 9.3 trillion DNA base pairs from 128,000 species across every domain of life on Earth.

Evo 2 reads DNA the way GPT reads English. It classifies BRCA1 mutations as pathogenic or benign with over 90% accuracy. It designs gene therapy elements that activate only in specific cell types. It generates entire bacterial genomes from scratch. And it discovered fundamental biological concepts (viral signatures, protein structures, exon-intron boundaries) purely by reading DNA sequences, without anyone teaching it what those concepts are.

The model is fully open-source. Code, weights, training data — all free on GitHub. This is not a product announcement. It is a shift in what is possible.

The short version

AI just learned to read and write the code of life. Not metaphorically — literally. Evo 2 processes up to 1 million nucleotides simultaneously, which is enough to analyze an entire bacterial genome in one pass. It was trained on more DNA than any model before it — 9.3 trillion nucleotides spanning bacteria, archaea, plants, animals, and humans. It classifies cancer-causing mutations, designs targeted therapies, and generates functional proteins — all without task-specific fine-tuning.

If AlphaFold was the moment AI solved protein structure, Evo 2 is the moment AI learned to engineer biology.

What Evo 2 actually does

Predicts disease-causing mutations

Evo 2 was tested on over 3,000 variants of the BRCA1 gene — one of the most studied breast cancer genes. Without ever being specifically trained on BRCA1 data, it predicted which mutations are pathogenic (disease-causing) versus benign with over 90% accuracy.

This matters because genetic testing today produces thousands of "variants of unknown significance" — mutations where doctors cannot tell patients whether they are at risk. Evo 2 can classify these variants in hours instead of the months or years required by traditional laboratory research.

The model also predicts functional impacts of noncoding mutations — changes in DNA regions that do not directly encode proteins but regulate when and where genes are active. These noncoding regions make up 98% of the human genome and are among the least understood parts of biology. Evo 2 reads them like a language it already knows.
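
One common way to score variants without task-specific training is to compare a sequence model's log-likelihood for the reference sequence and the mutated sequence. The sketch below illustrates that idea only. The toy first-order Markov model, the function names, and the likelihood-delta scoring rule are all illustrative assumptions standing in for a real genomic language model; none of this is Evo 2's API or the paper's exact protocol.

```python
import math

ALPHABET = "ACGT"

def fit_markov(sequences, pseudocount=1.0):
    """Fit a toy first-order Markov model: P(next base | previous base)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }

def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def apply_snv(seq, pos, alt):
    """Return a copy of seq with the base at pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

def delta_score(model, ref_seq, pos, alt):
    """Variant log-likelihood minus reference log-likelihood.

    A strongly negative value means the model finds the variant much less
    plausible than the reference (hypothetical scoring rule).
    """
    return log_likelihood(model, apply_snv(ref_seq, pos, alt)) - log_likelihood(model, ref_seq)

# Made-up training data and reference sequence, for illustration only.
toy_model = fit_markov(["ACGTACGTACGT"] * 3)
reference = "ACGTACGT"
score = delta_score(toy_model, reference, pos=2, alt="T")
print(f"delta log-likelihood for G>T at position 2: {score:.2f}")
```

In a real pipeline the toy model would be replaced by the genomic model's own likelihoods, and any threshold for calling a variant pathogenic would need to be calibrated against labeled variants.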

Designs targeted gene therapy

One of the most promising applications: designing genetic regulatory elements that activate only in specific cell types. A gene therapy meant to treat a brain disorder should activate only in neurons. A liver treatment should activate only in hepatocytes. Off-target activation causes side effects.

Evo 2 designs these regulatory elements with high precision. It achieved an AUROC above 0.9 in designing DNA sequences with specific chromatin states — meaning it can control whether a gene is accessible (active) or silent in a given cell type. This is the difference between a therapy that works and one that causes harm.
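
For readers unfamiliar with the metric: AUROC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect ranking. The sketch below computes it by direct pairwise comparison. The labels and scores are made-up numbers, not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (positive, negative) pairs where the positive
    example scores higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = sequence has the target chromatin state, 0 = it does not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUROC = {auroc(labels, scores):.3f}")  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; production code would normally use a sort-based implementation from a statistics library.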

Generates entire genomes

Evo 2 can generate novel genome sequences up to 1 million base pairs long — the scale of a simple bacterial genome. Researchers used it to extend human mitochondrial DNA snippets into complete functional genomes, design yeast chromosomes, and create bacterial genomes from scratch.

The generated proteins were verified with AlphaFold 3, achieving pLDDT confidence scores between 0.67 and 0.83 — comparable to natural proteins. These are not random sequences. They are functional biological designs.

Discovers biology on its own

When researchers analyzed what Evo 2 learned, they found something remarkable. Using a sparse autoencoder, they discovered the model had autonomously identified fundamental biological concepts from raw DNA sequences:

Viral genetic signatures
Protein secondary structures (alpha-helices, beta-sheets)
Exon-intron architecture (how genes are structured within DNA)
Regulatory element patterns

Nobody taught the model what a protein fold is. Nobody labeled the training data with exon boundaries. The model learned these concepts the same way it learned everything else — by reading 9.3 trillion characters of the language of life and finding the patterns.
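
To make the interpretability step more concrete, here is a minimal sketch of what a sparse autoencoder does: it maps a model's internal activation vector into a larger set of mostly-zero features and reconstructs the activation from them. The ReLU encoder, the L1 sparsity penalty, the dimensions, and the random weights are generic textbook assumptions, not the configuration used in the Evo 2 work.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class SparseAutoencoder:
    """Toy sparse autoencoder with untrained random weights (illustrative only)."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, activation):
        """Project into feature space; ReLU keeps features non-negative."""
        pre = [z + b for z, b in zip(matvec(self.w_enc, activation), self.b_enc)]
        return [max(0.0, x) for x in pre]

    def decode(self, features):
        """Reconstruct the original activation from the features."""
        return [z + b for z, b in zip(matvec(self.w_dec, features), self.b_dec)]

    def loss(self, activation, l1_coeff=1e-3):
        """Reconstruction error plus an L1 penalty that encourages sparsity."""
        features = self.encode(activation)
        recon = self.decode(features)
        mse = sum((a - r) ** 2 for a, r in zip(activation, recon)) / len(activation)
        return mse + l1_coeff * sum(features)  # features are >= 0, so sum == L1 norm

sae = SparseAutoencoder(d_model=4, d_features=16)
features = sae.encode([1.0, -0.5, 0.3, 0.0])
print(f"{sum(1 for f in features if f > 0)} of {len(features)} features active")
```

After training, individual features can be inspected to see which inputs activate them; that inspection is how a feature comes to be labeled as, say, an exon boundary detector.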

How it was built

Architecture: StripedHyena 2

Standard transformer architectures have a fundamental scaling problem: computational cost grows quadratically with sequence length. Doubling the context window quadruples the compute. This makes million-token context windows prohibitively expensive with standard attention.

Evo 2 uses StripedHyena 2, a hybrid architecture that combines convolutional operators (for short-range patterns) with attention operators (for long-range dependencies). This achieves near-linear scaling — meaning the 1 million nucleotide context window is practical, not just theoretical.
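
The scaling argument can be checked with back-of-the-envelope arithmetic. The sketch below compares an idealized quadratic cost (full attention) with an idealized n log n cost as a stand-in for "near-linear". Both cost functions ignore constant factors and hardware effects, and the n log n form is an assumption for illustration, not a measured property of StripedHyena 2.

```python
import math

def attention_cost(n):
    """Idealized cost of full self-attention: every token attends to every token."""
    return n * n

def near_linear_cost(n):
    """Idealized near-linear cost, modeled here as n * log2(n) (an assumption)."""
    return n * math.log2(n)

# Doubling the sequence length quadruples the quadratic cost.
assert attention_cost(2_000) == 4 * attention_cost(1_000)

for n in (8_192, 131_072, 1_048_576):
    ratio = attention_cost(n) / near_linear_cost(n)
    print(f"{n:>9,} tokens: quadratic / near-linear cost ratio ~ {ratio:,.0f}x")
```

The gap widens as the context grows, which is why the choice of operator matters far more at 1 million nucleotides than at a few thousand.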

Scale

Metric	Evo 1 (2024)	Evo 2 (2026)	Improvement
Parameters	7B	40B	5.7x
Training data	300B nucleotides	9.3T nucleotides	31x
Context window	131K nucleotides	1M nucleotides	8x
Species covered	Prokaryotes only	All domains of life	First time
Compute	~13 H100 GPUs equivalent	2,000 H100 GPUs	150x
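
The Improvement column is simple division, and a short check shows how much rounding is involved. The figures below are copied from the table; reading "131K" as 131,000 and treating GPU counts as a proxy for compute are simplifying assumptions.

```python
def fold_change(before, after):
    """How many times larger `after` is than `before`."""
    return after / before

# (Evo 1 value, Evo 2 value), taken from the table above.
rows = {
    "Parameters": (7e9, 40e9),
    "Training data (nucleotides)": (300e9, 9.3e12),
    "Context window (nucleotides)": (131_000, 1_000_000),
    "Compute (H100 GPUs)": (13, 2_000),
}

for metric, (evo1, evo2) in rows.items():
    print(f"{metric}: {fold_change(evo1, evo2):.1f}x")
```

The context-window and compute ratios come out slightly different from the rounded 8x and 150x shown in the table, which is expected given the approximate inputs.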

For context: the compute used to train Evo 2 is roughly 150 times what was used for AlphaFold and double what was used for ESM3 (EvolutionaryScale's protein model). This is frontier-scale AI applied to biology.

The training data: OpenGenome2

The training dataset — OpenGenome2 — contains 8.8 trillion non-redundant nucleotides from a curated atlas of genomes spanning all three domains of life:

Bacteria and archaea: Expanded and deduplicated collections of prokaryotic genomes
Eukaryotes: Humans, plants, fungi, insects, fish, mammals
Viruses: Bacteriophages only; eukaryotic viruses were excluded for biosecurity

The dataset itself is released as open-source. Any researcher can use it to train their own models.

A 7B version exists

The full 40B model requires serious hardware (multiple high-end GPUs). But Arc also released a 7-billion-parameter version that runs on consumer hardware. This smaller model retains surprising capability — strong mutation prediction and basic genome design — at a fraction of the compute cost.

On a Mac Mini M4 Pro with 48GB unified memory, the 7B version is estimated to run at roughly 15-20 tokens per second via Ollama, which would be fast enough for research workflows.
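
A rough way to see why the two model sizes need such different hardware is to estimate the memory taken by the weights alone: parameter count times bytes per parameter. The sketch below does that arithmetic. The precision options are generic assumptions rather than formats the Evo 2 release is known to ship in, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

MODELS = {"7B": 7e9, "40B": 40e9}
PRECISIONS = {"fp32": 4, "bf16": 2, "int8": 1}  # bytes per parameter (assumed options)

for name, n_params in MODELS.items():
    for precision, nbytes in PRECISIONS.items():
        gb = weight_memory_gb(n_params, nbytes)
        print(f"{name} @ {precision}: ~{gb:.0f} GB of weights")
```

At 2 bytes per parameter the 7B weights come to about 14 GB, which fits comfortably in 48GB of unified memory, while the 40B weights come to about 80 GB before any working memory is counted.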

Why this matters for business

Evo 2 is a biology paper. But its implications reach far beyond the lab.

Pharmaceuticals: drug discovery compressed

Traditional drug development takes 10-15 years and costs $2-3 billion per approved drug. The bottleneck is not chemistry — it is understanding which genetic targets matter and predicting which interventions will work.

Evo 2 classifies genetic variants in hours instead of months. It predicts off-target effects before a molecule enters a test tube. It designs regulatory elements with precision that previously required years of trial-and-error experimentation. For pharma companies, this compresses the earliest phases of drug discovery — target identification and validation — from years to weeks.

Agriculture: crop genome optimization

The same model that predicts disease mutations in humans can predict beneficial mutations in crops. Drought resistance, pest tolerance, nutritional density — these traits are encoded in DNA. Evo 2 reads that code across 128,000 species, including plants. It identifies patterns that connect genetic variation to functional outcomes across evolutionary time.

For agricultural biotech companies, this means faster development of crop varieties optimized for changing climates — without the regulatory complexity of introducing foreign genes. The modifications can be guided by natural variation that already exists in the species' genome.

Diagnostics: instant variant classification

Genetic testing companies like 23andMe, Color Health, and Invitae generate millions of variants of unknown significance every year. Each one requires expensive manual curation or remains unclassified — leaving patients and doctors without actionable information.

Evo 2 classifies these variants automatically. Over 90% accuracy on BRCA1 with zero fine-tuning suggests that classification of variants across thousands of disease-associated genes is within reach. The business opportunity is a diagnostic pipeline that converts every unknown variant into a classified risk — at the speed of inference rather than the speed of laboratory research.

Gene therapy: precision targeting

The gene therapy market is projected to reach $44 billion by 2030. The biggest challenge is not delivery — it is specificity. A therapy that activates in the wrong cell type causes side effects. Evo 2 designs regulatory elements with cell-type specificity at AUROC above 0.9. This directly reduces the failure rate of gene therapy candidates in clinical trials.

Biotech startups: the playing field is level

Evo 2 is open-source. A two-person startup in Nairobi has the same access to the model as Pfizer. The code, the weights, the training data, and the inference pipeline are all free on GitHub. The 7B version runs on consumer hardware. The API endpoint is available for larger jobs.

This is the same pattern that played out in software development with open-source LLMs. When the tools are free, the competitive advantage shifts from who has the most compute to who has the best ideas and the fastest execution.

The pattern that connects everything

Evo 2 is not an isolated breakthrough. It is the latest instance of a pattern that is repeating across every domain:

Domain	Foundation model	What it does
Text	GPT-4, Claude	Reads and writes human language
Code	Claude Code, Codex	Reads and writes software
Proteins	AlphaFold	Predicts protein structure from sequence
Images	DALL-E, Midjourney	Generates images from descriptions
Biology	Evo 2	Reads, writes, and designs DNA
Business	AI agents	Operates business processes autonomously

The architecture is the same in every case: a large model trained on massive domain-specific data, connected to domain tools, running an autonomous reasoning loop. AlphaFold solved protein structure. Evo 2 solves genome understanding. Our agent system solves business operations. The pattern is identical — the domain changes.

This is why understanding AI agents matters even if you never touch biology. The same architecture that designs genomes is the architecture that runs your outreach pipeline, your competitive intelligence, your campaign management. If you understand how one works, you understand how all of them work.

From reading to writing to designing

The progression is consistent:

Reading — the model understands the domain (GPT reads text, Evo 2 reads DNA)
Writing — the model generates new content in the domain (GPT writes essays, Evo 2 writes gene sequences)
Designing — the model creates novel, functional outputs optimized for specific goals (agents design outreach strategies, Evo 2 designs targeted therapies)

We are at stage 3 in text and code. We are entering stage 3 in biology. Every industry will follow this trajectory. The question is not whether AI will design things in your domain — it is when, and whether you are positioned to use it when it does.

Open-source changes the game

Evo 2 is released under an open-source license. The release includes everything:

Model weights — full 40B parameter model + 7B smaller version
Training code — reproduce the model from scratch
Inference code — run it on your own hardware
OpenGenome2 dataset — 8.8 trillion non-redundant nucleotides
Evo Designer — browser-based interface for non-programmers
NVIDIA BioNeMo integration — enterprise deployment pipeline

This follows the same pattern reshaping every AI domain. Llama, Qwen, DeepSeek, Mistral — the most impactful models are open. The proprietary advantage is shrinking. What remains proprietary is not the model but the application — how you deploy it, what data you fine-tune it on, what problems you solve.

For business leaders: this means the cost of accessing frontier AI capabilities is trending toward zero. The competitive moat is no longer "we have a better model." It is "we deploy it faster and more effectively." This is exactly what autonomous agent systems are built to do.

The self-hosting imperative for genomic data

Genetic data is the most sensitive data that exists. It cannot be changed, it identifies individuals uniquely, and it reveals predispositions to disease. The regulatory frameworks reflect this:

HIPAA (US): healthcare data, including genetic information, requires strict access controls
GINA (US): the Genetic Information Nondiscrimination Act prohibits health insurers and employers from discriminating on the basis of genetic information
GDPR (EU): genetic data is classified as a "special category" with the highest protection requirements
EU AI Act: AI systems used in healthcare commonly fall under the high-risk tier, which carries the strictest compliance obligations

When a research lab sends genetic sequences to a cloud API for analysis, those sequences travel across networks, reside on third-party servers, and are subject to the cloud provider's legal jurisdiction. In the US, the CLOUD Act allows law enforcement, with valid legal process, to compel US-based providers to hand over data regardless of where it is stored.

Self-hosted inference eliminates this risk entirely. Running Evo 2's 7B model on a local Mac Mini or the full 40B on dedicated GPU hardware means patient genomes, research data, and proprietary sequences never leave the building.

This is the same argument that applies to business data — but amplified. A leaked email list is a privacy incident. A leaked genome database is an irreversible violation. For any organization working with genetic data, self-hosted AI is not a preference. It is a legal and ethical requirement.

What this means for AI education

If AI can design genomes, identify cancer-causing mutations, and generate functional proteins — what exactly can't it do?

The answer, for now: it cannot replace human judgment about what should be built. Evo 2 can design a gene therapy element. It cannot decide whether that therapy should be developed, what the clinical trial should look like, or how to navigate the regulatory pathway. It cannot weigh the ethical implications of germline editing. It cannot build the business that brings the therapy to patients.

This is the same division of labor that defines our agent system: AI handles the operational work. Humans handle the strategic, ethical, and creative decisions. The agents send emails and analyze competitors. The founder decides which market to enter.

The tutorial trap applies everywhere

In biology, the tutorial trap looks like this: universities teach about CRISPR, about genomics, about bioinformatics. Students graduate with knowledge but not capability. They understand the theory but cannot deploy the tools.

The same gap exists in AI for business, AI for healthcare, AI for legal, AI for finance. Knowledge without deployment is the tutorial trap. The market is full of courses about AI. It is nearly empty of systems that actually run AI for you.

Evo 2 is a deployment tool, not a course. It generates functional outputs — real DNA sequences that can be synthesized and tested in a lab. Our agent system is the same: it generates functional outputs — real emails sent, real leads enriched, real campaigns executed. The value is in the doing, not the learning about doing.

The skills that matter

The researchers who built Evo 2 needed three things:

Domain expertise — deep understanding of biology, not just AI
Systems architecture — designing the training pipeline, data curation, inference infrastructure
Deployment capability — making the model actually work in production, not just on benchmarks

These are the same three skills that define effective AI deployment in any domain. Understand your business domain. Architect the system. Deploy it in production. The AI University teaches all three — not through courses, but through a live system where you see 31 agents doing exactly this every day.

The biosecurity question

Evo 2's capabilities raise legitimate safety concerns. A model that can design genomes could theoretically design harmful organisms. The researchers addressed this directly:

Eukaryotic viruses excluded from training data
Pathogen sequences filtered from the training dataset
Safety testing confirmed the model does not return productive answers to queries about known dangerous organisms
Stanford's bioethics lab oversaw responsible development protocols

This is worth studying because the same questions apply to every powerful AI system. Our agent system has its own version: the trust verification gate, the constitution-based guardrails, the risk classifier, the human review gates. Power without governance is dangerous. Every AI system needs both.

The fact that Evo 2 is open-source makes the safety question harder and more important. Anyone can download and run the model. The safety measures are voluntary, not enforced. This is the tension that every open-source AI project faces — and the reason that deployment infrastructure (guardrails, audit trails, kill switches) matters as much as the model itself.

Key takeaways

AI learned to read and write DNA. Evo 2 processes 1 million nucleotides simultaneously, trained on 9.3 trillion base pairs from 128,000 species. This is the AlphaFold moment for genomics.
Over 90% accuracy on BRCA1 mutations without task-specific training. Variants of unknown significance — which leave patients in diagnostic limbo — can now be classified in hours instead of years.
Entire genomes designed from scratch. Functional bacterial genomes, targeted gene therapy elements, novel proteins verified by AlphaFold 3. The model does not just understand biology — it engineers it.
The model discovered biology on its own. Viral signatures, protein folds, exon-intron boundaries — all learned from raw sequence data without labels. The model sees patterns in DNA that humans have not yet cataloged.
It is fully open-source. Weights, code, training data, designer interface — all free. A two-person startup has the same access as Pfizer. The 7B version runs on consumer hardware.
The pattern repeats across every domain. GPT for text. AlphaFold for proteins. Evo 2 for genomes. AI agents for business. Same architecture: foundation model + domain tools + autonomous reasoning. Understand one and you understand all.
Genomic data demands self-hosting. HIPAA, GINA, GDPR special category, EU AI Act — genetic data has the strictest protections. Cloud APIs are a compliance risk. Local inference is the answer.
The tutorial trap applies to science too. Knowing about CRISPR is not the same as deploying gene therapy. Knowing about AI is not the same as running 31 agents. The value is in deployment, not education.
Power requires governance. Evo 2 can design genomes. It could theoretically design harmful ones. The safety measures — pathogen filtering, query refusal, ethics oversight — are the biological equivalent of trust verification gates, risk classifiers, and human review. Every powerful AI system needs both capability and constraint.

Source: Brixi et al., "Genome modelling and design across all domains of life with Evo 2." Nature (2026). Model available at Arc Institute and GitHub.
