Why the Human Genome’s Tangled Physicality May Confound AI

The article discusses the complexity of the human genome beyond just the coding regions, highlighting that only 2% of the genome consists of actual genes. It emphasizes that understanding how these genes are regulated—rather than just identifying them—is a more challenging and crucial aspect of genomic research. This regulation determines how different cell types function and respond to environmental signals.

Introduction

Since its molecular structure was deduced in the 1950s, DNA has been hailed by many biologists as the secret of life. They’ve read and studied the information stored in the DNA found in the cells of living organisms, known as their genomes, and claimed that this genetic database must be some kind of blueprint, code script, or computer. But if DNA really does harbor some greater secret about how life works, biologists have yet to find it.

In fact, the human genome is less a script than a puzzle that gets harder the closer biologists look. Knowing the entire sequence (the order of all 3 billion or so of our DNA’s chemical building blocks, nearly fully deduced by the international Human Genome Project between 1990 and 2003) hasn’t helped much. That investigation showed that barely 2% of the human genome consists of actual genes, the protein-coding sequences of DNA.

It’s now clear that understanding the human genome is no longer a matter of figuring out what each gene does. The deeper and much harder question is how those genes are used, or regulated, a question that seems to involve some and perhaps much of the rest of the genome. By switching suites of genes on and off, the many different cell types in our bodies can all be created from the same material. Cells also regulate their genes from moment to moment in response to a constant inflow of signals from their neighbors and surroundings. But the processes that govern gene regulation are proving so complex that some biologists wonder whether a full understanding of gene regulation, and so of how the genome really works, will ever be within the grasp of our puny minds.

Some are counting on outsourcing the analysis to artificial intelligence. Genomic “foundation models” such as Evo 2, Genos, and Google DeepMind’s AlphaGenome are trained on vast quantities of genomic data, and biologists use them to make predictions about how differences in DNA sequence affect biological processes and, ultimately, the traits (including disease risk) of a whole organism. These algorithms don’t worry about the complicated regulatory stuff going on; all of that is supposedly subsumed by the models’ “training,” through which they deduce correlations from cases we already know about.

This approach is likely to be useful, but for those who crave real understanding of how the genome, and ultimately life itself, works, a computational black box will never suffice. And perhaps more to the point, the genome might not submit to the kind of straightforward input-output approach that such AI models ultimately assume.

That’s because the genome is no blueprint or algorithm. It is something else.

The Old View

Given that it’s the product of around 4 billion years of evolution, perhaps it’s not surprising that our genome is complicated. The surprise has been what those complications are. “Our genome is not what we might make it if we sat down at the drawing board,” said the biologist Karen Adelman, who studies gene regulation at Harvard Medical School.

The traditional view posits that a small proportion of our DNA holds the code for making the protein molecules that orchestrate our cells’ chemistry. Each instruction for a protein is held in a corresponding gene (we have around 20,000 of these), and gene sequences can range in length from a couple of dozen to almost 3 million DNA “letters” (representing molecules called nucleotides). Making a protein from its gene is a two-stage affair. First the DNA is read, letter by letter, by an enzyme called RNA polymerase, which creates a copy of that code in a related molecule called messenger RNA (mRNA). This is called transcription. The mRNA is then read by a piece of molecular machinery called the ribosome, which constructs the protein, a process called translation. The proteins made by the ribosome then go off to do their jobs in making and sustaining the organism.

This picture is still more or less correct. But it turns out that “the genes are probably not the most interesting part of the genome,” Adelman said.

What matters more is how our genes, many of which we share with simpler organisms, are regulated: turned on and off. Which proteins a cell needs changes over time and according to cell type: muscle, brain, skin, and so on. How the genes that encode those proteins are regulated depends in part on stretches of the genome that don’t code for proteins.

Biologists have known about gene regulation, and the involvement of “noncoding” DNA, since the 1960s. But for many years, most of what they understood about this came from studies of simple organisms like bacteria, where the principles are generally straightforward. It has gradually become clear, though, that in complex eukaryotic organisms like us, gene regulation is far more complicated, involving overlapping systems of oversight and control, each with its own intricacies.

Transcription Factors

Transcription gets started by proteins called transcription factors, which are like the operations managers…

Read the full article at Quanta Magazine →

Source document: Human Genome Project

Quanta Magazine | Independent | Center | 3 days ago

Bias read (Center): The article presents scientific findings without overt ideological framing. It focuses on biological complexity and challenges in genomic research, avoiding political commentary or biased language.

Official sources cited

Human Genome Project (organisation)
