This is a news story, published by Quanta Magazine, that relates primarily to RNA news.
DNA language modeling
Quanta Magazine
Science
72% Informative
Brian Hie: Evo, a genomic large language model (LLM), has been trained on large volumes of DNA.
The model picks up patterns that humans can’t see in DNA.
It uses those patterns to predict how changes to DNA affect the function of its downstream products, RNA and proteins.
Hie became interested in using language models for biology during graduate school.
Evo was trained on a “novel” consisting of many genomes — the E. coli genome alone is 2 million to 4 million base pairs.
Its training data set was also important: the model was exposed to 2.7 million genomes from bacteria, archaea and viruses.
That data shows the model evolutionary alternatives for life, which are different ways of expressing the same idea.
Evo is trained only on genomes from the simplest organisms, prokaryotes.
Hie's team wants to expand it to eukaryotes, organisms such as animals, plants and fungi whose cells have a nucleus.
The model generated a million tokens freely from scratch — essentially, an entire bacterial genome.
VR Score: 83
Informative language: 87
Neutral language: 58
Article tone: informal
Language: English
Language complexity: 42
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
External references: no external sources
Source diversity: no sources
Affiliate links: no affiliate links