AI model simulates 500 million years of evolution to create a novel fluorescent protein

Scientists have developed an AI system capable of simulating hundreds of millions of years of protein evolution, creating a novel fluorescent protein unlike any found in nature.

The research team, led by Alexander Rives at EvolutionaryScale, created a large language model (LLM) called ESM3 to process and generate information about protein sequences, structures, and functions.

By training on data from billions of natural proteins, ESM3 learned to predict how proteins might evolve and change over time.

“ESM3 is an emergent simulator that has been learned from solving a token prediction task on data generated by evolution,” the researchers explain in the study.

“It has been theorized that neural networks discover the underlying structure of the data they are trained to predict. In this way, solving the token prediction task would require the model to learn the deep structure that determines which steps evolution can take, i.e. the fundamental biology of proteins.”
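To make the "token prediction task" concrete, here is a minimal sketch of how a training example could be built, assuming a masked-token setup (the article does not describe the exact objective). The mask symbol, the mask rate, and the function name `mask_sequence` are illustrative choices, not details from the study.

```python
import random

MASK = "_"  # hypothetical mask symbol, chosen only for this sketch


def mask_sequence(sequence, mask_rate, rng):
    """Hide a random subset of residues; return (masked input, targets)."""
    n_masked = max(1, round(len(sequence) * mask_rate))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    tokens = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # residue a model would be trained to recover
        tokens[pos] = MASK
    return "".join(tokens), targets


# Short illustrative amino-acid sequence in one-letter code.
example = "MSKGEELFTGVVPILVELDG"
masked, targets = mask_sequence(example, mask_rate=0.3, rng=random.Random(0))
print(masked)
print(targets)
```

A model trained this way sees the masked string and is scored on how well it recovers the hidden residues. Doing that well across billions of proteins is what the researchers argue forces it to learn the rules of protein biology.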

To test the model, the team prompted ESM3 to design an entirely new green fluorescent protein (GFP), a type of protein responsible for bioluminescence in certain marine animals and widely used in biotechnology research.

The AI-generated protein, dubbed esmGFP, shares only 58% of its sequence with the most similar known fluorescent protein.
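A figure like 58% is a sequence identity: the share of aligned positions at which two proteins carry the same amino acid. The sketch below shows one common way to compute it from two already-aligned sequences. The article does not say which convention the study used, so the gap handling here is an assumption, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns where both sequences have the same residue.

    Assumes the inputs are already aligned to equal length, with "-" for gaps.
    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences: 5 of 10 positions agree.
print(percent_identity("ACDEFGHIKL", "ACDEFAAAAA"))
```

Other conventions divide by the length of the shorter sequence or ignore gapped columns entirely, so reported identities can differ slightly between tools.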

Remarkably, esmGFP exhibits brightness comparable to naturally occurring GFPs and maintains the characteristic barrel-shaped structure essential for fluorescence. 

The researchers estimate that producing a protein this distant from known GFPs would have taken over 500 million years of natural evolution.

More about the study

The process of generating esmGFP involved several key steps:

  1. Data: Researchers trained ESM3 on roughly 2.78 billion natural proteins collected from sequence and structure databases. This included data from UniRef, MGnify, JGI, and other sources.
  2. Architecture: ESM3 uses a transformer-based architecture with some modifications, including a “geometric attention” mechanism to process 3D protein structures.
  3. Prompting: The researchers provided ESM3 with minimal structural information from a template GFP (the fluorescent protein).
  4. Generation: ESM3 used this prompt to generate novel protein sequences and structures through an iterative process.
  5. Filtering: Thousands of candidate designs were computationally evaluated and filtered to find the strongest candidates.
  6. Experimental testing: The most promising designs were synthesized and tested in the lab for fluorescence activity.
  7. Refinement: After identifying a dim but distant GFP variant, the researchers used ESM3 to further optimize the design, ultimately producing a brighter fluorescent protein.
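The generate, filter, and refine steps above follow a familiar design-loop pattern, which can be sketched in a few lines. This is a toy illustration only: `propose_variant` stands in for the generative model with random mutations, and `toy_score` stands in for the computational filters. Neither reflects how ESM3 or the study's filters actually work.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def propose_variant(parent, n_mutations, rng):
    """Stand-in for a generative model: mutate a few random positions."""
    tokens = list(parent)
    for pos in rng.sample(range(len(tokens)), n_mutations):
        tokens[pos] = rng.choice(AMINO_ACIDS)
    return "".join(tokens)


def toy_score(candidate):
    """Stand-in for computational filters: fraction of aromatic residues."""
    return sum(residue in "FWY" for residue in candidate) / len(candidate)


def design_loop(start, score_fn, rounds, pool_size, keep, rng):
    """Each round: generate a pool, keep the top scorers, refine from them."""
    survivors = [start]
    for _ in range(rounds):
        pool = list(survivors)  # carry survivors forward so the best never regresses
        for _ in range(pool_size):
            parent = rng.choice(survivors)
            pool.append(propose_variant(parent, n_mutations=2, rng=rng))
        pool.sort(key=score_fn, reverse=True)
        survivors = pool[:keep]
    return survivors[0]


start = "MSKGEELFTGVVPILVELDG"
best = design_loop(start, toy_score, rounds=5, pool_size=50, keep=5,
                   rng=random.Random(0))
print(best, toy_score(best))
```

In the real workflow, the scoring stage is the expensive part: the final judgment came from synthesizing candidates and measuring fluorescence in the lab, which no computational score replaces.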

The implications of this research extend beyond the creation of a single novel protein. 

ESM3 demonstrates an ability to explore protein design spaces far removed from what natural evolution has produced, opening up new avenues for creating proteins with desired functions or properties.

Dr. Tiffany Taylor, Professor of Microbial Ecology and Evolution at the University of Bath, who was not involved in the study, told LiveScience: “Right now, we still lack the fundamental understanding of how proteins, especially those ‘new to science,’ behave when introduced into a living system, but this is a cool new step that allows us to approach synthetic biology in a new way.”

“AI modeling like ESM3 will enable the discovery of new proteins that the constraints of natural selection would never allow, creating innovations in protein engineering that evolution can’t,” Dr. Taylor added.

Generative protein design

The researchers argue that ESM3 is not simply retrieving or recombining existing protein information.

Instead, it appears to have developed an understanding of the fundamental principles governing protein structure and function, allowing it to generate truly novel designs.

AI-driven protein research and design has reached a fever pitch, with DeepMind’s AlphaFold 3 predicting how proteins fold with incredible accuracy. 

AI-designed proteins have also shown excellent binding strength, demonstrating that they have practical uses.

However, as with any fast-moving technology that interferes with biology, even indirectly, there are risks.

First, if AI-designed proteins were to escape into the environment, they could interact with natural ecosystems, even outcompeting natural proteins or disrupting existing biological processes.

Second, they could trigger unexpected interactions inside living organisms, potentially even creating harmful biological agents or toxins.

Researchers recently called for ethical guardrails for AI protein design to prevent dangerous outcomes in this exciting, if unpredictable, field.
