RILLIE: RNA In Silico Evolution via LLM and Inverse folding
March 14, 2025
Welcome to the official repository for "RILLIE: RNA In Silico Evolution via LLM and Inverse folding".
Read our paper: ArXiv
Overview
We introduce RILLIE, a general RNA foundation model that integrates sequence and structure information to evolve RNA in a zero-shot fashion. By integrating a large language model with an inverse folding model, RILLIE can generate functional RNA sequences that align with natural evolutionary patterns at the sequence level while preserving the structural integrity of key functional regions. Using RILLIE, we successfully evolved two engineered RNA aptamers, Broccoli and Pepper, with a high success rate, low sequence similarity, and improved binding affinity and fluorescence in live cells.

Contents
Installation
If you prefer a faster setup, you can use the provided RILLIE.yaml file:
conda env create -f RILLIE.yaml -y
conda activate RILLIE
Alternatively, you can build the environment by following the step-by-step instructions below.
# Create a conda environment
conda create -y -n RILLIE python=3.10
conda activate RILLIE
# Install PyTorch and CUDA dependencies
pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 torchaudio==2.1.0+cu121 --index-url https://download.pytorch.org/whl/cu121
# Install PyTorch Geometric and related dependencies
pip install torch-geometric==2.6.1 torch-scatter==2.1.2+pt21cu121 -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
# Install Bioinformatics and Structural Biology packages
pip install \
biopython==1.84 \
bio==1.7.1 \
biothings-client==0.3.1 \
biotite==1.1.0 \
biotraj==1.2.2 \
mygene==3.2.2 \
prody==2.4.1 \
pymatgen==2024.8.9 \
spglib==2.5.0 \
openmm==8.1.1 \
simtk==0.1.0 \
rdkit-pypi==2021.3.4
# Install Machine Learning and Deep Learning dependencies
pip install \
scikit-learn==1.6.0 \
torchdrug==0.2.1 \
transformers==4.47.0 \
pytorch-lightning==2.4.0 \
lightning==2.4.0 \
torchmetrics==1.6.0 \
peft==0.14.0
# Install Data Processing and Computation Libraries
pip install \
numpy==1.26.3 \
scipy==1.14.1 \
pandas==2.2.2 \
numba==0.60.0 \
sympy==1.12 \
tqdm==4.66.5 \
joblib==1.4.2 \
threadpoolctl==3.5.0
# Install Visualization Tools
pip install \
matplotlib==3.9.2 \
seaborn==0.13.2 \
plotly==5.23.0 \
bokeh==3.6.2 \
datashader==0.16.3 \
holoviews==1.20.0
# Install Web & API Utilities
pip install \
requests==2.32.3 \
aiohttp==3.11.10 \
huggingface-hub==0.26.5 \
pyyaml==6.0.2 \
urllib3==1.26.13
# Install Miscellaneous Tools
pip install \
rna-fm==0.2.2 \
ml-collections==0.1.1 \
uncertainties==3.2.2 \
markdown==3.7 \
jsonargparse==4.34.1
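After either installation route, it can help to confirm that the key pins were actually installed. The following is a minimal sketch using only the Python standard library; the `PINS` subset and the `check_pins` helper are illustrative assumptions and are not part of the RILLIE repository.

```python
from importlib import metadata

# A few pins copied from the install commands above; extend as needed.
PINS = {
    "torch": "2.1.0+cu121",
    "torch-geometric": "2.6.1",
    "transformers": "4.47.0",
    "numpy": "1.26.3",
    "rna-fm": "0.2.2",
}


def check_pins(pins):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name, (expected, installed) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All checked packages match the pinned versions.")
```

Run it inside the activated RILLIE environment; an empty mismatch report suggests the checked packages match the versions listed above.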
Benchmark
Dataset Description
We collected six ncRNA deep mutational scanning (DMS) datasets, covering tRNAs, RNA aptamers, and ribozymes, from previous publications and private data.
Model Description
Our benchmark includes the following models:
- RNA language models
- DNA language models
- RNA inverse-folding models
Generate RNA Structure
We use RhoFold and AlphaFold3 to generate RNA 3D structures.
We use Chai to generate RNA 2D structures (used as input to RILLIE).
Zero-shot ncRNA fitness prediction
The average Spearman and Pearson correlations across the six datasets can be visualized with the following command:
python ./RILLIE/utils/ncRNA_fitness_prediction_average.py
Spearman correlations are visualized as follows:
The per-dataset Spearman and Pearson correlations across the six datasets can be visualized with the following command:
python ./RILLIE/utils/ncRNA_fitness_prediction_all.py
Spearman correlations are visualized as follows:
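For readers who want to compute the same kind of metric on their own predictions, here is a minimal, dependency-free sketch of the two correlations between model scores and measured fitness. It is not the repository's evaluation script; the function names and the toy numbers are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up log-likelihood scores and fitness values for four variants.
    scores = [-3.1, -2.9, -3.4, -2.7]
    fitness = [0.4, 0.9, 0.1, 0.8]
    print(f"Spearman: {spearman(scores, fitness):.3f}")
    print(f"Pearson:  {pearson(scores, fitness):.3f}")
```

Spearman correlation depends only on rank order, which is why it is a common choice for zero-shot fitness prediction, where model scores and measured fitness are on different scales.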
High-fitness sensitivity analysis
python ./RILLIE/utils/ncRNA_fitness_prediction_all.py
Spearman correlations are visualized as follows:

Using RILLIE
Checkpoints
You can download the model checkpoint from the Google Drive link.
Then, place the downloaded data into the ./RILLIE/model/IFM/checkpoint directory.
General RNA Evolution
1. Generate the secondary and tertiary structures of the RNA to be evolved.
The secondary structure (.npy) can be generated by RhoFold or Chai.
The tertiary structure (.pdb) can be generated by Chai, AlphaFold3, or RhoFold.
2. Place the .pdb file into the ./RILLIE/model/IFM/data/test directory.
Place the .npy file into the ./RILLIE/model/IFM/data/test_ss directory.
Tip: the .pdb file and the .npy file must have the same name.
Example: test_1.pdb and test_1.npy
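Because the two files are matched by name, a quick pre-flight check can catch missing or misnamed inputs before scoring. The sketch below uses only the standard library; `unmatched_inputs` is a hypothetical helper, not a function shipped with RILLIE.

```python
from pathlib import Path


def unmatched_inputs(pdb_dir, ss_dir):
    """Return (pdb stems without a .npy, npy stems without a .pdb), both sorted."""
    pdb_stems = {p.stem for p in Path(pdb_dir).glob("*.pdb")}
    npy_stems = {p.stem for p in Path(ss_dir).glob("*.npy")}
    return sorted(pdb_stems - npy_stems), sorted(npy_stems - pdb_stems)


if __name__ == "__main__":
    # Directories named in step 2 above.
    pdb_dir = Path("./RILLIE/model/IFM/data/test")
    ss_dir = Path("./RILLIE/model/IFM/data/test_ss")
    if pdb_dir.is_dir() and ss_dir.is_dir():
        missing_ss, missing_pdb = unmatched_inputs(pdb_dir, ss_dir)
        for stem in missing_ss:
            print(f"{stem}.pdb has no matching {stem}.npy")
        for stem in missing_pdb:
            print(f"{stem}.npy has no matching {stem}.pdb")
        if not missing_ss and not missing_pdb:
            print("Every .pdb file has a matching .npy file.")
    else:
        print("Input directories not found; run this from the repository root.")
```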
3. Score the sequence log-likelihoods and pick the top X% of sequences using the following command:
python ./RILLIE/model/IFModel/src/score_sequence_joint_likelihood.py
Example Output:
Processing pdb files for sequence: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27.23it/s]
Sequence Sequence_63: IFM score = -3.0081936583227042, LLM score = -1.1260515451431274
Scoring sequence Sequence_64...
Processing pdb files for sequence: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26.98it/s]
Sequence Sequence_64: IFM score = -2.9159689144212373, LLM score = -1.0990711450576782
Threshold for IFM (top 10.0%): -2.946614666374362
Threshold for LLM (top 10.0%): -1.0952333569526673
Selected 2 sequences out of 64.
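The output above reports one threshold per model and a small number of selected sequences. One plausible reading, sketched below, is that each threshold is a percentile of that model's scores and a sequence is kept only if it clears both. This is an assumption for illustration, not a description of `score_sequence_joint_likelihood.py`; the helper names and toy scores are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def select_top(scores, top_percent):
    """scores: {name: (ifm_score, llm_score)}; higher (less negative) is better.

    Assumption: a sequence is kept only if it clears BOTH thresholds.
    Returns (ifm_threshold, llm_threshold, selected_names).
    """
    ifm_cut = percentile([s[0] for s in scores.values()], 100.0 - top_percent)
    llm_cut = percentile([s[1] for s in scores.values()], 100.0 - top_percent)
    selected = [
        name
        for name, (ifm, llm) in scores.items()
        if ifm >= ifm_cut and llm >= llm_cut
    ]
    return ifm_cut, llm_cut, selected


if __name__ == "__main__":
    # Made-up scores for five sequences.
    toy = {
        "seq_a": (-1.0, -1.0),
        "seq_b": (-2.0, -0.5),
        "seq_c": (-3.0, -3.0),
        "seq_d": (-0.5, -2.0),
        "seq_e": (-4.0, -4.0),
    }
    ifm_cut, llm_cut, chosen = select_top(toy, top_percent=50.0)
    print(f"Threshold for IFM (top 50.0%): {ifm_cut}")
    print(f"Threshold for LLM (top 50.0%): {llm_cut}")
    print(f"Selected {len(chosen)} sequences out of {len(toy)}.")
```

Requiring both thresholds is stricter than either one alone, which would explain why fewer than X% of the sequences end up selected.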
Multi-Round Evolution (Optional)
Based on the wet-lab testing results, we can discard harmful mutations to increase the success rate, while introducing new mutations to help directed evolution escape local optima and move toward global optima. This approach enables efficient directed evolution without retraining the model and is especially useful when very few variants have been tested.
1. Visualize a fitness heatmap (e.g., fluorescence or affinity) based on the previous wet-lab testing results:
python ./RILLIE/utils/wet_data_analysis.py
Spearman correlations are visualized as follows:

2. Discard harmful mutations to increase the success rate.
Mutations shown in dark colors are harmful.
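As a rough illustration of this step, the sketch below flags mutations from wet-lab measurements and filters candidate sequences that carry them. The criterion used here (mean fitness of the variants carrying a mutation falls below the wild-type fitness) is an assumption, as are the helper names and the toy data; the README does not specify how harmful mutations are read off the heatmap.

```python
from collections import defaultdict


def mutations(wild_type, variant):
    """List substitutions as strings such as 'G1A' (1-based positions)."""
    if len(wild_type) != len(variant):
        raise ValueError("this sketch only handles equal-length substitutions")
    return [
        f"{w}{i}{v}"
        for i, (w, v) in enumerate(zip(wild_type, variant), start=1)
        if w != v
    ]


def harmful_mutations(wild_type, tested, wt_fitness):
    """tested: {variant_sequence: measured_fitness}.

    Assumption: a mutation is 'harmful' if the mean fitness of the tested
    variants carrying it is below the wild-type fitness.
    """
    by_mutation = defaultdict(list)
    for seq, fitness in tested.items():
        for m in mutations(wild_type, seq):
            by_mutation[m].append(fitness)
    return {
        m for m, vals in by_mutation.items() if sum(vals) / len(vals) < wt_fitness
    }


def drop_harmful(wild_type, candidates, harmful):
    """Keep candidates that carry none of the harmful mutations."""
    return [
        c for c in candidates if not harmful.intersection(mutations(wild_type, c))
    ]


if __name__ == "__main__":
    # Toy 5-nt example with made-up fitness values.
    wt = "GACGU"
    tested = {"GACGA": 0.5, "AACGU": 1.5, "AACGA": 0.8}
    bad = harmful_mutations(wt, tested, wt_fitness=1.0)
    print("Harmful:", sorted(bad))
    print("Kept:", drop_harmful(wt, ["AACGU", "GACGA", "GUCGU"], bad))
```

With very few tested variants, a per-mutation mean is noisy, so in practice the threshold and the minimum number of observations per mutation would need to be chosen with care.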
3. Introduce new mutations and visualize the mutational distribution:
python ./RILLIE/utils/visualize_mutational_distribution.py
The mutational distribution (e.g., for Broccoli) is visualized as follows:
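The tally behind such a distribution can be sketched in a few lines: count, for each position, how many candidate sequences differ from the wild type. This is an illustrative sketch with hypothetical helper names and toy sequences, not the logic of `visualize_mutational_distribution.py`.

```python
from collections import Counter


def position_counts(wild_type, variants):
    """Count, per 1-based position, how many variants differ from wild type."""
    counts = Counter()
    for seq in variants:
        if len(seq) != len(wild_type):
            raise ValueError("this sketch only handles equal-length substitutions")
        for i, (w, v) in enumerate(zip(wild_type, seq), start=1):
            if w != v:
                counts[i] += 1
    return counts


def text_histogram(counts, length):
    """Render one line per position, with one '#' per mutated variant."""
    return "\n".join(
        f"{pos:>3} {'#' * counts.get(pos, 0)}" for pos in range(1, length + 1)
    )


if __name__ == "__main__":
    # Toy 5-nt wild type and three made-up variants.
    wt = "GACGU"
    pool = ["AACGU", "AACGA", "GACGA"]
    print(text_histogram(position_counts(wt, pool), len(wt)))
```

Positions that stay empty in such a tally are untouched by the current candidate pool, which can hint at where new mutations might still be introduced.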

4. Score the sequence log-likelihoods and pick the top X% of sequences using the following command:
python ./RILLIE/model/IFModel/src/score_sequence_joint_likelihood.py
License
No commercial use of either the model or the generated data is permitted; see license.md for details.
Acknowledgements
Our work builds upon AIDO.RNA (1.6B), RiNALMo, RNAFM, RNAMSM, Evo 1, Nucleotide Transformer, Grover, GENA, RhoDesign, and RhoFold. Thanks for their excellent work and open-source contributions.