README.md

June 19, 2026

A comprehensive list of papers accompanying the survey 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications, and Opportunities' (ACM Computing Surveys, 2026).


Important

Contributions welcome:

Contact us or submit a pull request to add relevant papers that are not yet listed, clarify content, or adjust categorization. Please also update the entry once your paper is accepted. Thank you!


💥 News 💥

  • 🔥🔥🔥 Our survey has been accepted by ACM Computing Surveys. Please cite it or the library if you find it helpful.
  • 🔥🔥🔥 We flag papers whose experiments use models of size ≥ 7B (or small-sized mainstream LLMs).

Abstract

Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.

Citation

If you find our paper or this resource helpful, please consider citing:

@article{yang2026ModelMergingSurvey,
  author = {Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
  title = {Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications, and Opportunities},
  year = {2026},
  issue_date = {June 2026},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {58},
  number = {8},
  issn = {0360-0300},
  url = {https://doi.org/10.1145/3787849},
  doi = {10.1145/3787849},
  journal = {ACM Comput. Surv.},
  month = feb,
  articleno = {216},
  numpages = {41}
}

Thanks!


Framework


Survey

| Paper Title | Year | Conference/Journal |
| --- | --- | --- |
| Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions | 2026 | Arxiv |
| Scaling Intelligence Through Model Merging: A Comprehensive Survey | 2025 | Arxiv |
| Democratizing AI Through Model Fusion: A Comprehensive Review and Future Directions | 2025 | Arxiv |
| From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches | 2025 | Arxiv |
| SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques | 2024 | Arxiv |
| Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | 2024 | Arxiv |
| A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning | 2024 | Arxiv |
| Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | 2024 | Arxiv |
| Learn From Model Beyond Fine-Tuning: A Survey | 2023 | Arxiv |
| Deep Model Fusion: A Survey | 2023 | Arxiv |

Benchmark/Evaluation

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| merge-and-rebase | 2026 | GitHub | Codebase for model merging, task-vector transport, and configurable fine-tuning across vision and text models. It is built for fast iteration on checkpoint merging, rebasing, and evaluation workflows. Supports both Vision and Language merging. |
| crdt-merge | 2026 | GitHub | CRDT-based distributed model merging with formal convergence guarantees. 25 strategies (SLERP, TIES, DARE, Fisher, evolutionary). Two-layer OR-Set architecture enabling conflict-free multi-node merge. |
| Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture Enabling CRDT-Compliant Model Merging Across 26 Strategies | 2026 | SSRN | crdt-merge |
| An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation | 2025 | Arxiv | LLAMA-2-7B, LLAMA-3-8B, LLAMA-3.1-8B, QWEN2-7B |
| A Systematic Study of Model Merging Techniques in Large Language Models | 2025 | Arxiv | Llama-3.2-3B-Instruct, Llama-3.1-8B-Instruct, Qwen3-4B, Qwen3-8B |
| FusionBench: A Comprehensive Benchmark of Deep Model Fusion | 2025 | JMLR | Mistral-7B-v0.1, MetaMath-Mistral-7B, dolphin-2.1-mistral-7b, speechless-code-mistral-7b-v1.0 |
| Towards Performance Consistency in Multi-Level Model Collaboration | 2025 | ICCV | |
| Model Merging Scaling Laws in Large Language Models | 2025 | Arxiv | Qwen2.5 0.5, 1.5, 3, 7, 14, 32, 72B |
| FBMS: An R Package for Flexible Bayesian Model Selection and Model Averaging | 2025 | Arxiv | |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | 2025 | Arxiv | Qwen2-VL-7B-Base, Vicuna-7B-v1.5 |
| MergeBench: A Benchmark for Merging Domain-Specialized LLMs | 2025 | Arxiv | Llama-3.2-3B, Llama3.1-8B, Gemma-2-2B and Gemma-2-9B |
| Mergenetic: a Simple Evolutionary Model Merging Library | 2025 | System Demonstrations | Mistral-7B |
| RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness | 2025 | NeurIPS | LLaVA-v1.5-7B |
| Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging | 2025 | Arxiv | Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2 |
| How to Merge Your Multimodal Models Over Time? | 2024 | Arxiv | |
| Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | 2024 | Arxiv | Aya 23 8B |
| A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | 2024 | Arxiv | LLaMA3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct-v0.3 |
| Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | 2024 | NeurIPS Track on Datasets and Benchmarks | Synthia-7B-v1.2, Llama-2-7b-evolcodealpaca, OpenHermes-7B, pygmalion-2-7b, Llama-2-7b-chat-hf, BeingWell_llama2_7b, MetaMath-7B-V1.0, vicuna-7b-v1.5, Platypus2-7B, GOAT-7B-Community, Llama-2-7b-WikiChat-fused, dolphin-llama2-7b, MetaMath-Llemma-7B, CodeLlama-7b-Instruct-hf, Magicoder-S-CL-7B, CrystalChat |
| What Matters for Model Merging at Scale? | 2024 | Arxiv | PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B) |
| Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
| Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | 2024 | Arxiv | Llama-3.1-8B, Mistral-7B-v0.3 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | 2024 | Arxiv | Llama2-7B-Chat, Meditron-7B |

Advanced Methods

Pre-Merging Methods

Better Fine-tuning

Linearization Fine-tuning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Distilling Linearized Behavior into Non-Linear Fine-Tuning for Effective Task Arithmetic | 2026 | ICML | Llama-3.2-1B-Instruct |
| Understanding and Enforcing Weight Disentanglement in Task Arithmetic | 2026 | Arxiv | |
| Tangent Space Fine-Tuning for Directional Preference Alignment in Large Language Models | 2026 | Arxiv | Llama-3.2-1B-Instruct |
| Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature | 2026 | ICLR | |
| Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic | 2025 | ICLR | |
| Tangent Transformers for Composition, Privacy and Removal | 2024 | ICLR | |
| Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR | |
| Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS | |

Subspace Fine-tuning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | 2025 | Arxiv | Llama3-8B |
| Efficient Model Editing with Task-Localized Sparse Fine-tuning | 2025 | ICLR | |

Sharpness-aware Fine-tuning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning | 2025 | ICLR | |

Others

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Post-Hoc Merging is Not Enough: Many-Shot Model Merging with Loss-Gap Balancing | 2026 | ICML | Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Qwen-3-4B |
| MergOPT: A Merge-Aware Optimizer for Robust Model Merging | 2026 | ICLR | Llama3.1-8B-Instruct |

Architecture Transformation

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Model Assembly Learning with Heterogeneous Layer Weight Merging | 2025 | ICLR Workshop | |
| Training-free Heterogeneous Model Merging | 2025 | Arxiv | |
| Knowledge fusion of large language models | 2024 | ICLR | Llama-2 7B, OpenLLaMA 7B, MPT 7B |
| Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv | NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B |
| On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | 2023 | ICASSP | |
| GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV | |

Weight Alignment

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Transport and Merge: Cross-Architecture Merging for Large Language Models | 2026 | Arxiv | LLaMA-3 8B |
| Symmetry-Aware Graph Metanetwork Autoencoders: Model Merging through Parameter Canonicalization | 2025 | TAG-DS | |
| Understanding Mode Connectivity via Parameter Space Symmetry | 2025 | ICML | |
| Update Your Transformer to the Latest Release: Re-Basin of Task Vectors | 2025 | ICML | |
| Model Assembly Learning with Heterogeneous Layer Weight Merging | 2025 | ICLR Workshop | |
| Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion | 2025 | Arxiv | |
| The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse | 2024 | Arxiv | |
| Equivariant Deep Weight Space Alignment | 2024 | ICML | |
| Harmony in diversity: Merging neural networks with canonical correlation analysis | 2024 | ICML | |
| Transformer fusion with optimal transport | 2024 | ICLR | |
| Layerwise linear mode connectivity | 2024 | ICLR | |
| ZipIt! Merging Models from Different Tasks without Training | 2024 | ICLR | |
| Proving linear mode connectivity of neural networks via optimal transport | 2024 | AISTATS | |
| Training-Free Pretrained Model Merging | 2024 | CVPR | |
| Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | 2024 | Arxiv | Llama2-7b, Llama2-13b |
| C2M3: Cycle-Consistent Multi Model Merging | 2024 | NeurIPS | |
| PLeaS--Merging Models with Permutations and Least Squares | 2024 | Arxiv | |
| Rethink Model Re-Basin and the Linear Mode Connectivity | 2024 | Arxiv | |
| Git Re-Basin: Merging Models modulo Permutation Symmetries | 2023 | ICLR | |
| Re-basin via implicit Sinkhorn differentiation | 2023 | CVPR | |
| Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks | 2023 | ICLR | |
| Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization | 2023 | ICLR | |
| REPAIR: REnormalizing Permuted Activations for Interpolation Repair | 2023 | ICLR | |
| Going beyond linear mode connectivity: The layerwise linear feature connectivity | 2023 | NeurIPS | |
| The role of permutation invariance in linear mode connectivity of neural networks | 2022 | ICLR | |
| What can linear interpolation of neural network loss landscapes tell us? | 2022 | ICML | |
| Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling | 2021 | ICML | |
| Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes | 2021 | ICML | |
| Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances | 2021 | ICML | |
| Linear Mode Connectivity and the Lottery Ticket Hypothesis | 2020 | ICML | |
| Optimizing mode connectivity via neuron alignment | 2020 | NeurIPS | |
| Model fusion via optimal transport | 2020 | NeurIPS | |
| Uniform convergence may be unable to explain generalization in deep learning | 2019 | NeurIPS | |
| Explaining landscape connectivity of low-cost solutions for multilayer nets | 2019 | NeurIPS | |
| Essentially no barriers in neural network energy landscape | 2018 | ICML | |
| Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs | 2018 | NeurIPS | |

During Merging Methods

Basic Merging Methods

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS | |
| Editing models with task arithmetic | 2023 | ICLR | |
| Model fusion via optimal transport | 2020 | NeurIPS | |
| Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop | |
| Acceleration of stochastic approximation by averaging | 1992 | SIAM Journal on Control and Optimization | |
| Animating rotation with quaternion curves (Spherical Linear Interpolation (SLERP) Model Merging) | 1985 | SIGGRAPH Computer Graphics | |

Weighted-based Merging Methods

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| EvoGM: Learning to Merge LLMs via Evolutionary Generative Optimization | 2026 | ICML | Qwen2.5-1.5B, Qwen3-8B |
| Label-Free Cross-Task LoRA Merging with Null-Space Compression | 2026 | Arxiv | LLAMA-3 8B, LLAVA-1.5-7B |
| The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging | 2026 | Arxiv | |
| LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging | 2026 | Arxiv | |
| Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance | 2025 | Arxiv | xLAM-2-70b, CoALM-70B, watt-tool-70B, functionary-medium-70B, xLAM-2-8b, ToolACE-2-8B, watt-tool-8B, BitAgent-8B, CoALM-8B |
| Superpose Task-specific Features for Model Merging | 2025 | EMNLP | Llama-2-7B |
| T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis | 2025 | Arxiv | |
| Weight Weaving: Parameter Pooling for Data-Free Model Merging | 2025 | Arxiv | |
| Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | 2025 | Arxiv | Mistral-7B, InternVL, Qwen2-VL |
| Variational Task Vector Composition | 2025 | NeurIPS | |
| RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging | 2025 | Arxiv | |
| StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation | 2025 | Arxiv | |
| SeMe: Training-Free Language Model Merging via Semantic Alignment | 2025 | Arxiv | |
| NAN: A Training-Free Solution to Coefficient Estimation in Model Merging | 2025 | Arxiv | LLaMA2-13B, WizardLM-13B, WizardMath-13B, LLaVA-v1.5-13B, LLaVA-1.6-13B, Math-LLaVA |
| Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs | 2025 | ICLR | Llama-2-7B and Llama-2-13B |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | 2025 | Arxiv | Gemma-2-9B, Llama-3-8B |
| Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models | 2025 | Arxiv | LLaMA-2 7B series, Mistral 7B series, LLaMA-2 13B series |
| RankMean: Module-Level Importance Score for Merging Fine-tuned Large Language Models | 2024 | ACL | |
| Non-Uniform Parameter-Wise Model Merging | 2024 | Arxiv | |
| How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging | 2024 | Arxiv | |
| LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | 2024 | Arxiv | |
| Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation | 2024 | Arxiv | shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B |
| Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | 2024 | Arxiv | |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | EMNLP | LLaMA-2-7B, Mistral-7B, LLaMA-2-13B |
| Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv | Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B |
| Arcee’s MergeKit: A Toolkit for Merging Large Language Models | 2024 | Arxiv | Llama2-7B-Chat, Meditron-7B |
| Evolutionary optimization of model merging recipes | 2024 | Arxiv | shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B |
| XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | 2024 | ACL | |
| AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR | |
| Model Merging by Uncertainty-Based Gradient Matching | 2024 | ICLR | |
| Merging by Matching Models in Task Subspaces | 2024 | TMLR | |
| Fisher Mask Nodes for Language Model Merging | 2024 | LREC-COLING | |
| Erasure Coded Neural Network Inference via Fisher Averaging | 2024 | ISIT | |
| Dataless Knowledge Fusion by Merging Weights of Language Models | 2023 | ICLR | |
| Merging models with fisher-weighted averaging | 2022 | NeurIPS | |

Subspace-based Merging Method (Sparse or Low-rank Subspace)

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Essential Subspace Merging for Multi-Task Learning | 2026 | Arxiv | |
| Sparsity Curse: Understanding RLVR Model Parameter Space from Model Merging | 2026 | KDD | Qwen-2.5-7B |
| PACT: Preserving Anchored Cores in Task-vectors for Model Merging | 2026 | Arxiv | |
| Closed-Form Spectral Regularization for Multi-Task Model Merging | 2026 | Arxiv | InternVL2.5, Qwen2-VL |
| ResMerge: Residual-based Spectral Merging of Large Language Models | 2026 | Arxiv | Qwen2.5-7B-Base, Qwen2.5-7B-SimpleRL-Zoo, Open-ReasonerZero-7B (Zero), General-Reasoner-Qwen2.5-7B (Reasoner) |
| Compress then Merge: From Multiple LoRAs into One Low-Rank Adapter | 2026 | ICML | LLaMA3-8B |
| TaDA: Calibrated Probe Gating for Task-Domain LoRA Merging | 2026 | Arxiv | Llama-2-7b-hf |
| Model Merging by Output-Space Projection | 2026 | Arxiv | Llama3.1-8B |
| Saliency-Aware Model Merging | 2026 | Arxiv | |
| Model Merging on Loss Landscape: A Geometry Perspective | 2026 | Arxiv | |
| PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging | 2026 | Arxiv | LLaVA1.5-7B |
| Towards Adaptive Continual Model Merging via Manifold-Aware Expert Evolution | 2026 | Arxiv | |
| Evolutionary Negative Module Pruning for Better LoRA Merging | 2026 | Arxiv | |
| Crowded in B-Space: Calibrating Shared Directions for LoRA Merging | 2026 | Arxiv | Llama-3.1-8B |
| Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score | 2026 | Arxiv | Gemma-2 9B, Qwen2.5-7B, Phi-4-mini |
| DC-Merge: Improving Model Merging with Directional Consistency | 2026 | CVPR | LLaVA |
| CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging | 2026 | Arxiv | Qwen3-8B and Llama3.1-8B |
| Model Merging in the Essential Subspace | 2026 | Arxiv | |
| Beyond Parameter Arithmetic: Sparse Complementary Fusion for Distribution-Aware Model Merging | 2026 | Arxiv | Mistral-7B, Qwen2.5-14B, and Qwen2.5-32B |
| Orthogonal Model Merging | 2026 | Arxiv | Llama-3.1-8B, Qwen2.5-VL-7B-Instruct, Llama-3.2-3B |
| When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging | 2026 | Arxiv | |
| Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations | 2026 | Arxiv | Qwen2.5-7B, Qwen2.5-14B |
| AdaRank: Adaptive Rank Pruning for Enhanced Model Merging | 2026 | ICLR | |
| Decomposing Task Vectors for Refined Model Editing | 2025 | Arxiv | |
| Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging | 2025 | Arxiv | Qwen-14B |
| Towards Reversible Model Merging For Low-rank Weights | 2025 | Arxiv | |
| Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging | 2025 | Arxiv | LLaMA-2-7B |
| RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness | 2025 | NeurIPS | LLaVA |
| Accurate and Efficient Low-Rank Model Merging in Core Space | 2025 | NeurIPS | |
| Efficient Multi-Source Knowledge Transfer by Model Merging | 2025 | Arxiv | |
| One Size Does Not Fit All: A Distribution-Aware Sparsification for More Precise Model Merging | 2025 | Arxiv | |
| NegMerge: Sign-Consensual Weight Merging for Machine Unlearning | 2025 | ICML | |
| Subspace-Boosted Model Merging | 2025 | Arxiv | |
| Training-free LLM Merging for Multi-task Learning | 2025 | Arxiv | |
| Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data | 2025 | Arxiv | |
| Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs | 2025 | Arxiv | Mistral-7B, Llama3-8B |
| CALM: Consensus-Aware Localized Merging for Multi-Task Learning | 2025 | ICML | |
| Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation | 2025 | ICML | |
| Adaptive LoRA Merge with Parameter Pruning for Low-Resource Generation | 2025 | ACL | Llama-3-8B-Instruct |
| Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking | 2025 | Arxiv | LLaMA3.1-8B |
| CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging | 2025 | Arxiv | |
| LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation | 2025 | Arxiv | Llama-3-8B and Mistral-7B |
| Task Vector Quantization for Memory-Efficient Model Merging | 2025 | Arxiv | |
| Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms | 2025 | Arxiv | Llama-2-7b |
| Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts | 2025 | ICLR 2025 Workshop | |
| LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach | 2025 | ICLR 2025 Workshop | Gemma-9b, LLaMA 3.1 8b |
| CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging | 2025 | Arxiv | Mistral-7b-v0.1, WildMarcoroni-Variant1-7B and WestSeverus-7B-DPO-v2 |
| Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation | 2025 | Arxiv | |
| LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | 2025 | Arxiv | Llama-3-8B, Mistral-7B, and Llama2-13B |
| Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation | 2025 | Arxiv | |
| Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging | 2025 | Arxiv | Llama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-codealpaca |
| Superpose Singular Features for Model Merging | 2025 | Arxiv | Llama-2-7B |
| STAR: Spectral Truncation and Rescale for Model Merging | 2025 | NAACL | Mistral-7B-Instruct |
| No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces | 2025 | Arxiv | |
| Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging | 2025 | NeurIPS | |
| Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent | 2025 | Arxiv | |
| Revisiting Weight Averaging for Model Merging | 2024 | Arxiv | |
| Task Singular Vectors: Reducing Task Interference in Model Merging | 2025 | CVPR | |
| Less is More: Efficient Model Merging with Binary Task Switch | 2024 | Arxiv | |
| FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts | 2024 | Arxiv | Qwen-14B (LoRA), LLaMa2-13B, WizardLM-13B, WizardMath-13B, WizardCoder-Python-13B |
| Beyond Task Vectors: Selective Task Arithmetic Based on Importance Metrics | 2024 | Arxiv | |
| Parameter Competition Balancing for Model Merging | 2024 | NeurIPS | Llama-2-7b |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML | WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B |
| Localizing Task Information for Improved Model Merging and Compression | 2024 | ICML | |
| Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging | 2024 | ICLR | |
| Model merging with svd to tie the knots | 2024 | Arxiv | Llama3-8B |
| NegMerge: Consensual Weight Negation for Strong Machine Unlearning | 2024 | Arxiv | |
| Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic | 2024 | Arxiv | |
| Activated Parameter Locating via Causal Intervention for Model Merging | 2024 | Arxiv | Llama-2-chat-7B |
| PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning | 2024 | Arxiv | Mistral-7B-v0.1, Llama-3-8B, Neurotic-7B, MoMo-70B |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | 2024 | Arxiv | Llama-2-13b-code-alpaca, WizardLM, Wizard-Math, WizardCoder-Python |
| EMR-Merging: Tuning-Free High-Performance Model Merging | 2024 | NeurIPS | |
| DPPA: Pruning Method for Large Language Model to Model Merging | 2024 | Arxiv | LLaMa 2 |
| Model breadcrumbs: Scaling multi-task model merging with sparse masks | 2023 | Arxiv | |
| Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion | 2023 | Arxiv | |
| ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization | 2023 | Arxiv | LLaMA 7B, 13B, 33B, and 65B |
| Effective and Parameter-Efficient Reusing Fine-Tuned Models | 2023 | Openreview | |
| Resolving Interference When Merging Models | 2023 | NeurIPS | |
| Task-Specific Skill Localization in Fine-tuned Language Model | 2023 | ICML | |

Routing-based Merging Methods (Dynamic Merging)

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Dynamic Model Merging Made Slim | 2026 | Arxiv | Llama-3.1-8B-Instruct, Qwen-2.5-7B-Instruct |
| Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression | 2026 | Arxiv | |
| TECS-L (Golden MoE): Dense-to-MoE Expert Splitting Framework | 2026 | GitHub | Mistral-7B |
| Fine-Grained Model Merging via Modular Expert Recombination | 2026 | Arxiv | |
| MIN-Merging: Merge the Important Neurons for Model Merging | 2025 | Arxiv | |
| SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging | 2025 | Arxiv | |
| Adaptive Task Vectors for Large Language Models | 2025 | Arxiv | LLaMA3-8B and Mistral-7B |
| Dynamic Fisher-weighted Model Merging via Bayesian Optimization | 2025 | Arxiv | |
| Data-Adaptive Weight-Ensembling for Multi-task Model Fusion | 2025 | IJCV | |
| MASS: MoErging through Adaptive Subspace Selection | 2025 | Arxiv | |
| Dynamic Model Merging with Mixture of Weights | 2025 | TCSVT | |
| CAMEx: Curvature-aware Merging of Experts | 2025 | ICLR | |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | 2025 | Arxiv | LLaMA-2 7B, Mistral 7B, and LLaMA-2 13B |
| MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | 2025 | Arxiv | |
| Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing | 2025 | Arxiv | Qwen-2.5-7B, LLaMA-3.2-8B |
| Adapting Foundation Models via Training-free Dynamic Weight Interpolation | 2024 | NeurIPS 2024 Workshop | |
| Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging | 2024 | Arxiv | |
| DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation | 2024 | NeurIPS 2024 Workshop | |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | 2024 | ICML | |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | 2024 | ICML | |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | 2024 | ICLR | |
| Soft merging of experts with adaptive routing | 2024 | TMLR | |
| SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | 2024 | Arxiv | Mistral-7B-v0.1, MetaMath-Mistral-7B, dolphin-2.1-mistral-7b, speechless-code-mistral-7b-v1.0 |
| Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | 2024 | NeurIPS | Qwen-14B |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | 2024 | Arxiv | Gemma-7B, LLaMA-2 7B & 13B, Mistral 7B, LLaMA-3 8B |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv | |
| Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | 2023 | ICLR | |

Post-calibration based Methods

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| FEATCAL: Feature Calibration for Post-Merging Models | 2026 | Arxiv | Llama-3.1-8B-Instruct |
| MAGIC: Achieving Superior Model Merging via Magnitude Calibration | 2025 | Arxiv | OLMo-3-7B |
| Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration | 2025 | NeurIPS | |
| Multi-Task Model Fusion via Adaptive Merging | 2025 | ICASSP | |
| Representation Surgery in Model Merging with Probabilistic Modeling | 2025 | ICML | |
| Parameter-Efficient Interventions for Enhanced Model Merging | 2024 | Arxiv | |
| Tint Your Models Task-wise for Improved Multi-task Model Merging | 2024 | Arxiv | |
| SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery | 2024 | Arxiv | |
| Representation Surgery for Multi-Task Model Merging | 2024 | ICML | |

Other Merging Methods

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| GFFMERGE: Efficient Merging of Graph Neural Force Fields and Beyond | 2026 | ICML | |
| Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging | 2026 | ICML Workshop | Qwen3-0.6B |
| Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning | 2026 | Arxiv | |
| Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging | 2026 | Arxiv | |
| Bayesian Model Merging | 2026 | Arxiv | |
| Generalizing the Geometry of Model Merging Through Frechet Averages | 2026 | Arxiv | Llama-3 8B |
| Differentially Private Model Merging | 2026 | Arxiv | |
| Task Alignment: A simple and effective proxy for model merging in computer vision | 2026 | Arxiv | |
| Model Merging via Data-Free Covariance Estimation | 2026 | Arxiv | |
| Resolving Interference (RI): Disentangling Models for Improved Model Merging | 2026 | Arxiv | |
| BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning | 2026 | Arxiv | |
| ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation | 2026 | Arxiv | |
| Training-Free Cross-Architecture Merging for Graph Neural Networks | 2026 | Arxiv | |
| Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models | 2026 | ICLR | Flan-T5 |
| Transporting Task Vectors across Different Architectures without Training | 2026 | ICML | |
| MergePipe: A Budget-Aware Parameter Management System for Scalable LLM Merging | 2026 | Arxiv | Llama3.1-8B, Llama-3.2-3B, Qwen3-0.6B, Qwen3-1.7B, and Qwen3-8B |
| DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging | 2026 | ICLR | |
| Sparsity-Aware Evolution for Model Merging | 2026 | Arxiv | |
| AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse | 2026 | Arxiv | Llama2-7B-Chat, Llama2-7B-Code |
| Model Merging via Multi-Teacher Knowledge Distillation | 2025 | Arxiv | |
| Bridging Training and Merging Through Momentum-Aware Optimization | 2025 | Arxiv | |
| From Coefficients to Directions: Rethinking Model Merging with Directional Alignment | 2025 | Arxiv | |
| Escaping Optimization Stagnation: Taking Steps Beyond Task Arithmetic via Difference Vectors | 2025 | Arxiv | |
| Model Merging with Functional Dual Anchors | 2025 | Arxiv | |
| Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories | 2025 | Arxiv | |
| Rethinking Layer-wise Model Merging through Chain of Merges | 2025 | Arxiv | Llama 3-8B |
| Competition and Attraction Improve Model Fusion | 2025 | Arxiv | WizardMath 7B v1.0, AgentEvol 7B |
| PSO-Merging: Merging Models Based on Particle Swarm Optimization | 2025 | Arxiv | Llama-3-8B, Llama-2-13B, and Mistral-7B-v0.3 |
| DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging | 2025 | Arxiv | |
| Navigating the Accuracy-Size Trade-Off with Flexible Model Merging | 2025 | Arxiv | |
| Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment | 2025 | Arxiv | |
| Reinforced Model Merging | 2025 | Arxiv | |
| FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization | 2025 | Arxiv | LLaMA2-7B |
| Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors | 2025 | Arxiv | WizardLM-13B (LM), WizardMath-13B (Math), and llama-2-13b-codealpaca (Code) |
| GNNMERGE: Merging of GNN Models Without Accessing Training Data | 2025 | Arxiv | |
| MERGE3: Efficient Evolutionary Merging on Consumer-grade GPUs | 2025 | ICML | Mistral-7B |
| Activation-Informed Merging of Large Language Models | 2025 | Arxiv | Llama-2-13b, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| Scalable Model Merging with Progressive Layer-wise Distillation | 2025 | Arxiv | WizardLM-13B, WizardMath-13B and llama-2-13b-code-alpaca |
| Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | 2025 | Arxiv | Llama-2-13B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts | 2025 | ICLR | |
| Fine-tuning Aligned Classifiers for Merging Outputs: Towards a Superior Evaluation Protocol in Model Merging | 2024 | Arxiv | |
| Multi-Task Model Merging via Adaptive Weight Disentanglement | 2024 | Arxiv | |
| Rethinking Weight-Averaged Model-merging | 2024 | Arxiv | |
| ATM: Improving Model Merging by Alternating Tuning and Merging | 2024 | Arxiv | |
| HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | 2024 | Arxiv | Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B |
| Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | Arxiv | |
| It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | Arxiv | Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B |
| Toward Data Efficient Model Merging between Different Datasets without Performance Degradation | 2024 | JMLR | |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | 2023 | Arxiv | SOLAR 10.7B, SOLAR 10.7B-Instruct |

Theories or Analysis of Model Merging

Paper TitleYearConference/JournalRemark
An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse2026ArxivQwen2.5-3B, 7B, and 14B, Llama3.1-8B
Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts2026Arxiv
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs2026ICLRLlama-3.2-3B, Llama-3.1-8B, and Mistral-Small-3-24B
M-Loss: Quantifying Model Merging Compatibility with Limited Unlabeled Data2026Arxiv
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training2026ICLRLing-mini-16B
Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success2026Arxiv
Understanding Model Merging: A Unified Generalization Framework for Heterogeneous Experts2026Arxiv
Will it Merge? On The Causes of Model Mergeability2026ArxivLlama-3.2-3B, Qwen-2.5-3B, Mistral-7B-Instruct-v0.2
How does the optimizer implicitly bias the model merging loss landscape?2025Arxiv
On Task Vectors and Gradients2025Arxiv
Why Do More Experts Fail? A Theoretical Analysis of Model Merging2025Arxiv
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers2025ICLR
Multi-Level Collaboration in Model Merging2025Arxiv
Low-rank bias, weight decay, and model merging in neural networks2025Arxiv
Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression2025Arxiv
SeWA: Selective Weight Average via Probabilistic Masking2025Arxiv
Efficient Model Editing with Task Vector Bases: A Theoretical Framework and Scalable Approach2025Arxiv
Task Arithmetic Through The Lens Of One-Shot Federated Learning2024ArxivWizardLM-13B, WizardMath-13B, Llama-2-13B-Code-Alpaca, Llama2-13B
A Unified Analysis for Finite Weight Averaging2024Arxiv
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average2024Arxiv
On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm2024ICML
Generalization Analysis of Stochastic Weight Averaging with General Sampling2024ICML
Diverse weight averaging for out-of-distribution generalization2022NeurIPS
Ensemble of averages: Improving model selection and boosting performance in domain generalization2022NeurIPS
Stability analysis and generalization bounds of adversarial training2022NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks2022ICLR
SWAD: Domain generalization by seeking flat minima2021NeurIPS
Linear Mode Connectivity and the Lottery Ticket Hypothesis2020ICML
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes2020ICLR
Optimizing mode connectivity via neuron alignment2020NeurIPS
Uniform convergence may be unable to explain generalization in deep learning2019NeurIPS
Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification2018JMLR
Iterate averaging as regularization for stochastic gradient descent2018Arxiv
Essentially no barriers in neural network energy landscape2018ICML
Averaging weights leads to wider optima and better generalization2018UAI
Train faster, generalize better: Stability of stochastic gradient descent2016ICML

Application of Model Merging in Foundation Models

Model Merging in Large Language Models

Human Preference Alignment for LLMs

Paper TitleYearConference/JournalRemark
From “Weak” Signals to Strong Models: Preference Delta Aggregation with LoRA Merging2026ArxivQwen3-8B and Tülu3-8B
TPMM-DPO: Trajectory-aware Preference-guided Model Merging for Iterative Direct Preference Optimization2026ArxivLlama3.2-3B
Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging2025ArxivGemma-3-12B, Gemma-3-27B, Qwen2.5-7B
BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation2025ArxivQwen-2.5-7B-Instruct, Llama-3.1-8B-Instruct
Personality Vector: Modulating Personality of Large Language Models by Model Merging2025EMNLPLlama-3.1-8B-Instruct, Qwen2.5-7B-Instruct
SafeMERGE: Preserving Safety Alignment in Fine-Tuned LLMs via Selective Layer-Wise Model Merging2025ArxivLlama-2-7B-Chat, Qwen-2-7B-Instruct
Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation2025ArxivLLaMA-2 7B
Model soup for better RLHF: Weight space averaging to improve alignment in LLMs2024NeurIPS 2024 WorkshopLlama2-7B, Mistral-7B, Gemma-2B
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging2024ArxivLlama-3-8B-Instruct
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation2024Arxiv
H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs2024ArxivLLaMA-2 7B
Baichuan Alignment Technical Report2024ArxivQwen2-Nova-72B, Llama3-PBM-Nova-70B
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning2024Arxiv
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging2024ArxivMetaMath-7B, MAmmoTH-7B, LLaMA2-7B
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning2024ArxivMistral-7B-v0.1, Llama-3-8B
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch2024ArxivMistral-0.2-7B-Instruct, LLaMA-3-8B-Instruct, OpenBioLLM-8B, MAmmoTH2-7B, WizardMath-1.1-7B
Towards Comprehensive Post Safety Alignment of Large Language Models via Safety Patching2024ArxivLLaMA-2-7B-Chat, LLaMA-3-8B-Instruct, Mistral-7B-Instruct-v0.1, and Gemma-1.1-7B-it
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction2024ArxivLlama-2-7b
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment2024ArxivQwen1.5-7B, LLaMa3-8B
A safety realignment framework via subspace-oriented model fusion for large language models2024ArxivWizardLM-7B
Weak-to-strong extrapolation expedites alignment2024Arxivzephyr-7b, starling-7b, snorkel-7b, llama3-8b, internlm2-7b, internlm2-20b, tulu-2-dpo-7b, tulu-2-dpo-13b, tulu-2-dpo-70b
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic2024ArxivLlama-2-7B-Chat
Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards2023NeurIPSLLaMA-7b
Personalized soups: Personalized large language model alignment via post-hoc parameter merging2023ArxivTulu-7B LM

Detoxification of LLMs

Paper TitleYearConference/JournalRemark
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation2025ICLRGEMMA-7B-IT, LLAMA2-7B/13B/70B-CHAT, LLAMA3-8B-INST
3DM: Distill, Dynamic Drop, and Merge for Debiasing Multi-modal Large Language Models2025ACLLLaVA-1.5-7b, InternVL-2.5-8b, LLaVA-1.5-7b and ChatGLM4-9b
Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation2025ArxivLLAMA3-8B-Instruct, Mistral-7B-Instruct-v0.2
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach2024Arxiv
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation2024AAAILLaMA-7B
Mitigating Social Biases in Language Models through Unlearning2024ArxivLLaMA-2 7B
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models2024ArxivLlama-2-7B, Llama-2-chat-7B, Vicuna-7B, Llama-2-13B
Composing Parameter-Efficient Modules with Arithmetic Operation2023NeurIPS
Editing models with task arithmetic2023ICLR
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation2023Arxiv

Knowledge Editing/Unlearning of LLMs

Paper TitleYearConference/JournalRemark
Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey2026ArxivQwen2.5-7B
Per-parameter Task Arithmetic for Unlearning in Large Language Models2026ArxivLlama3.2 1B Instruct
Model Merging for Knowledge Editing2025ACLQwen2.5-7B-Instruct
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging2025ArxivOLMo-7B-0724-Instruct
Exact Unlearning of Finetuning Data via Model Merging at Scale2025ICLR 2025 Workshop MCDC
NegMerge: Consensual Weight Negation for Strong Machine Unlearning2024Arxiv
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs2024ArxivZEPHYR-7B-BETA, LLAMA2-7B
Towards Safer Large Language Models through Machine Unlearning2024ACLLLAMA2-7B, LLAMA2-13B
Editing models with task arithmetic2023ICLR
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model2023ArxivLLAMA2-7B, LLAMA-7B, BLOOM-7B
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion2023Arxiv

Faster Training of LLMs

Paper TitleYearConference/JournalRemark
Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training2026ICMLLLaMA-2B
Mashup Learning: Faster Finetuning by Remixing Past Checkpoints2026Arxiv
GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training2025ArxivQwen2.5-VL-7B
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging2025ICML
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging2025Arxiv
Merge to Mix: Mixing Datasets via Model Merging2025ArxivLlama-3-8B-Instruct
Model Merging in Pre-training of Large Language Models2025ArxivSeed-MoE-1.3B/13B, SeedMoE-10B/100B, Seed-MoE-15B/150B
Parameter-Efficient Checkpoint Merging via Metrics-Weighted Averaging2025Arxiv
DEM: Distribution Edited Model for Training with Mixed Data Distributions2024ArxivOpenLLaMA 7B and 13B
Checkpoint Merging via Bayesian Optimization in LLM Pretraining2024ArxivBaichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning2023ACL
Early Weight Averaging meets High Learning Rates for LLM Pre-training2023NeurIPS Workshop
Stop wasting my time! Saving days of ImageNet and BERT training with latest weight averaging2022NeurIPS Workshop
Fusing finetuned models for better pretraining2022Arxiv

Faster Reasoning of LLMs

Paper TitleYearConference/JournalRemark
Multi-objective Evolutionary Merging Enables Efficient Reasoning Models2026ArxivDeepSeek-R1-Distill-Qwen 1.5B, 7B, and 14B
Data-Free Layer-Adaptive Merging via Fisher Information for Long-to-Short Reasoning LLMs2026ArxivQwen2.5-Math-7B, DeepSeek-R1-Distill-Qwen-7B
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format2026ICLRQwen2.5-1.5B/14B/32B, and Llama-3.1-8B
Reasoning Pattern Alignment Merging for Adaptive Reasoning2026Arxiv(i) Qwen3-4B-Thinking (Long-CoT) and Qwen3-4B-Instruct (Short-CoT); (ii) DeepSeek-R1-Distill-Qwen-1.5B (Long-CoT) and Qwen2.5-Math-1.5B (Short-CoT)
Revisiting Model Interpolation for Efficient Reasoning2025ArxivQwen3-4B
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging2025ArxivQwen2.5-32B, DeepSeek-R1-32B
Kimi k1.5: Scaling Reinforcement Learning with LLMs2025ArxivKimi k1.5

Improving Computational Efficiency of MoE-based LLM

Paper TitleYearConference/JournalRemark
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training2026ArxivQwen3-Next-80B-A3B
REAM: Merging Improves Pruning of Experts in LLMs2026ArxivQwen3-30B-A3B-Instruct-2507, Qwen3-Coder-Next, GLM-4.5-Air
Upcycled and Merged MoE Reward Model for Mitigating Reward Hacking2025Arxiv
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference2025ArxivMixtral-8x7B, Deepseek-MoE
Enhanced Expert Merging for Mixture-of-Experts in Graph Foundation Models2025ArxivLLaMA-3.1-8B
Expert Merging in Sparse Mixture of Experts with Nash Bargaining2025ArxivQwen1.5-MoE-14B, DeepSeek-MoE-16B
MergeMoE: Efficient Compression of MoE Models via Expert Output Merging2025ArxivDeepSeekMoE, Qwen1.5-MoE-A2.7B, and Qwen3-30B-A3B
Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference2025Arxiv
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging2025ArxivMixtral 8x7B, Qwen3-235B-A22B, Qwen1.5-MoE-A2.7B, and DeepSeekMoE-16B-Base
On Linear Mode Connectivity of Mixture-of-Experts Architectures2025NeurIPS
Merge, then compress: Demystify efficient SMoE with hints from its routing policy2024ICLRfairseq-moe15b SMoE
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts2023EMNLP

Mixing Datasets via Model Merging

Paper TitleYearConference/JournalRemark
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
OPTIMER: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training2026ArxivGemma 3 27B
Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization2026ArxivQwen2-VL-2B and Intern3.5-VL-2B
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training2026ArxivQwen3-1.7B
Multi-task Code LLMs: Data Mix or Model Merge?2026ArxivQwen Coder 2.5 7B, DeepSeek 7B
MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging2026Arxiv8B and 16B MoE
Merge to Mix: Mixing Datasets via Model Merging2025ArxivLlama-3-8B-Instruct

LLM Agent Merging

Paper TitleYearConference/JournalRemark
Behavior Knowledge Merge in Reinforced Agentic Models2026ArxivRL-trained agentic models
ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging2026ArxivSimia-Tau-SFT-Qwen3-8B, Simia-OfficeBench-SFT-Qwen3-8B, and Simia-AgentBench-SFT-Qwen3-8B
Divide, Optimize, Merge: Scalable Fine-Grained Generative Optimization for LLM Agents2025EMNLPo3-mini
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents2024NeurIPSLlama3.1-8B
Agent Skill Acquisition for Large Language Models via CycleQD2024ArxivLlama3-8B-Instruct

Combine the Capabilities of Expert LLMs

Paper TitleYearConference/JournalRemark
Enhancing Multilingual Reasoning via Steerable Model Merging2026Arxiv
When Model Merging Breaks Routing: Training-Free Calibration for MoE2026ArxivOLMoE-1B-7B-0125
On the Limits of Model Merging for Multilinguality in Pre-Training2026ArxivHPLT 2.15B
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts2026ArxivOLMo 2 7B
Merge and Conquer: Instructing Multilingual Models by Adding Target Language Weights2026ArxivLlama 3.1 8B, Qwen3 8B, Qwen3 14B
Preference-Aligned LoRA Merging: Preserving Subspace Coverage and Addressing Directional Anisotropy2026ArxivLLaMA-3-8B
Label-Free Cross-Task LoRA Merging with Null-Space Compression2026ArxivLLAMA-3 8B, LLAVA-1.5-7B
AdaLTM: Adaptive Layer-wise Task Vector Merging for Categorical Speech Emotion Recognition with ASR Knowledge Integration2026Arxiv
Functionality-Oriented LLM Merging on the Fisher–Rao Manifold2026ArxivQwen2.5-14B, Qwen2.5-14B-Instruct-1M, Qwen2.5-Coder-14B-Instruct, DeepSeek-R1-Distill-Qwen-14B, OpenReasoning-Nemotron-14B
The Appeal and Reality of Recycling LoRAs with Adaptive Merging2026ArxivLlama3.1 8B-Instruct
LS-Merge: Merging Language Models in Latent Space2026ICLRGemma-3-1B-it, Gemma-3-4B-it, Llama-3-1B-instruct, Llama-2-7b
Bagging-Based Model Merging for Robust General Text Embeddings2026ArxivQwen3-4B
Data-driven Clustering and Merging of Adapters for On-device Large Language Models2026ArxivLlama 3.2 3B, Qwen 2.5 1.5B and StableLM 2 1.6B
Improving Training Efficiency and Reducing Maintenance Costs via Language Specific Model Merging2026ArxivLlama-3.1-8b-Instruct
SimMerge: Learning to Select Merge Operators from Similarity Signals2026Arxiv7B to 111B
Multi-Stage Evolutionary Model Merging with Meta Data Driven Curriculum Learning for Sentiment-Specialized Large Language Modeling2026Arxiv
ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging2026ArxivQwQ-32B-Preview, Meditron3-Qwen2.5-7B and MMed-Llama3-8B, WiroAI-Finance-Qwen-7B and WiroAI-Finance-Llama-8B
Reliable Cultural Knowledge Preservation in Multilingual LLMs through Model Merging2025ArxivQwen-2.5-3B
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints2025ArxivLLaMA-3 8B, Mistral 7B, Qwen 2, Phi-3.5, Gemma 2
Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation2025Arxiv
Adapting Chat Language Models Using Only Target Unlabeled Language Data2025TMLRQwen2.5 7B, Llama 3.1 8B, Qwen3 14B
RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior2026AAAIQwen2.5-7B, Llama3.1-8B
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance2025ArxivxLAM-2-70b, CoALM-70B, watt-tool-70B, functionary-medium-70B, xLAM-2-8b, ToolACE-2-8B, watt-tool-8B, BitAgent-8B, CoALM-8B
SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation2025Arxiv
Merging Continual Pretraining Models for Domain-Specialized LLMs: A Case Study in Finance2025ArxivLlama-3-8B, Llama-2-7B
Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models2025EMNLPLLaMA-3 8B
Bridging Dialectal Gaps in Arabic Medical LLMs through Model Merging2025ArabicNLP
Adapting Multilingual Models to Code-Mixed Tasks via Model Merging2025Arxiv
Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation2025ArxivLlama-3.1-8B-Instruct and Gemma-3-12B-Instruct
ABC: Towards a Universal Code Styler through Model Merging2025ACM on Programming LanguagesQwen2.5-Coder, Deepseek-Coder
Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese2025Arxiv
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking2025ArxivMistral-7B, InternVL, Qwen2-VL
The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging2025ArxivQwen3-30B-A3B-Thinking-2507, Qwen3-30B-A3B-Instruct-2507
MLM: Multi-linguistic LoRA Merging2025NeurIPS WorkshopLLaMA-3.2 (1B and 3B)
Model Merging Scaling Laws in Large Language Models2025ArxivQwen2.5 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B
Harnessing Optimization Dynamics for Curvature-Informed Model Merging2025ArxivLlama-3.1-8B
Kwai Keye-VL 1.5 Technical Report2025ArxivKeye-VL-8B
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic2025ArxivQWEN2.5-7B
Surrogate Benchmarks for Model Merging Optimization2025ArxivEvoLLM-JP-v1-7B, shisa-gamma-7b-v1
Tensorized Clustered LoRA Merging for Multi-Task Interference2025ArxivMistral-7B
Efficient Compositional Multi-tasking for On-device Large Language Models2025ArxivLlama 3.1 70B
HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging2025Arxiv
Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts2025Arxiv
Merging Large Language Models for Enhanced Code Generation: A Comparative Study of Model Merging Techniques Across Programming Languages2025Open Access in DiVACodeQwen1.5-7B, DeepSeek-Coder-6.7b-Base, CodeLlama-34B
On Fairness of Task Arithmetic: The Role of Task Vectors2025ArxivLLaMA2-7B
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs2025ArxivFALCON 3 7B, QWEN2.5 7B Instruct, LLAMA 3.1 8B Instruct, AYA Expanse 8B
Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning2025ArxivMetaMath-Mistral-7B, Dolphin-2.1-Mistral-7B and Speechless-Code-Mistral-7B-v1.0
Training-free LLM Merging for Multi-task Learning2025ACLEchelon-AI/Med-Qwen2-7B, shtdbb/qwen2-7b-med, Qwen2-Instruct
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost2025ArxivLlama3-inst-70B, Llama3-base-70B, Llama3.1-base-70B
Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models2025ArxivQwen2.5-7B, Qwen2.5-32B
Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing2025Arxiv
Efficient Model Development through Fine-tuning Transfer2025ArxivLlama 3.1 8B
Command A: An Enterprise-Ready Large Language Model2025ArxivCommand R7B
Extrapolation Merging: Keep Improving With Extrapolation and Merging2025ArxivQwen2-7B, Meta-Llama-3-8B, Mistral-Nemo-Base-2407-12B, Qwen1.5-14B
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond2025ArxivLight-R1-32B
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion2025ArxivGemma-2-27B-it, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct
Superficial Self-Improved Reasoners Benefit from Model Merging2025ArxivLlama2-7B
Nature-Inspired Population-Based Evolution of Large Language Models2025Arxiv
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge2025ArxivGemma-2-9B, Llama-3-8B
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation2025ArxivWizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging2025ArxivNuminaMath-7B, DeepSeek-Math-7B-Base, LLaMA-series models, WizardMath-13B
Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition2025ArxivContactDoctor-8B
Transferring Textual Preferences to Vision-Language Understanding through Model Merging2025ArxivLlama-3.2-11B-Vision-Instruct, Llama-3.1-Tulu-2-8B-uf-mean-rm, Llama-3.1-Tulu-3-8B-RM
Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging2025ArxivLlama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-code-alpaca
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging2025ArxivTyphoon2 70B Instruct, DeepSeek R1 70B Distill, Llama 3.1 70B, Llama 3.3 70B
Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging2025ArxivWizardLM-13B, WizardMath-13B, and llama-2-13b-code-alpaca
Skill Expansion and Composition in Parameter Space2025Arxiv
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion2025ArxivQwen2.5-Coder-14B-Instruct, Qwen2.5-14B-Instruct, and Mistral-Small-24B-Instruct-2501
Channel Merging: Preserving Specialization for Merged Experts2025AAAIDolphin-2.2.1-Mistral-7B, Speechless-Code-Mistral-7B, MetaMath-Mistral-7B, Chinese-Mistral-7B-Instruct-v0.1
Weighted-reward preference optimization for implicit model fusion2025ICLRLLaMA3-8B-Instruct
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion2024ArxivMiniGemini-8B and SLIME-8B
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents2024ArxivLlama3.1-8B
JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging2024ArxivBunny-v1_1-Llama-3-8B-V, MMed-Llama-3-8B-EnIns, OpenBioLLM-Llama3-8B, Llama-3-Swallow-8B-Instruct-v0.1
If You Can’t Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs2024ArxivCommand R+ 104B
Agent Skill Acquisition for Large Language Models via CycleQD2024ArxivLlama3-8B-Instruct
Collaboratively adding new knowledge to an LLM2024ArxivMeta-Llama-3-8B
Unconstrained Model Merging for Enhanced LLM Reasoning2024ArxivCodeLlama-7B-Ins, CodeLlama-70B-Ins, Deepseek-Coder-Ins-v1.5, Qwen2.5-Math-7B-Ins, WizardMath-7B-V1.1, OpenMath-Mistral 7B, MetaMath-7B, MetaMath-70B
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks2024ArxivLlama-7b, Llama2-7b-chat
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging2024ArxivLlama 2 7B
Exploring Model Kinship for Merging Large Language Models2024ArxivMistral-7B, Mistral-7b-instruct-v0.2, MetaMath-mistral-7b, Open-chat-3.5-1210
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation2024Arxivshisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models2024ArxivLLAMA 3.1 8B
What Matters for Model Merging at Scale?2024ArxivPaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models2024ArxivLlama-2-7B-Chat, WizardMath-7B, CodeLlama-7B
FUSECHAT: Knowledge Fusion of Chat Models2024ArxivOpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging2024ArxivCodeLlama 7B
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization2024ArxivQwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B
Knowledge Fusion By Evolving Weights of Language Models2024ACL
LLM Merging: Building LLMs Efficiently through Merging2024NeurIPS 2024 Competition TrackLLaMA-7B, Mistral-7B, Gemma-7B
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement2024ArxivQwen1.5-7B, Qwen1.5-Chat-7B, Sailor-7B, Qwen1.5-14B, Qwen1.5-Chat-14B, Sailor-14B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic2024ArxivLLaMA-2-7B, Mistral-7B, LLaMA-2-13B
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models2024ArxivMistral-Instruct-7B, Mixtral-Instruct-8x7B
Knowledge fusion of large language models2024ICLRLlama-2 7B, OpenLLaMA 7B, MPT 7B
Language models are super mario: Absorbing abilities from homologous models as a free lunch2024ICMLWizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca, and Mistral-7B
Controlled Text Generation via Language Model Arithmetic2024ICMLMPT-7B, Pythia-12B, Llama-2-Chat-13B
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models2024ArxivLLaMA2-13B and LLaMA3-8B (LoRA)
Evolutionary optimization of model merging recipes2024Arxivshisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM2024ArxivLlama-2-7B
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report2024ArxivNH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B

Note: The following papers are from the LLM Merging Competition at NeurIPS 2024.

Paper TitleYearConference/JournalModels
LLM Merging: Building LLMs Efficiently through Merging2024LLM Merging Competition at NeurIPS-
Towards an approach combining Knowledge Graphs and Prompt Engineering for Merging Large Language Models2024LLM Merging Competition at NeurIPSmeta-llama/Llama-2-7b; microsoft_phi1/2/3
Model Merging using Geometric Median of Task Vectors2024LLM Merging Competition at NeurIPSflan_t5_xl
Interpolated Layer-Wise Merging for NeurIPS 2024 LLM Merging Competition2024LLM Merging Competition at NeurIPSsuzume-llama-3-8B-multilingual-orpo-borda-top75, Barcenas-Llama3-8b-ORPO, Llama-3-8B-Ultra-Instruct-SaltSprinkle, MAmmoTH2-8B-Plus, Daredevil-8B
A Model Merging Method2024LLM Merging Competition at NeurIPS-
Differentiable DARE-TIES for NeurIPS 2024 LLM Merging Competition2024LLM Merging Competition at NeurIPSsuzume-llama-3-8B-multilingual-orpo-borda-top75, MAmmoTH2-8B-Plus and Llama-3-Refueled
LLM Merging Competition Technical Report: Efficient Model Merging with Strategic Model Selection, Merging, and Hyperparameter Optimization2024LLM Merging Competition at NeurIPSMaziyarPanahi/Llama3-8B-Instruct-v0.8, MaziyarPanahi/Llama-3-8B-Instruct-v0.9, shenzhiwang/Llama3-8B-Chinese-Chat, lightblue/suzume-llama-3-8B-multilingual
Simple Llama Merge: What Kind of LLM Do We Need?2024LLM Merging Competition at NeurIPSHermes-2-Pro-Llama-3-8B and Daredevil-8B
LLM Merging Competition Technical Report for NeurIPS 2024: Efficiently Building Large Language Models through Merging2024LLM Merging Competition at NeurIPSMistral-7B-Instruct v2, Llama3-8B-Instruct, Flan-T5-large, Gemma-7B-Instruct, and WizardLM-2-7B
MoD: A Distribution-Based Approach for Merging Large Language Models2024LLM Merging Competition at NeurIPSQwen2.5-1.5B and Qwen2.5-7B

Model Merging in Multimodal Large Language Models

Model Merging for Multimodal Fusion

Paper TitleYearConference/JournalRemark
Jointly training large autoregressive multimodal models2024ICLR
Model Composition for Multimodal Large Language Models2024ACLVicuna-7B-v1.5
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation2023ICML
An Empirical Study of Multimodal Model Merging2023EMNLP
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks2023TMLR

Model Merging for Cross-Modal Knowledge Transfer

Paper TitleYearConference/JournalRemark
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification2024ICASSP Workshop

Combine the Capabilities of Expert MLLMs

Paper TitleYearConference/JournalRemark
Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging2026ICMLQwen2.5-VL-3B
PivotMerge: Bridging Heterogeneous Multimodal Pre-training via Post-Alignment Model Merging2026ArxivLLaVA1.5-7B
Reasoning Resides in Layers: Restoring Temporal Reasoning in Video-Language Models with Layer-Selective Merging2026ArxivLongVA-7B, InternVL3-8B, Qwen3-VL-4B
One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging2026ArxivQwen-2.5-3B-Instruct
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging2026ICLRLLaVA-1.5-7B, OpenFlamingo-9B
SSAM: Singular Subspace Alignment for Merging Multimodal Large Language Models2026Arxiv
ES-Merging: Biological MLLM Merging via Embedding Space Signals2026Arxiv
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models2026ICLRVisCodex-8B, VisCodex-33B
FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision–Language Models2026ArxivQwen2.5-VL-7B-Instruct, DeepSeek-R1-Distill-Qwen-7B, Qwen2.5-VL-32B-Instruct, QwQ-32B
PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs2026ArxivLLaVA-v1.5-7B, Qwen2.5-VL-7B-Instruct, Qwen3-VL-8B-Instruct
Where and What Matters: Sensitivity-Aware Task Vectors for Many-Shot Multimodal In-Context Learning2026AAAIQwen-VL-7B, Idefics2-8B
MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent2025ArxivQwen2.5-0.5B
Tiny-R1V: Lightweight Multimodal Unified Reasoning Model via Model Merging2025Arxiv
Model Merging to Maintain Language-Only Performance in Developmentally Plausible Multimodal Models2025Arxiv
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking2025ArxivMistral-7B, InternVL, Qwen2-VL
UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging2025ACLLLaVA-v1.5-7B
Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs2025ArxivQwen2-VL-2B
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging2025ArxivQwen2-VL-7B-Base, Vicuna-7B-v1.5
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging2025ICMLLLaVA-NeXT-8B, Idefics2-8B, InternVL2-76B
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models2025ICLRLLaVA-1.5 (Vicuna-7B)
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness2025NeurIPSLLaVA-v1.5-7B
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation2025ArxivLLaVA
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization2025ArxivLLaVA-OneVision-7B, Qwen2-VL-7B, LLaVA-v1.5-7B, CogVLM-chat-7B
Transferring Textual Preferences to Vision-Language Understanding through Model Merging2025ArxivLlama-3.2-11B-Vision-Instruct, Llama-3.1-Tulu-2-8B-uf-mean-rm, Llama-3.1-Tulu-3-8B-RM, Llama-3.1-8B

Model Merging in Image Generative Models

Style Mixing in Generative Models

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| SSR-Merge: Subspace Signal Routing for Training-Free LoRA Merging in Diffusion Models | 2026 | ICML | |
| DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation | 2026 | Arxiv | Stable Diffusion v1.5, FLUX.1 Dev |
| GimmBO: Interactive Generative Image Model Merging via Bayesian Optimization | 2026 | Arxiv | |
| Rethinking Inter-LoRA Orthogonality in Adapter Merging: Insights from Orthogonal Monte Carlo Dropout | 2025 | Arxiv | |
| BlockLoRA: Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation | 2025 | Arxiv | |
| LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | 2024 | Arxiv | LLaVA-Critic 7b |
| IterIS: Iterative Inference-Solving Alignment for LoRA Merging | 2024 | Arxiv | |
| Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | ECCV | |
| MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv | |
| MoLE: Mixture of LoRA Experts | 2024 | ICLR | |
| LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | 2024 | Arxiv | |
| Multi-LoRA Composition for Image Generation | 2024 | Arxiv | |
| Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models | 2023 | NeurIPS | |
| Merging loras | 2023 | (github) | |
| ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | 2023 | Arxiv | |
| GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV | |

Reducing Training Cost of Generative Models

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | 2024 | Arxiv | |
| A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA | 2024 | Arxiv | |

Enhancing the Faithfulness (or Generation Quality) of Diffusion Models

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Decouple-Then-Merge: Towards Better Training for Diffusion Models | 2024 | Arxiv | |
| SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | 2024 | Arxiv | |

Deepfake Detection

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Real-Aware Residual Model Merging for Deepfake Detection | 2025 | Arxiv | |

Model Merging in Video Generative Models

Enhancing Motion Modeling

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | 2025 | CVPR | DynamiCrafter, SVD |

Application of Model Merging in Different Machine Learning Subfields

Model Merging in Continual Learning

Model Merging to Mitigate Catastrophic Forgetting

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning | 2026 | Arxiv | |
| Unlocking the Potential of Continual Model Merging: An ODE Perspective | 2026 | ICML | |
| ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging | 2026 | Arxiv | |
| Revitalizing the Beginning: Avoiding Storage Dependency for Model Merging in Continual Learning | 2026 | Arxiv | |
| Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain? | 2026 | Arxiv | Qwen2.5-7B-Instruct, Mistral-7B-Instruct, Mistral-Small-24B-Instruct |
| MAny: Merge Anything for Multimodal Continual Instruction Tuning | 2026 | Arxiv | LLaVA-1.5-7B and InternVL-Chat-7B |
| BidirLM: From Text to Omnimodal Bidirectional Encoders by Adapting and Composing Causal LLMs | 2026 | Arxiv | Qwen3-1.7B and Qwen3-0.6B |
| Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging | 2026 | Arxiv | Llama-3.1-8B-Base |
| Mapping Post-Training Forgetting in Language Models at Scale | 2026 | ICLR | |
| LCA: Local Classifier Alignment for Continual Learning | 2026 | ICLR | |
| MERGETUNE: Continued fine-tuning of vision-language models | 2026 | Arxiv | |
| Merge before Forget: A Single LoRA Continual Learning via Continual Merging | 2025 | Arxiv | Llama-2-7B-chat, Llama-2-13B-chat, Qwen2.5-7B |
| Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging | 2025 | Arxiv | |
| Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport | 2025 | Arxiv | |
| MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images | 2025 | Arxiv | |
| RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging | 2025 | Arxiv | Qwen2-7B-Instruct, Llama-2-7B-chat |
| DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection | 2025 | NeurIPS | |
| K-Merge: Online Continual Merging of Adapters for On-device Large Language Models | 2025 | Arxiv | |
| Toward a Holistic Approach to Continual Model Merging | 2025 | Arxiv | |
| Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity | 2026 | ICLR | |
| AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning | 2025 | EMNLP | LLaMA2-7B, LLaMA2-13B |
| HAM: Hierarchical Adapter Merging for Scalable Continual Learning | 2025 | Arxiv | |
| Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning | 2025 | ICML | LLaVA-1.5-7B |
| DuET: Dual Incremental Object Detection via Exemplar-Free Task Arithmetic | 2025 | Arxiv | |
| Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning | 2025 | ICCV | |
| Forgetting of task-specific knowledge in model merging-based continual learning | 2025 | Arxiv | |
| Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition | 2025 | Arxiv | |
| RegCL: Continual Adaptation of Segment Anything Model via Model Merging | 2025 | Arxiv | |
| Continual Learning in Vision-Language Models via Aligned Model Merging | 2025 | Arxiv | |
| Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning | 2025 | Arxiv | |
| MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | 2025 | NeurIPS | |
| Analysis of Model Merging Methods for Continual Updating of Foundation Models in Distributed Data Settings | 2025 | Arxiv | Applied Sciences |
| BECAME: BayEsian Continual Learning with Adaptive Model MErging | 2025 | Arxiv | |
| Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs | 2025 | Arxiv | Llama-3-8B-Instruct |
| Cost-Efficient Continual Learning with Sufficient Exemplar Memory | 2025 | Arxiv | |
| Continual Model Merging without Data: Dual Projections for Balancing Stability and Plasticity | 2025 | NeurIPS | |
| Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging | 2025 | NeurIPS | |
| Soup to go: mitigating forgetting during continual learning with model averaging | 2025 | Arxiv | Llama 2 (7B) |
| Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning | 2024 | Arxiv | |
| Parameter Averaging is All You Need to Prevent Forgetting | 2024 | SLT Workshop | |
| DESIRE: Dynamic Knowledge Consolidation for Rehearsal-Free Continual Learning | 2024 | Arxiv | |
| Adaptive LoRA Merging for Efficient Domain Incremental Learning | 2024 | NeurIPS Workshop | |
| LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | 2024 | Arxiv | |
| Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | 2024 | ICML | InstructBLIP (Vicuna-7B), LLaVA-1.5 (Vicuna-7B) |
| Adaptive Discovering and Merging for Incremental Novel Class Discovery | 2024 | AAAI | |
| MagMax: Leveraging Model Merging for Seamless Continual Learning | 2024 | ECCV | |
| Lm-cocktail: Resilient tuning of language models via model merging | 2024 | ACL Findings | Llama-2-chat-7b |
| Backward Compatibility During Data Updates by Weight Interpolation | 2024 | EACL | |
| Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models | 2024 | EMNLP Findings | |
| Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | 2024 | Arxiv | MISTRAL-7B, LLAMA-3-8B |
| Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation | 2024 | Arxiv | Llama3-70B |
| Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | 2024 | Arxiv | Mistral-7B, Llama-3-8B |
| WARP: On the Benefits of Weight Averaged Rewarded Policies | 2024 | Arxiv | Gemma-7B |
| A Second-Order perspective on Compositionality and Incremental Learning | 2024 | Arxiv | |
| DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images | 2024 | Arxiv | |
| DAM: Dynamic Adapter Merging for Continual Video QA Learning | 2024 | Arxiv | |
| Task-Specific Skill Localization in Fine-tuned Language Model | 2023 | ICML | |
| Tangent model composition for ensembling and continual fine-tuning | 2023 | ICCV | |
| A Unified Continual Learning Framework with General Parameter-Efficient Tuning | 2023 | ICCV | |
| Task Arithmetic with LoRA for Continual Learning | 2023 | NeurIPS Workshop | |
| Mitigating the Alignment Tax of RLHF | 2023 | Arxiv | Mistral-7B |
| PAINT: Patching open-vocabulary models by interpolating weights | 2022 | NeurIPS | |
| Robust fine-tuning of zero-shot models | 2022 | CVPR | |

Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning

Model Merging for Knowledge Transfer in Multi-Task Learning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation | 2026 | ICLR | |
| Multi-task Code LLMs: Data Mix or Model Merge? | 2026 | Arxiv | Qwen Coder 2.5 7B, DeepSeek 7B |
| DivMerge: A divergence-based model merging method for multi-tasking | 2025 | Arxiv | |
| Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning | 2025 | Arxiv | |
| Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging | 2024 | Arxiv | |
| LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | 2024 | Arxiv | |
| Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | 2024 | Arxiv | Aya 23 8B |
| Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks | 2024 | Arxiv | |
| Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer | 2024 | Arxiv | |
| Evolutionary optimization of model merging recipes | 2024 | Arxiv | shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML | WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B |
| Representation Surgery for Multi-Task Model Merging | 2024 | ICML | |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | 2024 | ICML | |
| ZipIt! Merging Models from Different Tasks without Training | 2024 | ICLR | |
| AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR | |
| Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies | 2023 | Arxiv | |
| Resolving Interference When Merging Models | 2023 | NeurIPS | |
| Editing models with task arithmetic | 2023 | ICLR | |

Model Merging for Knowledge Transfer in Multi-Objective Optimization

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging | 2026 | AAAI | |
| Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation | 2025 | Arxiv | LLaMA-2-7B |
| Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging | 2025 | ICML | |
| Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation | 2025 | Arxiv | LLaMA-2 7B |
| You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | 2024 | Arxiv | |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv | |
| MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | 2024 | Arxiv | Llama3-8B |

Model Merging for Knowledge Transfer in Multi-Domain Learning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Domain-Adaptive Model Merging across Disconnected Modes | 2026 | Arxiv | |
| Bridging Domains through Subspace-Aware Model Merging | 2026 | Arxiv | |
| Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR | 2026 | Arxiv | |
| To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models | 2026 | Arxiv | Qwen3-4B-Base |
| MMGRid: Navigating Temporal-aware and Cross-domain Generative Recommendation via Model Merging | 2026 | Arxiv | Qwen3-0.6B |
| MergeRec: Model Merging for Data-Isolated Cross-Domain Sequential Recommendation | 2026 | KDD | |
| DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv | OpenLLaMA-7B, OpenLLaMA-13B |
| Merging Vision Transformers from Different Tasks and Domains | 2023 | Arxiv | |

Model Merging for Knowledge Transfer in Auxiliary Learning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning | 2023 | NeurIPS | |

Model Merging in Out-of-Distribution/Domain Generalization

Model Merging for Better Out-of-Distribution Generalization

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR | 2026 | Arxiv | |
| Model soups need only one ingredient | 2026 | Arxiv | |
| System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection | 2025 | Arxiv | Qwen2.5-7B-Instruct |
| Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data | 2025 | Arxiv | |
| Out-of-Distribution Graph Models Merging | 2025 | Arxiv | |
| SeWA: Selective Weight Average via Probabilistic Masking | 2025 | Arxiv | |
| When, Where and Why to Average Weights? | 2025 | Arxiv | |
| DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation | 2024 | NeurIPS 2024 Workshop | |
| Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging | 2024 | EMNLP | Llama-2-7b |
| ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | 2024 | Arxiv | |
| Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging | 2024 | ICLR | |
| Warm: On the benefits of weight averaged reward models | 2024 | ICML | |
| Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy | 2024 | ECCV | |
| Adaptive Stochastic Weight Averaging | 2024 | JMLR | |
| Population parameter averaging (papa) | 2024 | TMLR | |
| WARP: On the Benefits of Weight Averaged Rewarded Policies | 2024 | Arxiv | Mistral 7B, Mixtral 8x7B |
| WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average | 2024 | Arxiv | |
| Model Stock: All we need is just a few fine-tuned models | 2024 | Arxiv | |
| Lookaround Optimizer: 𝑘 steps around, 1 step average | 2023 | NeurIPS | |
| Model ratatouille: Recycling diverse models for out-of-distribution generalization | 2023 | ICML | |
| Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions | 2023 | ICLR | |
| AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models | 2023 | EACL | |
| Dart: Diversify aggregate-repeat training improves generalization of neural networks | 2023 | CVPR | |
| When do flat minima optimizers work? | 2022 | NeurIPS | |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | 2022 | ICML | |
| Diverse weight averaging for out-of-distribution generalization | 2022 | NeurIPS | |
| Robust fine-tuning of zero-shot models | 2022 | CVPR | |
| Neural networks with late-phase weights | 2021 | ICLR | |
| Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well | 2020 | ICLR | |
| SWALP: Stochastic weight averaging in low precision training | 2019 | ICML | |
| Averaging weights leads to wider optima and better generalization | 2018 | UAI | |
| Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results | 2017 | NeurIPS | |

Model Merging for Better Domain Generalization or Domain Adaptation

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models | 2025 | Arxiv | Qwen2.5-7B, Llama3.1-8B |
| Harmonizing and Merging Source Models for CLIP-based Domain Generalization | 2025 | Arxiv | |
| Realistic Evaluation of Model Merging for Compositional Generalization | 2024 | Arxiv | |
| Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks | 2024 | Arxiv | |
| Training-Free Model Merging for Multi-target Domain Adaptation | 2024 | Arxiv | |
| Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation | 2024 | Arxiv | Llama3-70B |
| Ensemble of averages: Improving model selection and boosting performance in domain generalization | 2022 | NeurIPS | |
| Swad: Domain generalization by seeking flat minima | 2021 | NeurIPS | |

Model Merging in Federated Learning

Model Merging for Local Knowledge Aggregation

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| FedMerge: Federated Model Merging for Personalization | 2026 | AAAI | |
| Communication-Efficient Personalized Adaptation via Federated-Local Model Merging | 2026 | Arxiv | LLaMA-3.2-3B-Instruct |
| On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning | 2026 | ICLR | |
| Bi-level Personalization for Federated Foundation Models: A Task-vector Aggregation Approach | 2025 | Arxiv | LLaMA-7B |
| Intrinsic Training Signals for Federated Learning Aggregation | 2025 | ICIAP | |
| Breaking the Aggregation Bottleneck in Federated Recommendation: A Personalized Model Merging Approach | 2025 | Arxiv | |
| A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning | 2025 | Arxiv | |
| Closed-form merging of parameter-efficient modules for Federated Continual Learning | 2025 | ICLR | |
| Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection | 2025 | Arxiv | |
| FedMerge: Federated Personalization via Model Merging | 2025 | Arxiv | |
| Personalized Language Models via Privacy-Preserving Evolutionary Model Merging | 2025 | Arxiv | Llama-2-7b, Mistral-7B-Instruct v0.2 |
| FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors | 2025 | Arxiv | |
| Many-Task Federated Fine-Tuning via Unified Task Vectors | 2025 | Arxiv | |
| PrivFusion: Privacy-Preserving Model Fusion via Decentralized Federated Graph Matching | 2024 | TKDE | |
| Model Trip: Enhancing Privacy and Fairness in Model Fusion Across Multi-Federations for Trustworthy Global Healthcare | 2024 | ICDE | |
| DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices | 2024 | NeurIPS | |
| FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion | 2024 | Arxiv | |
| Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning | 2024 | Arxiv | |
| DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models | 2024 | CVPR | |
| FedFisher: Leveraging Fisher Information for One-Shot Federated Learning | 2024 | AISTATS | |
| lo-fi: distributed fine-tuning without communication | 2023 | TMLR | |
| Revisiting Weighted Aggregation in Federated Learning with Neural Networks | 2023 | ICML | |
| Deep neural network fusion via graph matching with applications to model ensemble and federated learning | 2022 | ICML | |
| Federated Learning with Matched Averaging | 2020 | ICLR | |
| Tackling the objective inconsistency problem in heterogeneous federated optimization | 2020 | NeurIPS | |
| Model fusion via optimal transport | 2020 | NeurIPS | |
| Bayesian nonparametric federated learning of neural networks | 2019 | ICML | |
| Learning private neural language modeling with attentive aggregation | 2019 | IJCNN | |
| Communication-Efficient Learning of Deep Networks from Decentralized Data | 2017 | AISTATS | |

Model Merging in Zero-shot/Few-shot Learning

Model Merging for Cross-task Generalization in Zero-shot Learning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis | 2026 | Arxiv | |
| Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models | 2025 | NeurIPS Workshop | LLAMA-3.1-8B-INSTRUCT |
| Investigating Task Arithmetic for Zero-Shot Information Retrieval | 2025 | SIGIR | LLama-2-7b |
| Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering | 2024 | Arxiv | Qwen 60x2.7B, Qwen 45x2.7B, Qwen 30x2.7B, Mixtral 8x7B, Mixtral 6x7B, Mixtral 4x7B |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | 2024 | Arxiv | LLAMA 3.1 8B |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | 2024 | ICML | |
| Towards Modular LLMs by Building and Reusing a Library of LoRAs | 2024 | ICML | Mistral-7B |
| Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities | 2024 | ACL | LLaMA-2 13B, Chinese-LLaMA-13B, Chinese-Alpaca-13B, Mistral-7B, llama-2-ko-7b |
| Unlocking the Potential of Model Merging for Low-Resource Languages | 2024 | Arxiv | Llama-2-7B |
| Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | ECCV | |
| No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement | 2024 | Arxiv | |
| MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv | |
| AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging | 2024 | Arxiv | Llama2-7b |
| Model Composition for Multimodal Large Language Models | 2024 | Arxiv | Vicuna-7B-v1.5 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | 2023 | ICML | |
| Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization | 2023 | Arxiv | Llama-2-7b |
| Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization | 2023 | Arxiv | PaLM 2-S |

Model Merging for Cross-task Generalization in Few-shot Learning

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| Task Arithmetic with Support Languages for Low-Resource ASR | 2026 | Arxiv | |
| Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs | 2025 | CVPR | |
| LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks | 2024 | ACL | Llama-2-7B |
| LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | 2024 | COLM | Llama-2-7B, Llama-2-13B |
| LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild | 2024 | ACL | |
| Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? | 2024 | Arxiv | |
| MerA: Merging pretrained adapters for few-shot learning | 2023 | Arxiv | |
| Multi-Head Adapter Routing for Cross-Task Generalization | 2023 | NeurIPS | |

Model Merging in Adversarial Learning

Model Merging as an Attack

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| RogueMerge: Robust and Unified Attacks against LLM Model Merging | 2026 | Arxiv | Llama-3-8B and Qwen-2.5-7B |
| When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion | 2026 | Arxiv | Tulu-2-7b, Llama-3.1-Tulu-3-8B-DPO, OpenChat-3.5-0106 |
| Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses | 2025 | Arxiv | |
| Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability | 2025 | Arxiv | |
| Be Cautious When Merging Unfamiliar LLMs: A Phishing Model Capable of Stealing Privacy | 2025 | ACL | Llama-3.2-3b-it, Gemma-2-2b-it, Qwen-2.5-3b-it, and Phi-3.5-mini-it |
| Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models | 2025 | Arxiv | LLaMA3.1-8B |
| From Purity to Peril: Backdooring Merged Models From “Harmless” Benign Components | 2025 | Arxiv | LLaMA2-7B-chat, Mistral-7B-v0.1 |
| Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging | 2025 | Arxiv | |
| LoBAM: LoRA-Based Backdoor Attack on Model Merging | 2024 | Arxiv | |
| BadMerging: Backdoor Attacks Against Model Merging | 2024 | CCS | |
| LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario | 2024 | ACL | Llama-2-7B |

Model Merging as a Defense or Intellectual Property Protection

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging | 2026 | ICML | |
| Defending against Backdoor Attacks via Module Switching | 2026 | ICLR | |
| Making Models Unmergeable via Scaling-Sensitive Loss Landscape | 2026 | Arxiv | |
| Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models | 2026 | Arxiv | Llama2-7B and Qwen3-8B |
| Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging | 2026 | AAAI | LLaMA-2-13B, WizardLM-13B, WizardMath-13B, LLaMA-2-13B-Code Alpaca |
| Defending Unauthorized Model Merging via Dual-Stage Weight Protection | 2025 | Arxiv | |
| Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing | 2025 | Arxiv | |
| POSTER: Investigating Transferability of Adversarial Examples in Model Merging | 2025 | ASIA CCS | |
| RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging | 2025 | Arxiv | |
| MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models | 2025 | Arxiv | |
| BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge | 2025 | Arxiv | Mistral-7B-Instruct-v0.2, Meta-Llama3-8B |
| Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy | 2025 | ICCV | |
| Large Language Models Merging for Enhancing the Link Stealing Attack on Graph Neural Networks | 2024 | Arxiv | Vicuna-7B, Vicuna-13B |
| Strong Copyright Protection for Language Models via Adaptive Model Fusion | 2024 | ICML | LLaMa2 7B, StarCoder 7B |
| Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models | 2024 | Arxiv | |
| REEF: Representation Encoding Fingerprints for Large Language Models | 2024 | Arxiv | Evollm-jp-7b, Shisa-gamma-7b-v1, Wizardmath-7b-1.1, Abel-7b-002, Llama-2-7b, Openllama-2-7b, Mpt-7b, Internlm2-chat-20b, Mixtral-8x7b-instruct, Qwen-1.5-chat-72b |
| Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace | 2024 | Arxiv | |
| MergePrint: Robust Fingerprinting against Merging Large Language Models | 2024 | Arxiv | LLaMA-2-7B, WizardMath-7B-V1.0, LLaMA-2-7B-CHAT |
| Avoiding Copyright Infringement via Machine Unlearning | 2024 | Arxiv | Llama3-8B |
| Merging Improves Self-Critique Against Jailbreak Attacks | 2024 | Arxiv | Mistral-7B, Mixtral-8x7B |
| Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | 2024 | Arxiv | LLaMA-2-7B, LLaMA-2-7B-CHAT, WizardMath-7B-V1.0 |
| Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge | 2024 | ACL | |
| Revisiting adapters with adversarial training | 2023 | ICLR | |
| Seasoning model soups for robustness to adversarial and natural distribution shifts | 2023 | CVPR | |

Other Applications

| Paper Title | Year | Conference/Journal | Remark |
| --- | --- | --- | --- |
| StereoFactory: A Unified Merging Framework for Robust Stereo Matching | 2026 | Arxiv | |
| Securing the Floor and Raising the Ceiling: A Merging-based Paradigm for Multi-modal Search Agents | 2026 | Arxiv | |
| ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments | 2026 | Arxiv | |
| Sparse Task Vector Mixup with Hypernetworks for Efficient Knowledge Transfer in Whole-Slide Image Prognosis | 2026 | Arxiv | |
| Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging | 2026 | Arxiv | Qwen3-0.6B, Gemma-2B, Phi4-3.8B |
| When Domain Pretraining Interferes with Instruction Alignment: An Empirical Study of Adapter Merging in Medical LLMs | 2026 | Arxiv | 14B-parameter LLM |
| MergeRec: Model Merging for Data-Isolated Cross-Domain Sequential Recommendation | 2026 | KDD | |
| Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models | 2025 | Arxiv | |
| System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection | 2025 | Arxiv | Qwen2.5-7B-Instruct |
| Group-Aware Partial Model Merging for Children’s Automatic Speech Recognition | 2025 | Arxiv | |
| Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic | 2025 | Arxiv | |
| RecCocktail: A Generalizable and Efficient Framework for LLM-Based Recommendation | 2025 | AAAI | Llama-3.1-8B |
| A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs | 2025 | Arxiv | Mistral-7B |
| WeaveRec: An LLM-Based Cross-Domain Sequential Recommendation Framework with Model Merging | 2025 | Arxiv | Qwen2-7B |
| Effect of Model Merging in Domain-Specific Ad-hoc Retrieval | 2025 | Arxiv | |
| Look the Other Way: Designing ‘Positive’ Molecules with Negative Data via Task Arithmetic | 2025 | Arxiv | |
| Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic | 2025 | Arxiv | |
| Distilling a speech and music encoder with task arithmetic | 2025 | Arxiv | |
| MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation | 2025 | Arxiv | |
| Oscillation-Reduced MXFP4 Training for Vision Transformers | 2025 | ICML | |
| Temporal Information Retrieval via Time-Specifier Model Merging | 2025 | Arxiv | |
| Generative Representational Learning of Foundation Models for Recommendation | 2025 | Arxiv | |
| Towards Model Merging for Tabular Telecommunications Data | 2025 | Arxiv | |
| CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning | 2025 | Arxiv | |
| U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation | 2025 | International Conference on Medical Image Computing and Computer Assisted Intervention | |
| CodeMerge: Codebook-Guided Model Merging for Robust Test-Time Adaptation in Autonomous Driving | 2025 | Arxiv | |
| Mixture of Latent Experts Using Tensor Products | 2024 | TMLR | |
| In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models | 2025 | Arxiv | |
| Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos | 2025 | Arxiv | |
| A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs | 2025 | Arxiv | LLaMA-2-7B |
| MedForge: Building Medical Foundation Models Like Open Source Software Development | 2025 | Arxiv | |
| Cultural Palette: Pluralising Culture Alignment via Multi-agent Palette | 2024 | Arxiv | |
| Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging | 2024 | EMNLP | Llama-2-7b |
| Is Multiple Object Tracking a Matter of Specialization? | 2024 | NeurIPS | |
| Tracking Universal Features Through Fine-Tuning and Model Merging | 2024 | Arxiv | |
| HM3: Heterogeneous Multi-Class Model Merging | 2024 | Arxiv | |
| Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation | 2024 | Interspeech | |
| Erasure Coded Neural Network Inference via Fisher Averaging | 2024 | Arxiv | |
| MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | 2024 | Arxiv | |
| Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks | 2024 | Arxiv | Llama2-7B, Llama2-13B-chat, Mistral-7B-instruct |
| Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization | 2024 | Arxiv | |
| An Attribute Interpolation Method in Speech Synthesis by Model Merging | 2024 | Arxiv | |
| Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition | 2024 | Arxiv | |
| MedMerge: Merging Models for Effective Transfer Learning to Medical Imaging Tasks | 2024 | Arxiv | |
| Experts Weights Averaging: A New General Training Scheme for Vision Transformers | 2023 | Arxiv | |
| One Student Knows All Experts Know: From Sparse to Dense | 2022 | Arxiv | |
| Meta-Learning PAC-Bayes Priors in Model Averaging | 2019 | AAAI | |

Contact

We welcome contributions from all researchers to this repository on model merging in foundation models and machine learning.

If you have a related paper that was not added to the library, please contact us.

Email: ennengyang@qq.com / ennengyang@gmail.com