Awesome-Parallel-Text-Generation

February 27, 2026

Our Survey

A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models

The first comprehensive survey of parallel text generation methods. [PDF]

Methodology

AR-Based

Draft-and-Verify

| Paper | Venue | Code |
| --- | --- | --- |
| Adaptive Draft-Verification for Efficient Large Language Model Decoding | AAAI 2025 | GitHub |
| Speculative Decoding with Big Little Decoder | NeurIPS 2023 | GitHub |
| Block Verification Accelerates Speculative Decoding | ICLR 2025 | - |
| Cascade speculative drafting for even faster LLM inference | NeurIPS 2023 | GitHub |
| Dynamic Depth Decoding: Faster Speculative Decoding for LLMs | arXiv 2024 | - |
| DistillSpec: Improving speculative decoding via knowledge distillation | ICLR 2024 | - |
| Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | ACL 2024 | GitHub |
| Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | AAAI 2025 | GitHub |
| DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure | WWW 2025 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | GitHub |
| EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | EMNLP 2024 | GitHub |
| Speculative Decoding via Early-Exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | ACL 2024 | - |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-designed Decoding Tree | AAAI 2025 | GitHub |
| Fast Inference from Transformers via Speculative Decoding | ICML 2023 | GitHub |
| Graph-Structured Speculative Decoding | ACL 2024 | GitHub |
| Learning Harmonized Representations for Speculative Sampling | ICLR 2025 | GitHub |
| Hydra: Sequentially-dependent draft heads for Medusa decoding | COLM 2024 | GitHub |
| Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | ICLR 2025 | - |
| Kangaroo: Lossless self-speculative decoding via double early exiting | NeurIPS 2024 | GitHub |
| LayerSkip: Enabling early-exit inference and self-speculative decoding | ACL 2024 | GitHub |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | GitHub |
| Mixture of Attentions for Speculative Decoding | ICLR 2025 | GitHub |
| Optimized multi-token joint decoding with auxiliary model for LLM inference | ICLR 2025 | GitHub |
| A Drop-in Solution for On-the-fly Adaptation of Speculative Decoding in Large Language Models | ACL 2025 | - |
| OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure | TACL 2025 | GitHub |
| Ouroboros: Speculative Decoding with Large Model Enhanced Drafting | EMNLP 2024 | GitHub |
| Online Speculative Decoding | ICML 2024 | GitHub |
| PaSS: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| Parallel Speculative Decoding with Adaptive Draft Length | ICLR 2025 | GitHub |
| PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | SC 2024 | GitHub |
| Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding | TMLR 2024 | - |
| ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding | ICCAD 2024 | - |
| REST: Retrieval-based speculative decoding | NAACL 2024 | GitHub |
| Recursive Speculative Decoding: Accelerating LLM Inference via Sampling without Replacement | ICLR-LLMA 2024 | - |
| Sequoia: Scalable, robust, and hardware-aware speculative decoding | arXiv 2024 | GitHub |
| Generation meets verification: Accelerating large language model inference with smart parallel auto-correct decoding | ACL 2024 | GitHub |
| Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation | EMNLP 2023 | GitHub |
| SpecDec++: Boosting speculative decoding via adaptive candidate lengths | COLM 2025 | GitHub |
| SpecInfer: Accelerating generative large language model serving with tree-based speculative inference and verification | ASPLOS 2024 | GitHub |
| SpecTr: Fast Speculative Decoding via Optimal Transport | NeurIPS 2023 | - |
| SPEED: Speculative pipelined execution for efficient decoding | NeurIPS-ENLSP 2023 | - |
| SWIFT: On-the-fly self-speculative decoding for LLM inference acceleration | ICLR 2025 | GitHub |
| SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | NeurIPS 2025 | GitHub |
| Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs | arXiv 2025 | GitHub |
| Scaling Speculative Decoding with Lookahead Reasoning | NeurIPS 2025 | GitHub |
| CAS-Spec: Cascade Adaptive Self-Speculative Decoding for On-the-Fly Lossless Inference Acceleration of LLMs | NeurIPS 2025 | - |
| Griffin: Effective token alignment for faster speculative decoding | NeurIPS 2025 | GitHub |
| STree: Speculative Tree Decoding for Hybrid State-Space Models | NeurIPS 2025 | GitHub |

Decomposition-and-Fill

| Paper | Venue | Code |
| --- | --- | --- |
| PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries | arXiv 2025 | - |
| Falcon: Faster and parallel inference of large language models through enhanced semi-autoregressive drafting and custom-designed decoding tree | AAAI 2025 | GitHub |
| Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models | NAACL 2025 | - |
| Skeleton-of-Thought: Prompting LLMs for efficient parallel generation | ICLR 2024 | GitHub |
| SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models | arXiv 2025 | - |

Multiple Token Prediction

| Paper | Venue | Code |
| --- | --- | --- |
| L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models | arXiv 2025 | - |
| On multi-token prediction for efficient LLM inference | arXiv 2025 | - |
| Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | ICML 2024 | GitHub |
| Multi-Token Prediction Needs Registers | arXiv 2025 | GitHub |
| Blockwise Parallel Decoding for Deep Autoregressive Models | NeurIPS 2018 | - |
| PaSS: Parallel speculative sampling | NeurIPS-ENLSP 2023 | - |
| EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | ICML 2024 | GitHub |
| Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential | arXiv 2025 | - |
| ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | EMNLP 2020 | GitHub |
| Better & faster large language models via multi-token prediction | ICML 2024 | - |
| DeepSeek-V3 technical report | arXiv 2024 | GitHub |
| MiMo: Unlocking the Reasoning Potential of Language Model--From Pretraining to Posttraining | arXiv 2025 | GitHub |

Non-AR-Based

One-Shot Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Non-autoregressive neural machine translation | ICLR 2018 | GitHub |
| End-to-end non-autoregressive neural machine translation with connectionist temporal classification | EMNLP 2018 | |
| Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | EMNLP 2018 | GitHub |
| LAVA NAT: A non-autoregressive translation model with look-around decoding and vocabulary attention | arXiv 2020 | - |
| AligNART: Non-autoregressive neural machine translation by jointly learning to estimate alignment and translate | EMNLP 2021 | - |
| Guiding non-autoregressive neural machine translation decoding with reordering information | AAAI 2021 | GitHub |
| Non-monotonic latent alignments for CTC-based non-autoregressive machine translation | NeurIPS 2022 | GitHub |
| DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder | ACL 2023 | GitHub |
| Directed acyclic transformer for non-autoregressive machine translation | ICML 2022 | GitHub |
| Viterbi decoding of directed acyclic transformer for non-autoregressive machine translation | EMNLP 2022 | GitHub |
| Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade | ACL-IJCNLP 2021 | - |
| Aligned cross entropy for non-autoregressive machine translation | ICML 2020 | GitHub |
| ngram-OAXE: Phrase-based order-agnostic cross entropy for non-autoregressive machine translation | COLING 2022 | GitHub |
| Multi-granularity optimization for non-autoregressive translation | EMNLP 2022 | GitHub |
| Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation | IJCNLP-AACL 2023 | - |
| Self-Refine: Iterative Refinement with Self-Feedback | NeurIPS 2023 | GitHub |
| Tree-Structured Non-Autoregressive Decoding for Sequence-to-Sequence Text Generation | EMNLP 2025 | GitHub |

Masked Generation

| Paper | Venue | Code |
| --- | --- | --- |
| Accelerating Large Language Model Decoding with Speculative Sampling | arXiv 2023 | GitHub |
| Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling | ICLR 2025 | - |
| A continuous time framework for discrete denoising models | NeurIPS 2022 | GitHub |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | GitHub |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | GitHub |
| Seed Diffusion | arXiv 2025 | - |
| Target concrete score matching: A holistic framework for discrete diffusion | ICML 2025 | - |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | GitHub |
| Score-based continuous-time discrete diffusion models | ICLR 2023 | - |
| Fast-dLLM: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding | arXiv 2025 | GitHub |
| Large language diffusion models | ICLR 2025 | GitHub |
| Beyond autoregression: Discrete diffusion for complex reasoning and planning | ICLR 2025 | GitHub |
| A reparameterized discrete diffusion model for text generation | COLM 2024 | GitHub |
| Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | ICML 2025 | - |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | arXiv 2025 | - |
| Accelerating Diffusion Large Language Models with SlowFast: The Three Golden Principles | arXiv 2025 | GitHub |
| A continuous time framework for discrete denoising models | NeurIPS 2022 | GitHub |
| Remasking discrete diffusion models with inference-time scaling | ICLR 2025 | GitHub |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | GitHub |
| Path planning for masked diffusion model sampling | arXiv 2025 | GitHub |
| Think while you generate: Discrete diffusion with planned denoising | ICLR 2025 | GitHub |
| Accelerating Diffusion LLMs via Adaptive Parallel Decoding | arXiv 2025 | - |
| Reviving any-subset autoregressive models with principled parallel sampling and speculative decoding | arXiv 2025 | GitHub |
| dKV-Cache: The cache for diffusion language models | arXiv 2025 | GitHub |
| Accelerating diffusion language model inference via efficient KV caching and guided diffusion | arXiv 2025 | - |
| Esoteric Language Models | arXiv 2025 | GitHub |
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | ICLR 2025 | - |
| CLLMs: Consistency large language models | ICML 2024 | GitHub |
| The diffusion duality | ICML 2025 | GitHub |
| d1: Scaling reasoning in diffusion large language models via reinforcement learning | arXiv 2025 | GitHub |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | AAAI 2025 | GitHub |
| DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation | arXiv 2025 | GitHub |
| Scaling diffusion language models via adaptation from autoregressive models | ICLR 2025 | GitHub |
| Dream 7B | arXiv 2025 | GitHub |
| DIFFPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models | ACL 2025 | GitHub |
| Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs | arXiv 2025 | GitHub |
| Dream-Coder 7B: An Open Diffusion Language Model for Code | arXiv 2025 | GitHub |
| SPG: Sandwiched policy gradient for masked diffusion language models | arXiv 2025 | GitHub |
| Revolutionizing reinforcement learning framework for diffusion large language models | ICLR 2026 | GitHub |
| Diffusion LLMs can do faster-than-AR inference via discrete diffusion forcing | ICLR 2026 | GitHub |
| WeDLM: Reconciling Diffusion Language Models with Standard Causal Attention for Fast Inference | arXiv 2025 | GitHub |
| d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models | arXiv 2025 | GitHub |
| d2: Improved Techniques for Training Reasoning Diffusion Language Models | arXiv 2025 | - |
| wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models | arXiv 2025 | GitHub |
| Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models | arXiv 2025 | GitHub |
| Improving reasoning for diffusion language models via group diffusion policy optimization | arXiv 2025 | GitHub |
| The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models | arXiv 2026 | GitHub |
| Principled RL for diffusion LLMs emerges from a sequence-level perspective | ICLR 2026 | GitHub |
| Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas | arXiv 2026 | GitHub |
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation | arXiv 2025 | GitHub |
| LoPA: Scaling dLLM inference via lookahead parallel decoding | arXiv 2025 | GitHub |
| FAST-dLLM V2: Efficient Block-Diffusion LLM | ICLR 2026 | GitHub |
| d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation | arXiv 2026 | GitHub |
| dParallel: Learnable Parallel Decoding for dLLMs | ICLR 2026 | GitHub |
| Diffusion language models know the answer before decoding | ICLR 2026 | GitHub |
| CreditDecoding: Accelerating parallel decoding in diffusion large language models with trace credits | arXiv 2025 | - |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | ICLR 2025 | GitHub |
| Set Block Decoding is a Language Model Inference Accelerator | arXiv 2025 | - |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | arXiv 2025 | GitHub |
| dInfer: An Efficient Inference Framework for Diffusion Language Models | arXiv 2025 | GitHub |
| ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs | ICLR 2026 | GitHub |

Edit-Based Refinement

| Paper | Venue | Code |
| --- | --- | --- |
| Insertion transformer: Flexible sequence generation via insertion operations | ICML 2019 | - |
| Levenshtein transformer | NeurIPS 2019 | GitHub |
| EDITOR: An edit-based transformer with repositioning for neural machine translation with soft lexical constraints | TACL 2021 | GitHub |
| FELIX: Flexible Text Editing Through Tagging and Insertion | EMNLP 2020 | - |
| Levenshtein OCR | ECCV 2022 | GitHub |
| FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition | NeurIPS 2021 | GitHub |
| Non-autoregressive Text Editing with Copy-aware Latent Alignments | EMNLP 2023 | GitHub |
| Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation | NAACL-SRW 2024 | - |
| Summarizing Like Human: Edit-Based Text Summarization with Keywords | ICANN 2024 | - |
| Deterministic non-autoregressive neural sequence modeling by iterative refinement | EMNLP 2018 | GitHub |
| FlowSeq: Non-autoregressive conditional sequence generation with generative flow | EMNLP 2019 | GitHub |
| Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior | AAAI 2020 | GitHub |
| Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation | EMNLP 2020 | GitHub |
| Non-autoregressive machine translation with auxiliary regularization | AAAI 2019 | - |
| Imitation learning for non-autoregressive neural machine translation | ACL 2019 | - |
| An imitation learning curriculum for text editing with non-autoregressive models | ACL 2022 | GitHub |
| Fast structured decoding for sequence models | NeurIPS 2019 | GitHub |
| An EM approach to non-autoregressive conditional sequence generation | ICML 2020 | - |
| Imputer: Sequence modelling via imputation and dynamic programming | ICML 2020 | GitHub |
| Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment | NAACL 2021 | GitHub |
| Learning to rewrite for non-autoregressive neural machine translation | EMNLP 2021 | GitHub |
| RenewNAT: renewing potential translation for non-autoregressive transformer | AAAI 2023 | - |
| Learning to recover from multi-modality errors for non-autoregressive neural machine translation | ACL 2020 | GitHub |
| Hybrid-regressive neural machine translation | ICLR 2023 | - |
| Iterative Translation Refinement with Large Language Models | EAMT 2024 | - |
| IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking | ICLR 2025 | GitHub |
| Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation | ACL 2021 | GitHub |
| Understanding and Improving Lexical Choice in Non-Autoregressive Translation | ICLR 2021 | GitHub |
| SlotRefine: A fast non-autoregressive model for joint intent detection and slot filling | EMNLP 2020 | GitHub |
| Non-autoregressive dialog state tracking | ICLR 2020 | GitHub |
| Unleashing the True Potential of LLMs: A Feedback-Triggered Self-Correction with Long-Term Multipath Decoding | arXiv 2025 | - |
| ProRefine: Inference-Time Prompt Refinement with Textual Feedback | arXiv 2025 | GitHub |
| A Probabilistic Inference Scaling Theory for LLM Self-Correction | EMNLP 2025 | GitHub |