LLM-Based Fortran to C++ Translation Framework

May 7, 2026

Public-facing summary of research on LLM-assisted legacy scientific code translation, compiler-in-the-loop feedback, and evaluation of open-source LLMs for Fortran-to-C++ modernization.

Tags: LLMs · Scientific Computing · Evaluation · Publication


Overview

This repository documents a public-facing summary and reimplementation outline of research I conducted as a Graduate Research Intern in the CCS-3 Division at Los Alamos National Laboratory.

The work contributed to the NAACL 2025 paper:

"LLM-Assisted Translation of Legacy FORTRAN Code to C++: A Cross-Platform Study"

The research studied how large language models can assist with translating legacy Fortran scientific computing code into modern C++, with emphasis on:

  • open-source LLM evaluation,
  • controlled prompting strategies,
  • code translation quality,
  • compiler feedback loops,
  • and functional validation.

Important Repository Note

The original research code, internal datasets, experiment artifacts, and detailed LANL documentation are not included in this repository.

This is intentional.

The original implementation and data were developed during work at Los Alamos National Laboratory and remain subject to internal review, approval, and release constraints. This repository therefore serves as a public project summary and reimplementation-safe description of the research contributions, rather than a full release of the internal experimental framework.

No restricted LANL code, data, internal reports, or non-public experiment artifacts are included here.


Research Motivation

Large scientific computing codebases still rely heavily on legacy Fortran. Many of these systems are:

  • long-lived,
  • performance-sensitive,
  • difficult to modernize manually,
  • domain-specific,
  • and expensive to validate.

Modern C++ is often preferred for maintainability, interoperability, tooling, and integration with newer software ecosystems. However, translating scientific Fortran code to C++ is difficult because translation must preserve:

  • numerical behavior,
  • memory semantics,
  • array indexing patterns,
  • control flow,
  • compiler compatibility,
  • and domain-specific logic.
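The array-indexing and memory-semantics pitfalls above have a concrete flavor: Fortran arrays are 1-based and column-major, while C++ arrays are 0-based and row-major. A toy Python sketch (illustrative only, not from the original research code) shows how the same logical element maps to different flat memory offsets:

```python
def fortran_flat_index(i: int, j: int, n_rows: int) -> int:
    """Flat offset of A(i, j) in Fortran: 1-based, column-major storage."""
    return (j - 1) * n_rows + (i - 1)

def cpp_flat_index(i: int, j: int, n_cols: int) -> int:
    """Flat offset of a[i][j] in C++: 0-based, row-major storage."""
    return i * n_cols + j

# For a 3x4 matrix, Fortran's A(2, 3) is the "same" element as C++'s a[1][2],
# but the flat offsets differ because storage order differs:
assert fortran_flat_index(2, 3, n_rows=3) == 7   # column-major offset
assert cpp_flat_index(1, 2, n_cols=4) == 6       # row-major offset
```

A naive translation that only shifts indices by one but ignores storage order still compiles, yet silently transposes memory access patterns, which is exactly the kind of failure surface-level metrics miss.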

This project explored whether LLMs can support this modernization process and how their outputs should be evaluated.


Research Questions

The project investigated questions such as:

  1. How well can open-source LLMs translate legacy Fortran code into C++ under controlled prompting conditions?
  2. How should translation quality be measured beyond surface-level similarity?
  3. Can compiler feedback improve translation correctness?
  4. What failure modes appear when LLMs translate scientific computing code?
  5. How do model size, prompting strategy, and session context affect translation performance?

My Contributions

During this research, I contributed to the design and implementation of an evaluation workflow for LLM-based code translation.

Publicly describable contributions include:

Evaluation Framework

  • Designed and implemented components of a framework for evaluating multiple open-source LLMs on Fortran-to-C++ translation.
  • Supported controlled prompting experiments for comparing translation behavior across models.
  • Helped structure standardized test cases covering different Fortran programming patterns and translation challenges.

Quantitative Assessment

  • Worked with metrics such as CodeBLEU to assess structural and semantic similarity between generated C++ translations and reference implementations.
  • Supported evaluation approaches for measuring translation quality beyond exact string matching.
  • Helped compare model outputs across different architectures, model sizes, and prompting configurations.
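CodeBLEU combines weighted n-gram overlap with AST and data-flow matching; the snippet below is only a toy n-gram component in pure Python (not the metric used in the study) to illustrate why such measures tolerate identifier renaming better than exact string matching:

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision, the BLEU-style core that CodeBLEU builds on."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    if not cand:
        return 0.0
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / sum(cand.values())

ref = "for ( int i = 0 ; i < n ; ++ i ) sum += a [ i ] ;".split()
hyp = "for ( int k = 0 ; k < n ; ++ k ) total += a [ k ] ;".split()
print(ngram_precision(hyp, ref))  # 0.5: half the bigrams survive renaming,
                                  # versus zero credit from exact string match
```

The real metric additionally weights language keywords and compares syntax trees and data-flow graphs, which is why it was preferred over plain BLEU for code.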

Compiler-in-the-Loop Validation

  • Developed and evaluated feedback-loop ideas using compiler diagnostics.
  • Explored how GCC/GFortran compiler errors could be used to guide iterative correction.
  • Investigated agentic workflows where model outputs are refined using tool feedback.
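The feedback-loop idea can be sketched as a small skeleton (hypothetical, not the internal LANL implementation; `translate_fn` and `compile_fn` are stand-ins for an LLM call and a `g++` invocation, injected here so the loop is testable without either):

```python
def repair_loop(fortran_src, translate_fn, compile_fn, max_rounds=3):
    """Iteratively re-prompt a model with compiler diagnostics.

    translate_fn(prompt) -> candidate C++ source (e.g. an LLM call)
    compile_fn(cpp_src)  -> (ok, diagnostics),   (e.g. wrapping g++)
    """
    prompt = f"Translate this Fortran to C++:\n{fortran_src}"
    cpp = ""
    for _ in range(max_rounds):
        cpp = translate_fn(prompt)
        ok, diagnostics = compile_fn(cpp)
        if ok:
            return cpp, True
        # Feed the compiler errors back into the next prompt.
        prompt = (f"This C++ translation failed to compile:\n{cpp}\n"
                  f"Compiler diagnostics:\n{diagnostics}\nPlease fix it.")
    return cpp, False

# Toy stand-ins: the "model" fixes a missing semicolon once told about it.
def fake_translate(prompt):
    fixed = "diagnostics" in prompt.lower()
    return "int main() { return 0; }" if fixed else "int main() { return 0 }"

def fake_compile(src):
    ok = src.rstrip().endswith("return 0; }")
    return ok, ("" if ok else "error: expected ';' before '}' token")

code, ok = repair_loop("program p\nend program p", fake_translate, fake_compile)
```

In the actual agentic setting, `compile_fn` would invoke GCC and capture stderr, and the loop would also guard against the model oscillating between two broken candidates.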

Failure Analysis

  • Analyzed common translation failures, including syntax errors, semantic mismatches, incorrect control flow, and numerical inconsistencies.
  • Studied how prompting strategy and session context affected translation quality.
  • Contributed to interpretation of model behavior across code translation experiments.

System-Level Workflow

At a high level, the research workflow can be represented as:

flowchart LR
    A[Legacy Fortran Code] --> B[Prompt Construction]
    B --> C[Open-Source LLM]
    C --> D[Generated C++ Translation]
    D --> E[Static / Structural Metrics]
    D --> F[Compiler Validation]
    F --> G[Compiler Error Feedback]
    G --> B
    D --> H[Functional / Qualitative Analysis]

The key idea was to evaluate code translation as more than text generation. Generated translations need to be checked for structural similarity, compilability, and functional correctness.
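The prompt-construction stage (node B above) differed between the two prompting modes compared in the study. A sketch in the common chat-message style used by Hugging Face and Ollama backends (the message schema here is illustrative; the internal prompts are not released):

```python
def zero_shot_messages(fortran_src):
    """One-off request: no prior conversation state is carried."""
    return [
        {"role": "system",
         "content": "You are a Fortran-to-C++ translation assistant."},
        {"role": "user", "content": f"Translate to C++:\n{fortran_src}"},
    ]

def session_messages(history, fortran_src):
    """Session-maintained prompting: earlier turns (prior snippets,
    compiler feedback, model replies) stay in context."""
    return history + [
        {"role": "user", "content": f"Translate to C++:\n{fortran_src}"}
    ]

msgs = zero_shot_messages("print *, 'hello'\nend")
# A follow-up request in a session carries the whole prior exchange forward:
history = msgs + [{"role": "assistant", "content": "// translated C++ here"}]
followup = session_messages(history, "do i = 1, 10\nend do")
```

Session-maintained prompting lets the model reuse earlier corrections, at the cost of a growing context window, which is one reason the study treated prompting strategy as an explicit experimental variable.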


Evaluation Dimensions

The project considered several complementary evaluation dimensions.

Dimension | Purpose
CodeBLEU / structural similarity | Measures overlap in code structure and semantics
Compilation success | Checks whether generated C++ compiles successfully
Compiler diagnostics | Identifies syntax, type, and compatibility issues
Functional equivalence | Assesses whether translated code preserves intended behavior
Prompting strategy | Compares zero-shot and context/session-based prompting
Model comparison | Evaluates behavior across open-source LLMs of different sizes

Technical Scope

The research involved:

Area | Details
Source language | Legacy Fortran
Target language | C++
Models | Open-source LLMs in the 7B–34B parameter range
Prompting | Zero-shot and session-maintained prompting strategies
Evaluation | CodeBLEU, custom metrics, compiler validation, qualitative failure analysis
Tooling | Python, Hugging Face Transformers, Ollama, GCC, GFortran
Analysis | Translation quality, consistency, error patterns, compiler-feedback behavior

Publication

This work contributed to the following publication:

Nishath Rajiv Ranasinghe, Shawn M. Jones, Michal Kucer, Ayan Biswas, Daniel O’Malley, Alexander Most, Selma Liliane Wanna, and Ajay Sreekumar.
"LLM-Assisted Translation of Legacy FORTRAN Code to C++: A Cross-Platform Study."
North American Chapter of the Association for Computational Linguistics (NAACL), 2025.


Why This Work Matters

LLM-based code translation is promising, but scientific computing raises a higher bar than ordinary code generation.

A translated scientific program must not only look plausible. It must:

  • compile,
  • preserve numerical behavior,
  • respect language-specific semantics,
  • maintain performance-sensitive structures,
  • and be understandable to domain scientists and software maintainers.

This project explored the reliability boundaries of LLMs in that setting and studied how evaluation frameworks can better capture translation quality.


Repository Status

This repository is intentionally minimal.

Component | Status
Public README summary | Available
Internal LANL code | Not released
Internal datasets / test cases | Not released
Internal figures / results | Not released
Published paper reference | Included
Reimplementation-safe methodology summary | Included

Future public additions may include:

  • toy examples using synthetic Fortran snippets,
  • a simplified compiler-feedback demo,
  • a public-safe evaluation template,
  • or links to the final published paper page.

Technologies Referenced

  • Python
  • Hugging Face Transformers
  • Ollama
  • Open-source LLMs
  • CodeBLEU
  • GCC
  • GFortran
  • C++
  • Fortran
  • Matplotlib / Plotly for internal analysis workflows

Citation

If referencing this work, please cite the NAACL 2025 paper; the placeholder BibTeX entry below can be updated once the official citation page is available.

@inproceedings{ranasinghe2025llmfortran,
  title     = {LLM-Assisted Translation of Legacy FORTRAN Code to C++: A Cross-Platform Study},
  author    = {Ranasinghe, Nishath Rajiv and Jones, Shawn M. and Kucer, Michal and Biswas, Ayan and O'Malley, Daniel and Most, Alexander and Wanna, Selma Liliane and Sreekumar, Ajay},
  booktitle = {Proceedings of the North American Chapter of the Association for Computational Linguistics},
  year      = {2025}
}

License

This repository summary is released under the MIT License.

No restricted LANL code, data, internal documentation, or non-public research artifacts are included.