SustainableQA: A Comprehensive Question Answering Dataset for Corporate Sustainability and EU Taxonomy Reporting

March 20, 2026


🔄 Dataset Generation Framework Pipeline

Pipeline Diagram

📖 Overview

SustainableQA is a large-scale question-answering dataset designed for corporate sustainability reporting and EU Taxonomy compliance. The dataset provides comprehensive QA pairs extracted from corporate sustainability reports and annual reports, enabling the development of AI systems for sustainability compliance and ESG analysis.

🎯 Key Features

  • 195,287 total QA pairs from corporate sustainability reports
  • 88,792 factoid questions + 102,539 non-factoid questions + 3,956 table-based QA
  • Three domains: EU Taxonomy, ESG, and Sustainability
  • Multi-span complexity: 16.7% of questions require multiple text spans
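To illustrate what the multi-span figure measures, the sketch below computes the mean number of spans and the single-/multi-span shares for a list of QA pairs. The `QAPair` class and its `answer_spans` field are hypothetical names used only for this example; the released files may be organised differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QAPair:
    """Hypothetical record layout, not the dataset's documented schema."""
    question: str
    answer_spans: List[str]  # one entry per text span the answer draws on


def span_statistics(pairs: List[QAPair]) -> Dict[str, float]:
    """Return mean spans per answer and single-/multi-span shares in percent."""
    if not pairs:
        raise ValueError("need at least one QA pair")
    counts = [len(p.answer_spans) for p in pairs]
    multi = sum(1 for c in counts if c > 1)  # answers needing more than one span
    return {
        "mean_spans": sum(counts) / len(counts),
        "single_span_pct": 100.0 * (len(counts) - multi) / len(counts),
        "multi_span_pct": 100.0 * multi / len(counts),
    }


# Toy records loosely modelled on the sample questions in this README.
examples = [
    QAPair("What SDGs are mentioned in the context?",
           ["SDG 13: Climate action", "SDG 16: Peace and justice"]),
    QAPair("Which environmental objectives does activity 3.10 contribute to?",
           ["Climate change mitigation"]),
]
stats = span_statistics(examples)
print(stats)
```

With these two toy records, one answer is multi-span and one is single-span, so the multi-span share is 50%.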

📈 Dataset Statistics

| Component | Count | Details |
|---|---|---|
| Total QA Pairs | 195,287 | Factoid + Non-factoid + Tables |
| Factoid Questions | 88,792 | Short, precise answers |
| Non-factoid Questions | 102,539 | Descriptive, explanatory answers |
| Table-based QA | 3,956 | From 218 complex tables |
| Text Passages | 8,067 | Semantically coherent segments |
| Source Reports | 61 | German & Austrian companies |

Category Distribution

| Category | Passages | Factoid QA | Non-factoid QA | Total QA |
|---|---|---|---|---|
| ESG | 4,320 | 48,260 | 55,139 | 103,399 |
| EU Taxonomy | 747 | 8,260 | 8,906 | 17,166 |
| Sustainability | 3,000 | 32,272 | 38,494 | 70,766 |
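The per-category figures can be cross-checked against the headline statistics. The short sketch below sums the passage, factoid, and non-factoid columns and compares them with the totals reported under Dataset Statistics. It assumes that the table-based QA pairs are not broken down by category, which the figures suggest but the README does not state explicitly.

```python
# Per-category counts copied from the Category Distribution table:
# (passages, factoid QA, non-factoid QA)
CATEGORIES = {
    "ESG": (4_320, 48_260, 55_139),
    "EU Taxonomy": (747, 8_260, 8_906),
    "Sustainability": (3_000, 32_272, 38_494),
}
# Table-based QA pairs; assumed here to sit outside the category breakdown.
TABLE_QA = 3_956


def column_totals(categories):
    """Sum each column of the category table."""
    passages = sum(p for p, _, _ in categories.values())
    factoid = sum(f for _, f, _ in categories.values())
    non_factoid = sum(n for _, _, n in categories.values())
    return passages, factoid, non_factoid


passages, factoid, non_factoid = column_totals(CATEGORIES)

# Compare with the headline figures from the Dataset Statistics table.
assert passages == 8_067
assert factoid == 88_792
assert non_factoid == 102_539
assert factoid + non_factoid + TABLE_QA == 195_287
print("category columns agree with the headline statistics")
```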

Answer Complexity Analysis

| Category | Mean Spans | Single-Span | Multi-Span | Complexity Notes |
|---|---|---|---|---|
| Overall | 1.36 | 83.3% | 16.7% | EU Taxonomy most complex |
| ESG | 1.37 | 83.1% | 16.9% | Moderate complexity |
| EU Taxonomy | 1.45 | 78.8% | 21.2% | Highest complexity |
| Sustainability | 1.32 | 84.6% | 15.4% | Lowest complexity |
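One way to read the "Overall" row is as an average of the three categories weighted by their number of QA pairs. The sketch below checks that reading using the factoid and non-factoid counts from the Category Distribution table as weights. The weighting scheme is an assumption made for this example; the README does not say how the overall figures were derived.

```python
# QA pairs per category (factoid + non-factoid), used as weights.
WEIGHTS = {
    "ESG": 48_260 + 55_139,
    "EU Taxonomy": 8_260 + 8_906,
    "Sustainability": 32_272 + 38_494,
}
# Per-category values copied from the Answer Complexity Analysis table.
MEAN_SPANS = {"ESG": 1.37, "EU Taxonomy": 1.45, "Sustainability": 1.32}
MULTI_SPAN_PCT = {"ESG": 16.9, "EU Taxonomy": 21.2, "Sustainability": 15.4}


def weighted_mean(values, weights):
    """Average of `values`, weighting each category by its QA-pair count."""
    total = sum(weights.values())
    return sum(values[k] * weights[k] for k in weights) / total


overall_mean_spans = weighted_mean(MEAN_SPANS, WEIGHTS)
overall_multi_pct = weighted_mean(MULTI_SPAN_PCT, WEIGHTS)
print(round(overall_mean_spans, 2), round(overall_multi_pct, 1))  # prints: 1.36 16.7
```

Under this assumption the weighted averages reproduce the reported overall values of 1.36 mean spans and 16.7% multi-span answers.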

📋 Sample Questions

Factoid Questions

Q: What SDGs are mentioned in the context?
A: SDG 13: Climate action, SDG 16: Peace and justice...

Q: What is the company's total CapEx for taxonomy-eligible activities?
A: €15.2 million

Q: Which environmental objectives does activity 3.10 contribute to?
A: Climate change mitigation

Non-Factoid Questions

Q: Why does activity 3.10 fail to meet the substantial contribution criterion for the manufacture of hydrogen?
A: Because the quantified life-cycle GHG emission savings are not verified, which is necessary to fulfill the criterion for substantial contribution to climate change mitigation.

Q: How does the company assess the "Do No Significant Harm" criteria?
A: The company conducts a comprehensive evaluation across all six environmental objectives, ensuring that while contributing to one objective, the activity does not significantly harm the other five through detailed impact assessments and third-party verification.

This dataset was created for academic research at the University of Innsbruck using publicly available corporate reports. All source materials were published for stakeholder transparency and regulatory compliance. The dataset is released under the CC BY-NC 4.0 license for non-commercial research and educational use.

📄 License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). This means you can use, share, and adapt the material for non-commercial purposes with proper attribution.

📧 Contact


🌟 Star this repository if you find it helpful! 🌟

Supporting AI Development for Sustainable Finance and Corporate Transparency
