SemEval 2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

May 7, 2026

This is an extension of Myung et al., 2024. If you use our data, please cite:

  @inproceedings{semeval2026task7,
  title     = "{S}em{E}val-2026 {T}ask 7: {E}veryday {K}nowledge {A}cross {D}iverse {L}anguages and {C}ultures",
  author    = "Nedjma Ousidhoum and Junho Myung and Carla Perez-Almendros and Jiho Jin and
               Amr Keleg and Meriem Beloucif and Yi Zhou and Rodrigo Agerri and Vladimir Araujo and
               Naomi Baes and James Barry and Joanne Boisson and Nancy F. Chen and Christine de Kock and
               Aleksandra Edwards and Joseba Fernandez de Landa and Mohamed Fazli Imam and Huda Hakami and
               Shu-Kai Hsieh and Joseph Marvin Imperial and Roy Ka-Wei Lee and Chenyang Lyu and
               Younes Samih and Johan Sjons and Bryan Tan and Asahi Ushio and Weihua Zheng and Zhengyuan Liu and
               Alice Oh and Jose Camacho-Collados",
  booktitle = "Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)",
  year      = "2026",
  publisher = "Association for Computational Linguistics"
}

Preprint available at: https://arxiv.org/abs/2605.02601.

  • ⚠️ This is an evaluation-only task. Hence, BLEnD may NOT be used for fine-tuning or few-shot learning. However, any NLP system may be submitted and you are free to be as creative as you wish!
  • ❗🌐 Competition website: CodaBench
  • 💡 Discord channel: Join here to ask questions and receive updates
  • 🔧 Questions or issues: Please create an issue
  • ✉️ Email organisers: semeval-2026-blend-organisers[at]googlegroups[dot]com
  • 💬 Participants’ Google Group: Request to join at semeval-2026-task7-blend-participants[at]googlegroups[dot]com

📢 News

6 Feb 2026 ❗❗❗ The gold labels are now available here❗❗❗

Jan 2026 The test phase will start on January 19 and run until February 2 (the dev phase has therefore been extended). Register on CodaBench

Sep 2025 The competition website is now live, register on CodaBench

Aug 2025 ⚠️ The pilot data is available here.

Everyday Knowledge Across Diverse Languages and Cultures

The global deployment of large language models (LLMs) and NLP systems requires cultural awareness. Yet, these models often lack culture-specific knowledge, particularly for under-served languages and regions. Their outputs frequently reflect Western-centric perspectives or stereotypes inherited from training data. Existing benchmarks, largely based on monolingual datasets or Wikipedia, often fail to capture the realities of everyday life across cultures.

This shared task aims to evaluate the cultural awareness of LLMs and NLP systems across multiple languages. We will use an extended version of the manually constructed BLEnD benchmark (Myung et al., 2024) as validation and test sets for several languages. BLEnD is specifically designed for evaluation and will not be used for training, ensuring that results reflect a model's ability to generalize to unseen, diverse cultural and linguistic contexts.

BLEnD currently covers 13 languages and 16 cultures. For this shared task, we will expand its scope to include 17 additional language–culture pairs.

The languages and cultures included in our dataset are as follows (completed annotations from the original BLEnD are in bold):

| Area | Language (Region) |
| --- | --- |
| Africa | Arabic (Algeria), Amharic (Ethiopia), Hausa (Northern Nigeria), Arabic (Egypt), Arabic (Morocco) |
| Asia | Assamese (Assam, India), Azerbaijani (Azerbaijan), Mandarin (China), Indonesian (Indonesia), Javanese (West Java, Indonesia), Persian (Iran), Korean (North Korea), Korean (South Korea), Arabic (Saudi Arabia), Japanese (Japan), Bengali (India), Tagalog (Philippines), Tamil (Sri Lanka), Tamil (Singapore), Taiwanese Mandarin (Taiwan), Singaporean Mandarin (Singapore), Malay (Singapore) |
| Australia | English (AU) |
| Europe | Greek (Greece), Spanish (Spain), English (UK), French (France), Bulgarian (Bulgaria), Swedish (Sweden), Irish (Ireland), Basque (Basque Country) |
| North America | English (US) |
| Latin America | Spanish (Ecuador), Spanish (Mexico) |

Tracks

Table 1. Track 1 (SAQ)

| Locale | Language | Region / Country |
| --- | --- | --- |
| am-ET | Amharic | Ethiopia |
| ar-DZ | Arabic | Algeria |
| ar-EG | Arabic | Egypt |
| ar-MA | Arabic | Morocco |
| ar-SA | Arabic | Saudi Arabia |
| as-AS | Assamese | India |
| az-AZ | Azerbaijani | Azerbaijan |
| bg-BG | Bulgarian | Bulgaria |
| el-GR | Greek | Greece |
| en-AU | English | Australia |
| en-GB | English | United Kingdom |
| en-SG | English | Singapore |
| en-US | English | United States |
| es-EC | Spanish | Ecuador |
| es-ES | Spanish | Spain |
| es-MX | Spanish | Mexico |
| eu-PV | Basque | Basque Country (Spain) |
| fa-IR | Persian (Farsi) | Iran |
| fr-FR | French | France |
| ga-IE | Irish | Ireland |
| ha-NG | Hausa | Nigeria |
| id-ID | Indonesian | Indonesia |
| ja-JP | Japanese | Japan |
| ko-KP | Korean | North Korea |
| ko-KR | Korean | South Korea |
| ms-SG | Malay | Singapore |
| su-JB | Sundanese | Indonesia |
| sv-SE | Swedish | Sweden |
| ta-LK | Tamil | Sri Lanka |
| ta-SG | Tamil | Singapore |
| tl-PH | Tagalog | Philippines |
| zh-CN | Chinese | China |
| zh-SG | Chinese | Singapore |
| zh-TW | Chinese | Taiwan |
| en-AS | English | Assam |
| en-AZ | English | Azerbaijan |
| en-BG | English | Bulgaria |
| en-CN | English | China |
| en-DZ | English | Algeria |
| en-EC | English | Ecuador |
| en-EG | English | Egypt |
| en-ES | English | Spain |
| en-ET | English | Ethiopia |
| en-FR | English | France |
| en-GR | English | Greece |
| en-ID | English | Indonesia |
| en-IE | English | Ireland |
| en-IR | English | Iran |
| en-JB | English | Indonesia (West Java) |
| en-JP | English | Japan |
| en-KP | English | North Korea |
| en-KR | English | South Korea |
| en-LK | English | Sri Lanka |
| en-MA | English | Morocco |
| en-MX | English | Mexico |
| en-NG | English | Nigeria |
| en-PH | English | Philippines |
| en-PV | English | Basque Country (Spain) |
| en-SA | English | Saudi Arabia |
| en-SE | English | Sweden |
| en-TW | English | Taiwan |

Table 2. Track 2 (MCQ)

| Locale | Region / Country |
| --- | --- |
| am-ET | Ethiopia |
| ar-DZ | Algeria |
| ar-EG | Egypt |
| ar-MA | Morocco |
| ar-SA | Saudi Arabia |
| as-AS | Assam (India) |
| az-AZ | Azerbaijan |
| bg-BG | Bulgaria |
| el-GR | Greece |
| en-AU | Australia |
| en-GB | United Kingdom |
| en-US | United States |
| es-EC | Ecuador |
| es-ES | Spain |
| es-MX | Mexico |
| eu-PV | Basque Country (Spain) |
| fa-IR | Iran |
| fr-FR | France |
| ga-IE | Ireland |
| ha-NG | Nigeria |
| id-ID | Indonesia |
| ja-JP | Japan |
| ko-KP | North Korea |
| ko-KR | South Korea |
| su-JB | Indonesia |
| sv-SE | Sweden |
| ta-LK | Sri Lanka |
| tl-PH | Philippines |
| zh-CN | China |
| zh-SG | Singapore |

Track 1: Short Answer Questions (SAQ)

Participants will evaluate their models on short-answer questions (SAQs) to assess their model's ability to generate accurate responses while accounting for cultural and linguistic diversity. This track covers all languages. For each question, correctness will be determined by comparison with human-annotated reference answers from BLEnD.

Data in this track includes two variants: (1) an English variant, in which answers are in English (corresponding to locale codes in Table 1 that begin with en), and (2) a native-language version, in which answers are in the original language (corresponding to locale codes in Table 1 that begin with the original language code).
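The relationship between locale codes and variants can be illustrated with a minimal sketch. It assumes only that each code has the form `language-REGION`, as in Table 1; the helper names `split_locale` and `variants_by_region` are hypothetical and not part of any official tooling.

```python
from collections import defaultdict

# A handful of Track 1 locale codes copied from Table 1 (not the full list).
LOCALES = ["am-ET", "en-ET", "ko-KR", "en-KR", "en-GB",
           "zh-SG", "ta-SG", "ms-SG", "en-SG"]


def split_locale(code: str) -> tuple[str, str]:
    """Split a code such as 'ko-KR' into its language ('ko') and region ('KR')."""
    language, region = code.split("-", 1)
    return language, region


def variants_by_region(codes: list[str]) -> dict[str, list[str]]:
    """Map each region suffix to the language codes available for that region."""
    regions = defaultdict(list)
    for code in codes:
        language, region = split_locale(code)
        regions[region].append(language)
    return dict(regions)


grouped = variants_by_region(LOCALES)
for region, languages in grouped.items():
    print(region, languages)
```

Under this reading, a region such as ET has a native-language variant (`am`) and an English variant (`en`), while for regions whose original language is English (for example GB) the two coincide in a single `en` code.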

Track 2: Multiple-Choice Questions (MCQ)

In this track, questions are provided in English only, for the regions listed in Table 2. Each question has four answer options, with at most one option representing any given country or region. Each option reflects the cultural perspective of a different country or region, i.e., it is the answer that received the highest number of votes for that country or region. To ensure fairness, questions that human annotators marked as culturally irrelevant or unclear are excluded. The model is evaluated on its ability to identify the culturally appropriate choice for each question and region.
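The option structure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the organisers' construction procedure: the function names, the tie-breaking rule, the order in which distractor regions are visited, and the placeholder region and answer labels are all hypothetical.

```python
from collections import Counter


def top_answer(votes: list[str]) -> str:
    """Return the most-voted answer (ties go to the answer seen first)."""
    return Counter(votes).most_common(1)[0][0]


def build_options(target: str, votes_by_region: dict[str, list[str]],
                  n_options: int = 4) -> list[tuple[str, str]]:
    """Assemble (region, answer) options with at most one option per region.

    The target region's top-voted answer is the correct option; the other
    options are the top-voted answers of other regions (an assumed rule).
    """
    correct = top_answer(votes_by_region[target])
    options = [(target, correct)]
    seen = {correct}
    for region, votes in votes_by_region.items():
        if len(options) == n_options:
            break
        if region == target:
            continue
        candidate = top_answer(votes)
        if candidate not in seen:  # skip answers identical to an existing option
            options.append((region, candidate))
            seen.add(candidate)
    return options


# Placeholder votes; the labels carry no real cultural content.
votes = {
    "region_1": ["answer_a", "answer_a", "answer_b"],
    "region_2": ["answer_c", "answer_c", "answer_a"],
    "region_3": ["answer_a", "answer_a"],  # same top answer as region_1
    "region_4": ["answer_d"],
    "region_5": ["answer_e", "answer_e"],
}
options = build_options("region_1", votes)
print(options)
```

Because each region is visited once, no region can contribute more than one option, which mirrors the constraint stated in the track description.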

Evaluation

We will evaluate each submission using accuracy, based on how well the generated answers align with human annotations. The evaluation accounts for variation in responses: in the SAQ track, a model-generated answer is marked as correct if it matches any of the responses provided by human annotators for the same question. In the MCQ track, accuracy is calculated from the correctness of the selected option. More details about the evaluation protocol can be found in Myung et al. (2024).
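The scoring rule above can be sketched in a few lines. This is a simplified illustration, not the official scorer: the `normalize` step (lowercasing and whitespace collapsing) is an assumption, the function names are hypothetical, and the toy predictions and references are made up. The protocol in Myung et al. (2024) may match answers differently.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplistic normalisation)."""
    return " ".join(text.lower().split())


def saq_is_correct(prediction: str, references: list[str]) -> bool:
    """An SAQ prediction is correct if it matches any annotator response."""
    normalized = normalize(prediction)
    return any(normalized == normalize(ref) for ref in references)


def accuracy(flags: list[bool]) -> float:
    """Fraction of correct items; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


# Toy SAQ items: (model prediction, annotator responses). Values are made up.
saq_items = [
    ("Football", ["football", "soccer"]),
    ("tennis", ["football", "soccer"]),
]
saq_accuracy = accuracy([saq_is_correct(pred, refs) for pred, refs in saq_items])

# Toy MCQ items: selected option letters compared with gold option letters.
mcq_predictions = ["A", "C", "B"]
mcq_gold = ["A", "B", "B"]
mcq_accuracy = accuracy([p == g for p, g in zip(mcq_predictions, mcq_gold)])

print(saq_accuracy, mcq_accuracy)
```

Exact matching after normalisation is the strictest reasonable reading of "matches"; a real scorer would likely need language-specific handling that this sketch omits.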

Important Dates and Task Phases

| Task | Date |
| --- | --- |
| Sample data ready | 8 August 2025 |
| Evaluation start | 12 January 2026 |
| Evaluation end | 2 February 2026 |
| Paper submission due | February 2026 (to be updated once the SemEval Chairs communicate the date) |
| Notification to authors | March 2026 (to be updated once the SemEval Chairs communicate the date) |
| Camera ready due | April 2026 (to be updated once the SemEval Chairs communicate the date) |
| SemEval workshop | July 2026 (co-located with ACL 2026) |

How to Participate

  1. Register: Sign up on the CodaBench competition platform.
  2. Track: Decide on the track(s) you want to participate in (Track 1 and/or 2).
  3. Download: Access to the questions for each track will be provided in this repository.
  4. Submit: Submit your models' predictions on the CodaBench competition platform.

Detailed guidelines will be provided soon.

Competition Rules and Terms

1. Consent to Public Release of Scores
  • By submitting results, you consent to the public release of your scores on:
    • the competition website,
    • at the designated workshop,
    • in associated proceedings.
  • Task organizers have discretion over the release and choice of metrics.
  • Scores may include:
    • automatic and manual quantitative judgments,
    • qualitative judgments,
    • other metrics as deemed appropriate.
2. Score Release and Validity
  • Task organizers reserve the right to withhold scores for:
    • incomplete submissions,
    • erroneous submissions,
    • deceptive submissions,
    • rule-violating submissions.
  • Inclusion of a submission's scores does not constitute endorsement.
3. Team Participation Rules
  • Participants may be involved in only one team.
  • Exceptions may be granted with prior approval from organizers.
4. Account Management
  • Each team must create and use exactly one account on the designated platform.
5. Team Constitution
  • Team membership cannot be changed after the evaluation period begins.
6. Development Period Rules
  • Teams can submit up to 999 submissions.
  • Results are visible only to the submitting team.
  • Leaderboard is disabled.
  • Warnings and errors are visible for each submission.
7. Evaluation Period Rules
  • Teams are limited to 3 submissions.
  • Only the final submission will be considered official.
  • Warnings and errors are visible for each submission.
8. Post-Competition
  • The gold labels will be released after the competition.
  • The teams are encouraged to report results on all their system variants in their description paper.
  • The official submission results must be clearly indicated.
9. Public Release of Submissions
  • Final team submissions may be made public after the evaluation period.
10. Disclaimer about the Datasets
  • Organizers and affiliated institutions provide no warranties on dataset correctness or completeness.
  • They are not liable for dataset access or usage.
11. Peer Review Process
  • Each participant will review another team's system description paper.
12. Dataset Usage Restrictions
  • Datasets should only be used for scientific or research purposes.
  • Any other use is explicitly prohibited.
  • Datasets must not be redistributed or shared with third parties.
  • Interested parties should be directed to the official website.
13. Final ranking
  • To be included in the official task ranking, you **MUST** submit a system description paper.

Dataset Paper

The dataset paper for the initial version of BLEnD can be found here (accepted to the NeurIPS 2024 Datasets and Benchmarks Track).

FAQs

Do I have to participate in all languages for a given track?
  • No, you can participate in one or more languages.
Can I fine-tune my model on BLEnD?
  • **No**. We are using BLEnD for evaluation only but you can submit **any** NLP system!
How will you verify my submitted model?
  • To be included in the final team rankings of our shared task, it is mandatory for participants to submit a system description paper describing their approaches and methodologies in detail, therefore ensuring scientific integrity.
Can I use LLMs in the different tracks?
  • Yes.
Can I use additional datasets (e.g., publicly available ones from other sources)?
  • Yes. Please do cite them in the system description paper.
How was the data annotated and did you use LLMs to annotate it?
  • No. The data was annotated by native speakers (≥5 per instance), not LLMs. Annotators answered each cultural question based on their own cultural background, without consulting an LLM. Different answers were expected since this is a subjective task. See the task definition for more details.
Will I be included in the final ranking if I do not write a system description paper?
  • No. You **MUST** write a system description paper to be included in the final ranking.
I have never written a system description paper. How can I write one?
  • We will provide an online writing tutorial and share resources to help you write your paper.
Do I need to pay conference registration fees and/or attend SemEval for my paper to be published?
  • No. It is not required to attend SemEval or pay registration fees for your paper to be published. However, if you want to attend, you must pay the attendance fee.
Our system did not perform very well, should I still write a system description paper?
  • Yes! We want insights from all participants—even if your system did not perform well. Negative results are still valuable.

Resources

  1. SemEval 2026 Shared Tasks
  2. Frequently Asked Questions about SemEval
  3. Paper Submission Requirements
  4. How to write a task description paper?
  5. Guidelines for Writing Papers
  6. Paper style files
  7. Paper submission link (to be added)
  8. References (more to be added)

Organizers

| Name | Role | Affiliation |
| --- | --- | --- |
| Nedjma Ousidhoum | Lead | Cardiff University |
| Junho Myung | Lead | KAIST |
| Carla Perez-Almendros | Lead | Cardiff University |
| Jiho Jin | Lead | KAIST |
| Yi Zhou | Lead | Cardiff University |
| Chenyang Lyu | Organiser | Alibaba |
| Meriem Beloucif | Organiser, Language Lead (Swedish) | Uppsala University |
| Amr Keleg | Organiser, Language Lead (Arabic, EG) | University of Edinburgh |
| Rodrigo Agerri | Language Lead (Basque, ES) | University of the Basque Country |
| Vladimir Araujo | Language Lead (Spanish, EC) | Sailplane AI |
| Naomi Baes | Language Lead (English, AU) | University of Melbourne |
| James Barry | Language Lead (Irish) | IBM Research |
| Joanne Boisson | Language Lead (French, FR) | Cardiff University |
| Nancy F. Chen | Language Lead (Mandarin/Malay/Tamil, SG) | A*STAR Institute for Infocomm Research, Singapore |
| Christine de Kock | Language Lead (English, AU) | University of Melbourne |
| Aleksandra Edwards | Language Lead (Bulgarian) | Cardiff University |
| Joseba Fernandez de Landa | Language Lead (Basque, ES) | University of the Basque Country |
| Mohamed Fazli Imam | Language Lead (Tamil, LK) | MBZUAI |
| Huda Hakami | Language Lead (Arabic, SA) | Taif University, Saudi Arabia |
| Shu-Kai Hsieh | Language Lead (Mandarin, TW) | National Taiwan University |
| Joseph Marvin Imperial | Language Lead (Tagalog, PH) | University of Bath |
| Roy Ka-Wei Lee | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Rifki Afina Putri | Language Lead (Indonesian/Javanese, ID) | Universitas Gadjah Mada |
| Younes Samih | Language Lead (Arabic, MA) | IBM |
| Johan Sjons | Language Lead (Swedish) | Uppsala University |
| Bryan Tan | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Asahi Ushio | Language Lead (Japanese) | Google |
| Weihua Zheng | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Liu Zhengyuan | Language Lead (Mandarin/Malay/Tamil, SG) | A*STAR Institute for Infocomm Research, Singapore |
| Alice Oh | Advisory Organiser | KAIST |
| Jose Camacho-Collados | Advisory Organiser | Cardiff University |