SemEval 2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
May 7, 2026
This is an extension of Myung et al., 2024. If you use our data, please cite:
@inproceedings{semeval2026task7,
title = "{S}em{E}val-2026 {T}ask 7: {E}veryday {K}nowledge {A}cross {D}iverse {L}anguages and {C}ultures",
author = "Nedjma Ousidhoum and Junho Myung and Carla Perez-Almendros and Jiho Jin and
Amr Keleg and Meriem Beloucif and Yi Zhou and Rodrigo Agerri and Vladimir Araujo and
Naomi Baes and James Barry and Joanne Boisson and Nancy F. Chen and Christine de Kock and
Aleksandra Edwards and Joseba Fernandez de Landa and Mohamed Fazli Imam and Huda Hakami and
Shu-Kai Hsieh and Joseph Marvin Imperial and Roy Ka-Wei Lee and Chenyang Lyu and
Younes Samih and Johan Sjons and Bryan Tan and Asahi Ushio and Weihua Zheng and Zhengyuan Liu and
Alice Oh and Jose Camacho-Collados",
booktitle = "Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)",
year = "2026",
publisher = "Association for Computational Linguistics"
}
Preprint available at: https://arxiv.org/abs/2605.02601.
- ⚠️ This is an evaluation-only task. Hence, BLEnD may NOT be used for fine-tuning or few-shot learning. However, any NLP system may be submitted and you are free to be as creative as you wish!
- ❗🌐 Competition website: CodaBench
- 💡 Discord channel: Join here to ask questions and receive updates
- 🔧 Questions or issues: Please create an issue
- ✉️ Email organisers: semeval-2026-blend-organisers[at]googlegroups[dot]com
- 💬 Participants’ Google Group: Request to join at semeval-2026-task7-blend-participants[at]googlegroups[dot]com
Content
- 📢 News
- Everyday Knowledge Across Diverse Languages and Cultures
- Tracks
- Evaluation
- Important Dates and Task Phases
- How to Participate
- Competition Rules and Terms
- Dataset paper
- FAQs
- Resources
- Organizers
📢 News
6 Feb 2026 ❗❗❗ The gold labels are now available here❗❗❗
Jan 2026 The test phase will start on January 19 and run until February 2 (the dev phase has therefore been extended). Register on CodaBench
Sep 2025 The competition website is now live. Register on CodaBench
Aug 2025 ⚠️ The pilot data is available here.
Everyday Knowledge Across Diverse Languages and Cultures
The global deployment of large language models (LLMs) and NLP systems requires cultural awareness. Yet, these models often lack culture-specific knowledge, particularly for under-served languages and regions. Their outputs frequently reflect Western-centric perspectives or stereotypes inherited from training data. Existing benchmarks, largely based on monolingual datasets or Wikipedia, often fail to capture the realities of everyday life across cultures.
This shared task aims to evaluate the cultural awareness of LLMs and NLP systems across multiple languages. We will use an extended version of the manually constructed BLEnD benchmark (Myung et al., 2024) as validation and test sets for several languages. BLEnD is specifically designed for evaluation and will not be used for training, ensuring that results reflect a model's ability to generalize to unseen, diverse cultural and linguistic contexts.
BLEnD currently covers 13 languages and 16 cultures. For this shared task, we will expand its scope to include 17 additional language–culture pairs.
The languages and cultures included in our dataset are as follows (completed annotations from the original BLEnD are in bold):
| Area | Language (Region) |
|---|---|
| Africa | Arabic (Algeria), Amharic (Ethiopia), Hausa (Northern Nigeria), Arabic (Egypt), Arabic (Morocco) |
| Asia | Assamese (Assam, India), Azerbaijani (Azerbaijan), Mandarin (China), Indonesian (Indonesia), Javanese (West Java, Indonesia), Persian (Iran), Korean (North Korea), Korean (South Korea), Arabic (Saudi Arabia), Japanese (Japan), Bengali (India), Tagalog (Philippines), Tamil (Sri Lanka), Tamil (Singapore), Taiwanese Mandarin (Taiwan), Singaporean Mandarin (Singapore), Malay (Singapore) |
| Australia | English (AU) |
| Europe | Greek (Greece), Spanish (Spain), English (UK), French (France), Bulgarian (Bulgaria), Swedish (Sweden), Irish (Ireland), Basque (Basque Country) |
| North America | English (US) |
| Latin America | Spanish (Ecuador), Spanish (Mexico) |
Tracks
Table 1. Track 1 (SAQ)
| Locale | Language | Region / Country |
|---|---|---|
| am-ET | Amharic | Ethiopia |
| ar-DZ | Arabic | Algeria |
| ar-EG | Arabic | Egypt |
| ar-MA | Arabic | Morocco |
| ar-SA | Arabic | Saudi Arabia |
| as-AS | Assamese | India |
| az-AZ | Azerbaijani | Azerbaijan |
| bg-BG | Bulgarian | Bulgaria |
| el-GR | Greek | Greece |
| en-AU | English | Australia |
| en-GB | English | United Kingdom |
| en-SG | English | Singapore |
| en-US | English | United States |
| es-EC | Spanish | Ecuador |
| es-ES | Spanish | Spain |
| es-MX | Spanish | Mexico |
| eu-PV | Basque | Basque Country (Spain) |
| fa-IR | Persian (Farsi) | Iran |
| fr-FR | French | France |
| ga-IE | Irish | Ireland |
| ha-NG | Hausa | Nigeria |
| id-ID | Indonesian | Indonesia |
| ja-JP | Japanese | Japan |
| ko-KP | Korean | North Korea |
| ko-KR | Korean | South Korea |
| ms-SG | Malay | Singapore |
| su-JB | Sundanese | Indonesia |
| sv-SE | Swedish | Sweden |
| ta-LK | Tamil | Sri Lanka |
| ta-SG | Tamil | Singapore |
| tl-PH | Tagalog | Philippines |
| zh-CN | Chinese | China |
| zh-SG | Chinese | Singapore |
| zh-TW | Chinese | Taiwan |
| en-AS | English | Assam |
| en-AZ | English | Azerbaijan |
| en-BG | English | Bulgaria |
| en-CN | English | China |
| en-DZ | English | Algeria |
| en-EC | English | Ecuador |
| en-EG | English | Egypt |
| en-ES | English | Spain |
| en-ET | English | Ethiopia |
| en-FR | English | France |
| en-GR | English | Greece |
| en-ID | English | Indonesia |
| en-IE | English | Ireland |
| en-IR | English | Iran |
| en-JB | English | Indonesia (West Java) |
| en-JP | English | Japan |
| en-KP | English | North Korea |
| en-KR | English | South Korea |
| en-LK | English | Sri Lanka |
| en-MA | English | Morocco |
| en-MX | English | Mexico |
| en-NG | English | Nigeria |
| en-PH | English | Philippines |
| en-PV | English | Basque Country (Spain) |
| en-SA | English | Saudi Arabia |
| en-SE | English | Sweden |
| en-TW | English | Taiwan |
Table 2. Track 2 (MCQ)
| Locale | Region / Country |
|---|---|
| am-ET | Ethiopia |
| ar-DZ | Algeria |
| ar-EG | Egypt |
| ar-MA | Morocco |
| ar-SA | Saudi Arabia |
| as-AS | Assam (India) |
| az-AZ | Azerbaijan |
| bg-BG | Bulgaria |
| el-GR | Greece |
| en-AU | Australia |
| en-GB | United Kingdom |
| en-US | United States |
| es-EC | Ecuador |
| es-ES | Spain |
| es-MX | Mexico |
| eu-PV | Basque Country (Spain) |
| fa-IR | Iran |
| fr-FR | France |
| ga-IE | Ireland |
| ha-NG | Nigeria |
| id-ID | Indonesia |
| ja-JP | Japan |
| ko-KP | North Korea |
| ko-KR | South Korea |
| su-JB | Indonesia |
| sv-SE | Sweden |
| ta-LK | Sri Lanka |
| tl-PH | Philippines |
| zh-CN | China |
| zh-SG | Singapore |
Track 1: Short Answer Questions (SAQ)
Participants will evaluate their models on short-answer questions (SAQs) to assess their model's ability to generate accurate responses while accounting for cultural and linguistic diversity. This track covers all languages. For each question, correctness will be determined by comparison with human-annotated reference answers from BLEnD.
Data in this track includes two variants: (1) an English variant, in which answers are in English (corresponding to locale codes in Table 1 that begin with en), and (2) a native-language version, in which answers are in the original language (corresponding to locale codes in Table 1 that begin with the original language code).
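The locale convention described above can be sketched in a few lines of Python. This is only an illustration, assuming that every code in Table 1 has the form `language-REGION`; the helper names `split_locale` and `is_english_variant` are hypothetical and not part of any official task tooling. Note that Table 1 also lists locales such as en-AU, en-GB, en-SG, and en-US, where English is itself the original language, so the prefix check alone does not separate the two variants for those regions.

```python
def split_locale(code: str) -> tuple[str, str]:
    """Split a locale code such as 'ar-DZ' into (language, region)."""
    language, region = code.split("-", 1)
    return language, region


def is_english_variant(code: str) -> bool:
    """Return True when the locale code begins with 'en' (see Table 1)."""
    return split_locale(code)[0] == "en"


# A few locale codes taken from Table 1.
codes = ["ar-DZ", "en-DZ", "ko-KR", "en-KR", "en-GB"]

# Group the available language variants by region code.
by_region: dict[str, list[str]] = {}
for code in codes:
    language, region = split_locale(code)
    by_region.setdefault(region, []).append(language)
```

In this sketch, a region such as DZ ends up with both a native-language entry (`ar`) and an English entry (`en`), while GB has only `en`.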
Track 2: Multiple-Choice Questions (MCQ)
In this track, questions are provided in English only, for the regions listed in Table 2. Each question includes four answer options, with no more than one option representing any given country or region. Each option reflects the cultural perspective of a different country or region, i.e., the answer that received the highest number of votes for that country or region. To ensure fairness, questions that human annotators marked as culturally irrelevant or unclear are excluded. The model is evaluated on its ability to identify the culturally appropriate choice for each question and region.
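The structural constraints on an MCQ item (four options, each tied to a distinct country or region) can be checked with a short Python sketch. The `MCQOption` class and its `text` and `region` fields are hypothetical names chosen for illustration; the actual data files may use a different schema, and the option texts below are made-up placeholders rather than real dataset entries.

```python
from dataclasses import dataclass


@dataclass
class MCQOption:
    text: str    # answer text shown to the model
    region: str  # hypothetical field: region whose top-voted answer this is


def is_well_formed(options: list[MCQOption]) -> bool:
    """Check for exactly four options with no region represented twice."""
    regions = [option.region for option in options]
    return len(options) == 4 and len(set(regions)) == len(regions)


# Placeholder options for illustration only.
valid_item = [
    MCQOption("option 1", "Greece"),
    MCQOption("option 2", "Japan"),
    MCQOption("option 3", "Mexico"),
    MCQOption("option 4", "Ethiopia"),
]
# Same item, but the last option repeats a region already present.
invalid_item = valid_item[:3] + [MCQOption("option 4", "Greece")]
```

Here `is_well_formed(valid_item)` holds, while `is_well_formed(invalid_item)` fails because Greece appears twice.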
Evaluation
We will evaluate each submission using accuracy based on the alignment of the generated answer with human annotations. Notably, our evaluation accounts for variations in responses, which ensures a more robust assessment. Specifically, in the SAQ track, a model-generated answer is marked as correct if it matches any of the responses provided by human annotators for the same question, and in the MCQ track, accuracy is calculated based on the correctness of the selected answer. More details about the evaluation protocol can be found in (Myung et al., 2024).
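The accuracy computation described above can be illustrated with a minimal Python sketch. It is not the official scorer: the `normalize` step (case folding and whitespace collapsing) is an assumption made here for illustration, the function names are hypothetical, and the toy predictions and references are invented. The exact matching protocol is described in Myung et al. (2024).

```python
def normalize(text: str) -> str:
    """Assumed normalization: case-fold and collapse whitespace."""
    return " ".join(text.casefold().split())


def saq_correct(prediction: str, references: list[str]) -> bool:
    """An SAQ answer counts as correct if it matches any human reference."""
    return normalize(prediction) in {normalize(ref) for ref in references}


def accuracy(flags: list[bool]) -> float:
    """Fraction of items marked correct (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


# Toy SAQ data, invented for illustration.
saq_items = [
    {"prediction": "Football", "references": ["football", "soccer"]},
    {"prediction": "tennis", "references": ["cricket"]},
]
saq_flags = [saq_correct(i["prediction"], i["references"]) for i in saq_items]
saq_accuracy = accuracy(saq_flags)

# Toy MCQ data: selected option letters versus gold letters.
mcq_predictions = ["A", "C", "B", "D"]
mcq_gold = ["A", "B", "B", "D"]
mcq_accuracy = accuracy([p == g for p, g in zip(mcq_predictions, mcq_gold)])
```

With this toy data, one of the two SAQ answers matches a reference and three of the four MCQ selections match the gold option.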
Important Dates and Task Phases
| Task | Date |
|---|---|
| Sample data ready | 8 August 2025 |
| Evaluation start | 12 January 2026 |
| Evaluation end | 2 February 2026 |
| Paper submission due | February 2026 (to be updated once the SemEval chairs announce the date) |
| Notification to authors | March 2026 (to be updated once the SemEval chairs announce the date) |
| Camera ready due | April 2026 (to be updated once the SemEval chairs announce the date) |
| SemEval workshop | July 2026 (co-located with ACL 2026) |
How to Participate
- Register: Sign up on the CodaBench competition platform.
- Track: Decide on the track(s) you want to participate in (Track 1 and/or 2).
- Download: Access to the questions for each track will be provided in this repository.
- Submit: Submit your models' predictions on the CodaBench competition platform.
Detailed guidelines will be provided soon.
Competition Rules and Terms
1. Consent to Public Release of Scores
- By submitting results, you consent to the public release of your scores on:
- the competition website,
- at the designated workshop,
- in associated proceedings.
- Task organizers have discretion over the release and choice of metrics.
- Scores may include:
- automatic and manual quantitative judgments,
- qualitative judgments,
- other metrics as deemed appropriate.
2. Score Release and Validity
- Task organizers reserve the right to withhold scores for:
- incomplete submissions,
- erroneous submissions,
- deceptive submissions,
- rule-violating submissions.
- Inclusion of a submission's scores does not constitute endorsement.
3. Team Participation Rules
- Participants may be involved in only one team.
- Exceptions may be granted with prior approval from organizers.
4. Account Management
- Each team must create and use exactly one account on the designated platform.
5. Team Constitution
- Team membership cannot be changed after the evaluation period begins.
6. Development Period Rules
- Teams can submit up to 999 submissions.
- Results are visible only to the submitting team.
- Leaderboard is disabled.
- Warnings and errors are visible for each submission.
7. Evaluation Period Rules
- The teams are limited to 3 submissions.
- Only the final submission will be considered official.
- Warnings and errors are visible for each submission.
8. Post-Competition
- The gold labels will be released after the competition.
- The teams are encouraged to report results on all their system variants in their description paper.
- The official submission results must be clearly indicated.
9. Public Release of Submissions
- Final team submissions may be made public after the evaluation period.
10. Disclaimer about the Datasets
- Organizers and affiliated institutions provide no warranties on dataset correctness or completeness.
- They are not liable for dataset access or usage.
11. Peer Review Process
- Each participant will review another team's system description paper.
12. Dataset Usage Restrictions
- Datasets should only be used for scientific or research purposes.
- Any other use is explicitly prohibited.
- Datasets must not be redistributed or shared with third parties.
- Interested parties should be directed to the official website.
13. Final ranking
- To be included in the official task ranking, you **MUST** submit a system description paper.
Dataset Paper
The dataset paper for the initial version of BLEnD can be found here (accepted to the NeurIPS 2024 Datasets and Benchmarks Track).
FAQs
Do I have to participate in all languages for a given track?
- No, you can participate in one or more languages.
Can I fine-tune my model on BLEnD?
- **No**. We are using BLEnD for evaluation only but you can submit **any** NLP system!
How will you verify my submitted model?
- To be included in the final team rankings of our shared task, participants must submit a system description paper that describes their approaches and methodologies in detail, which helps ensure scientific integrity.
Can I use LLMs in the different tracks?
- Yes.
Can I use additional datasets (e.g., publicly available ones from other sources)?
- Yes. Please do cite them in the system description paper.
How was the data annotated and did you use LLMs to annotate it?
- No. The data was annotated by native speakers (≥5 per instance), not LLMs. Annotators answered each cultural question based on their own cultural background, without asking an LLM. Different answers were expected since this is a subjective task. See the task definition for more details.
Will I be included in the final ranking if I do not write a system description paper?
- No. You **MUST** write a system description paper to be included in the final ranking.
I have never written a system description paper. How can I write one?
- We will provide an online writing tutorial and share resources to help you write your paper.
Do I need to pay conference registration fees and/or attend SemEval for my paper to be published?
- No. It is not required to attend SemEval or pay registration fees for your paper to be published. However, if you want to attend, you must pay the attendance fee.
Our system did not perform very well, should I still write a system description paper?
- Yes! We want insights from all participants, even if your system did not perform well. Negative results are still valuable.
Resources
- SemEval 2026 Shared Tasks
- Frequently Asked Questions about SemEval
- Paper Submission Requirements
- How to write a task description paper?
- Guidelines for Writing Papers
- Paper style files
- Paper submission link (to be added)
- References (more to be added)
- Myung, Junho, et al. "BLEnD: A benchmark for LLMs on everyday knowledge in diverse cultures and languages." Advances in Neural Information Processing Systems 37 (2024): 78104-78146.
- Pawar, Siddhesh, et al. "Survey of cultural awareness in language models: Text and beyond." Computational Linguistics (2025): 1-96.
- Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, and Firoj Alam. 2025. NativQA: Multilingual Culturally-Aligned Natural Query for LLMs. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14886–14909, Vienna, Austria. Association for Computational Linguistics.
- Alam, Firoj, et al. "NativQA Framework: Enabling LLMs with native, local, and everyday knowledge." arXiv preprint arXiv:2504.05995 (2025).
Organizers
| Name | Role | Affiliation |
|---|---|---|
| Nedjma Ousidhoum | Lead | Cardiff University |
| Junho Myung | Lead | KAIST |
| Carla Perez-Almendros | Lead | Cardiff University |
| Jiho Jin | Lead | KAIST |
| Yi Zhou | Lead | Cardiff University |
| Chenyang Lyu | Organiser | Alibaba |
| Meriem Beloucif | Organiser, Language Lead (Swedish) | Uppsala University |
| Amr Keleg | Organiser, Language Lead (Arabic, EG) | University of Edinburgh |
| Rodrigo Agerri | Language Lead (Basque, ES) | University of the Basque Country |
| Vladimir Araujo | Language Lead (Spanish, EC) | Sailplane AI |
| Naomi Baes | Language Lead (English, AU) | University of Melbourne |
| James Barry | Language Lead (Irish) | IBM Research |
| Joanne Boisson | Language Lead (French, FR) | Cardiff University |
| Nancy F. Chen | Language Lead (Mandarin/Malay/Tamil, SG) | A*STAR Institute for Infocomm Research, Singapore |
| Christine de Kock | Language Lead (English, AU) | University of Melbourne |
| Aleksandra Edwards | Language Lead (Bulgarian) | Cardiff University |
| Joseba Fernandez de Landa | Language Lead (Basque, ES) | University of the Basque Country |
| Mohamed Fazli Imam | Language Lead (Tamil, LK) | MBZUAI |
| Huda Hakami | Language Lead (Arabic, SA) | Taif University, Saudi Arabia |
| Shu-Kai Hsieh | Language Lead (Mandarin, TW) | National Taiwan University |
| Joseph Marvin Imperial | Language Lead (Tagalog, PH) | University of Bath |
| Roy Ka-Wei Lee | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Rifki Afina Putri | Language Lead (Indonesian/Javanese, ID) | Universitas Gadjah Mada |
| Younes Samih | Language Lead (Arabic, MA) | IBM |
| Johan Sjons | Language Lead (Swedish) | Uppsala University |
| Bryan Tan | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Asahi Ushio | Language Lead (Japanese) | |
| Weihua Zheng | Language Lead (Mandarin/Malay/Tamil, SG) | Singapore University of Technology and Design |
| Liu Zhengyuan | Language Lead (Mandarin/Malay/Tamil, SG) | A*STAR Institute for Infocomm Research, Singapore |
| Alice Oh | Advisory Organiser | KAIST |
| Jose Camacho-Collados | Advisory Organiser | Cardiff University |