SemEval 2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

May 7, 2026

This is an extension of Myung et al., 2024. If you use our data, please cite:

@inproceedings{semeval2026task7,
  title     = "{S}em{E}val-2026 {T}ask 7: {E}veryday {K}nowledge {A}cross {D}iverse {L}anguages and {C}ultures",
  author    = "Nedjma Ousidhoum and Junho Myung and Carla Perez-Almendros and Jiho Jin and
               Amr Keleg and Meriem Beloucif and Yi Zhou and Rodrigo Agerri and Vladimir Araujo and
               Naomi Baes and James Barry and Joanne Boisson and Nancy F. Chen and Christine de Kock and
               Aleksandra Edwards and Joseba Fernandez de Landa and Mohamed Fazli Imam and Huda Hakami and
               Shu-Kai Hsieh and Joseph Marvin Imperial and Roy Ka-Wei Lee and Chenyang Lyu and
               Younes Samih and Johan Sjons and Bryan Tan and Asahi Ushio and Weihua Zheng and Zhengyuan Liu and
               Alice Oh and Jose Camacho-Collados",
  booktitle = "Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)",
  year      = "2026",
  publisher = "Association for Computational Linguistics"
}

Preprint available at: https://arxiv.org/abs/2605.02601.

⚠️ This is an evaluation-only task. Hence, BLEnD may NOT be used for fine-tuning or few-shot learning. However, any NLP system may be submitted and you are free to be as creative as you wish!
❗🌐 Competition website: CodaBench
💡 Discord channel: Join here to ask questions and receive updates
🔧 Questions or issues: Please create an issue
✉️ Email organisers: semeval-2026-blend-organisers[at]googlegroups[dot]com
💬 Participants’ Google Group: Request to join at semeval-2026-task7-blend-participants[at]googlegroups[dot]com

Content

📢 News
Everyday Knowledge Across Diverse Languages and Cultures
Tracks
- Track 1: Short Answer Questions (SAQ)
- Track 2: Multiple-Choice Questions (MCQ)
Evaluation
Important Dates and Task Phases
How to Participate
Competition Rules and Terms
Dataset paper
FAQs
Resources
Organizers

📢 News

6 Feb 2026 ❗❗❗ The gold labels are now available here❗❗❗

Jan 2026 The test phase will start on January 19 and run until February 2 (the dev phase has therefore been extended). Register on CodaBench

Sep 2025 The competition website is now live. Register on CodaBench

Aug 2025 ⚠️ The pilot data is available here.

Everyday Knowledge Across Diverse Languages and Cultures

The global deployment of large language models (LLMs) and NLP systems requires cultural awareness. Yet, these models often lack culture-specific knowledge, particularly for under-served languages and regions. Their outputs frequently reflect Western-centric perspectives or stereotypes inherited from training data. Existing benchmarks, largely based on monolingual datasets or Wikipedia, often fail to capture the realities of everyday life across cultures.

This shared task aims to evaluate the cultural awareness of LLMs and NLP systems across multiple languages. We will use an extended version of the manually constructed BLEnD benchmark (Myung et al., 2024) as validation and test sets for several languages. BLEnD is specifically designed for evaluation and will not be used for training, ensuring that results reflect a model's ability to generalize to unseen, diverse cultural and linguistic contexts.

BLEnD currently covers 13 languages and 16 cultures. For this shared task, we will expand its scope to include 17 additional language–culture pairs.

The languages and cultures included in our dataset are as follows (completed annotations from the original BLEnD are in bold):

Area	Language (Region)
Africa	Arabic (Algeria), Amharic (Ethiopia), Hausa (Northern Nigeria), Arabic (Egypt), Arabic (Morocco)
Asia	Assamese (Assam, India), Azerbaijani (Azerbaijan), Mandarin (China), Indonesian (Indonesia), Javanese (West Java, Indonesia), Persian (Iran), Korean (North Korea), Korean (South Korea), Arabic (Saudi Arabia), Japanese (Japan), Bengali (India), Tagalog (Philippines), Tamil (Sri Lanka), Tamil (Singapore), Taiwanese Mandarin (Taiwan), Singaporean Mandarin (Singapore), Malay (Singapore)
Australia	English (AU)
Europe	Greek (Greece), Spanish (Spain), English (UK), French (France), Bulgarian (Bulgaria), Swedish (Sweden), Irish (Ireland), Basque (Basque Country)
North America	English (US)
Latin America	Spanish (Ecuador), Spanish (Mexico)

Tracks

Table 1. Track 1 (SAQ)

Locale	Language	Region / Country
am-ET	Amharic	Ethiopia
ar-DZ	Arabic	Algeria
ar-EG	Arabic	Egypt
ar-MA	Arabic	Morocco
ar-SA	Arabic	Saudi Arabia
as-AS	Assamese	Assam (India)
az-AZ	Azerbaijani	Azerbaijan
bg-BG	Bulgarian	Bulgaria
el-GR	Greek	Greece
en-AU	English	Australia
en-GB	English	United Kingdom
en-SG	English	Singapore
en-US	English	United States
es-EC	Spanish	Ecuador
es-ES	Spanish	Spain
es-MX	Spanish	Mexico
eu-PV	Basque	Basque Country (Spain)
fa-IR	Persian (Farsi)	Iran
fr-FR	French	France
ga-IE	Irish	Ireland
ha-NG	Hausa	Nigeria
id-ID	Indonesian	Indonesia
ja-JP	Japanese	Japan
ko-KP	Korean	North Korea
ko-KR	Korean	South Korea
ms-SG	Malay	Singapore
su-JB	Sundanese	Indonesia
sv-SE	Swedish	Sweden
ta-LK	Tamil	Sri Lanka
ta-SG	Tamil	Singapore
tl-PH	Tagalog	Philippines
zh-CN	Chinese	China
zh-SG	Chinese	Singapore
zh-TW	Chinese	Taiwan
en-AS	English	Assam
en-AZ	English	Azerbaijan
en-BG	English	Bulgaria
en-CN	English	China
en-DZ	English	Algeria
en-EC	English	Ecuador
en-EG	English	Egypt
en-ES	English	Spain
en-ET	English	Ethiopia
en-FR	English	France
en-GR	English	Greece
en-ID	English	Indonesia
en-IE	English	Ireland
en-IR	English	Iran
en-JB	English	Indonesia (West Java)
en-JP	English	Japan
en-KP	English	North Korea
en-KR	English	South Korea
en-LK	English	Sri Lanka
en-MA	English	Morocco
en-MX	English	Mexico
en-NG	English	Nigeria
en-PH	English	Philippines
en-PV	English	Basque Country (Spain)
en-SA	English	Saudi Arabia
en-SE	English	Sweden
en-TW	English	Taiwan

Table 2. Track 2 (MCQ)

Locale	Region / Country
am-ET	Ethiopia
ar-DZ	Algeria
ar-EG	Egypt
ar-MA	Morocco
ar-SA	Saudi Arabia
as-AS	Assam (India)
az-AZ	Azerbaijan
bg-BG	Bulgaria
el-GR	Greece
en-AU	Australia
en-GB	United Kingdom
en-US	United States
es-EC	Ecuador
es-ES	Spain
es-MX	Mexico
eu-PV	Basque Country (Spain)
fa-IR	Iran
fr-FR	France
ga-IE	Ireland
ha-NG	Nigeria
id-ID	Indonesia
ja-JP	Japan
ko-KP	North Korea
ko-KR	South Korea
su-JB	Indonesia
sv-SE	Sweden
ta-LK	Sri Lanka
tl-PH	Philippines
zh-CN	China
zh-SG	Singapore

Track 1: Short Answer Questions (SAQ)

Participants will evaluate their models on short-answer questions (SAQs) to assess the models' ability to generate accurate responses while accounting for cultural and linguistic diversity. This track covers all languages. For each question, correctness will be determined by comparison with human-annotated reference answers from BLEnD.

Data in this track comes in two variants: (1) an English variant, in which answers are in English (locale codes in Table 1 that begin with en), and (2) a native-language variant, in which answers are in the original language (locale codes in Table 1 that begin with the original language code).
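The pairing between the two variants can be read off the locale codes. The snippet below is a minimal sketch, assuming every code has the form language-region as in Table 1; the helper names split_locale and group_by_region are hypothetical and not part of any official tooling.

```python
from collections import defaultdict


def split_locale(code: str) -> tuple[str, str]:
    """Split a locale code such as 'ar-DZ' into (language, region)."""
    language, region = code.split("-", 1)
    return language, region


def group_by_region(codes: list[str]) -> dict[str, list[str]]:
    """Map each region code to the sorted locale codes that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for code in codes:
        _, region = split_locale(code)
        groups[region].append(code)
    return {region: sorted(members) for region, members in groups.items()}


# A few locale codes copied from Table 1.
sample = ["ar-DZ", "en-DZ", "ko-KR", "en-KR", "en-US", "zh-SG", "en-SG"]
groups = group_by_region(sample)
print(groups["DZ"])  # ['ar-DZ', 'en-DZ']
```

Regions whose original language is English, such as en-US, have a single locale code in Table 1, so their group contains one entry.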

Track 2: Multiple-Choice Questions (MCQ)

In this track, questions are provided in English only and cover the regions listed in Table 2. Each question has four answer options, and each option represents the cultural perspective of a different country or region, i.e., the answer that received the most annotator votes for that country or region. No country or region is represented by more than one option. To ensure fairness, questions that human annotators marked as culturally irrelevant or unclear are excluded. Models are evaluated on their ability to identify the culturally appropriate option for each question and region.
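As an illustration of how a region's option can be derived from annotator votes, here is a minimal sketch. The vote data is a hypothetical placeholder, the tie-breaking rule (first answer seen wins) is an assumption, and only two regions are shown for brevity; the actual construction procedure is described in the dataset paper.

```python
from collections import Counter


def top_answer(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical annotator answers for one question, keyed by locale code.
votes = {
    "en-GB": ["answer_x", "answer_x", "answer_y", "answer_x", "answer_z"],
    "ko-KR": ["answer_y", "answer_z", "answer_y", "answer_y", "answer_x"],
}

# One option per region, so no region is represented twice.
options = {locale: top_answer(answers) for locale, answers in votes.items()}
print(options)  # {'en-GB': 'answer_x', 'ko-KR': 'answer_y'}
```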

Evaluation

We will evaluate each submission using accuracy, measured by how well the generated answers align with human annotations. The evaluation accounts for variation in responses, which makes the assessment more robust. In the SAQ track, a model-generated answer is marked as correct if it matches any of the responses provided by human annotators for the same question. In the MCQ track, accuracy is the proportion of questions for which the selected option is correct. More details about the evaluation protocol can be found in Myung et al. (2024).
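The scoring idea can be sketched as follows. This is not the official scorer: the normalization step (lowercasing and collapsing whitespace) is an assumption, the function names are hypothetical, and the toy answers are invented for illustration. The official matching procedure may differ; see Myung et al. (2024).

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def saq_correct(prediction: str, references: list[str]) -> bool:
    """An SAQ prediction is correct if it matches any reference answer."""
    return normalize(prediction) in {normalize(r) for r in references}


def accuracy(flags: list[bool]) -> float:
    """Fraction of correct items; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical toy data, not taken from BLEnD.
saq_flags = [
    saq_correct("Football", ["football", "soccer"]),
    saq_correct("tennis", ["football", "soccer"]),
]
# MCQ: compare the selected option label with the gold label.
mcq_flags = [pred == gold for pred, gold in zip(["A", "C", "B"], ["A", "B", "B"])]

print(accuracy(saq_flags))  # 0.5
print(accuracy(mcq_flags))
```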

Important Dates and Task Phases

Task	Date
Sample data ready	8 August 2025
Evaluation start	19 January 2026 (moved from 12 January 2026; see News)
Evaluation end	2 February 2026
Paper submission due	February 2026 (to be updated once the SemEval chairs communicate the date)
Notification to authors	March 2026 (to be updated once the SemEval chairs communicate the date)
Camera ready due	April 2026 (to be updated once the SemEval chairs communicate the date)
SemEval workshop	July 2026 (co-located with ACL 2026)

How to Participate

Register: Sign up on the CodaBench competition platform.
Track: Decide on the track(s) you want to participate in (Track 1 and/or 2).
Download: Access to the questions for each track will be provided in this repository.
Submit: Submit your models' predictions on the CodaBench competition platform.

Detailed guidelines will be provided soon.

Competition Rules and Terms

1. Consent to Public Release of Scores

By submitting results, you consent to the public release of your scores:
- on the competition website,
- at the designated workshop,
- in the associated proceedings.
Task organizers have discretion over the release and choice of metrics.
Scores may include:
- automatic and manual quantitative judgments,
- qualitative judgments,
- other metrics as deemed appropriate.

2. Score Release and Validity

Task organizers reserve the right to withhold scores for:
- incomplete submissions,
- erroneous submissions,
- deceptive submissions,
- rule-violating submissions.
Inclusion of a submission's scores does not constitute endorsement.

3. Team Participation Rules

Participants may be involved in only one team.
Exceptions may be granted with prior approval from organizers.

4. Account Management

Each team must create and use exactly one account on the designated platform.

5. Team Constitution

Team membership cannot be changed after the evaluation period begins.

6. Development Period Rules

Teams can submit up to 999 submissions.
Results are visible only to the submitting team.
Leaderboard is disabled.
Warnings and errors are visible for each submission.

7. Evaluation Period Rules

Teams are limited to 3 submissions.
Only the final submission will be considered official.
Warnings and errors are visible for each submission.

8. Post-Competition

The gold labels will be released after the competition.
The teams are encouraged to report results on all their system variants in their description paper.
The official submission results must be clearly indicated.

9. Public Release of Submissions

Final team submissions may be made public after the evaluation period.

10. Disclaimer about the Datasets

Organizers and affiliated institutions provide no warranties on dataset correctness or completeness.
They are not liable for dataset access or usage.

11. Peer Review Process

Each participant will review another team's system description paper.

12. Dataset Usage Restrictions

Datasets should only be used for scientific or research purposes.
Any other use is explicitly prohibited.
Datasets must not be redistributed or shared with third parties.
Interested parties should be directed to the official website.

13. Final ranking

To be included in the official task ranking, you **MUST** submit a system description paper.

Dataset Paper

The dataset paper for the initial version of BLEnD can be found here (accepted to the NeurIPS 2024 Datasets and Benchmarks Track).

FAQs

Do I have to participate in all languages for a given track?

No, you can participate in one or more languages.

Can I fine-tune my model on BLEnD?

**No**. We are using BLEnD for evaluation only but you can submit **any** NLP system!

How will you verify my submitted model?

To be included in the final team rankings of our shared task, it is mandatory for participants to submit a system description paper describing their approaches and methodologies in detail, therefore ensuring scientific integrity.

Can I use LLMs in the different tracks?

Yes.

Can I use additional datasets (e.g., publicly available ones from other sources)?

Yes. Please do cite them in the system description paper.

How was the data annotated and did you use LLMs to annotate it?

No. The data was annotated by native speakers (≥5 per instance), not LLMs. Annotators answered each cultural question based on their own cultural background, without consulting an LLM. Different answers were expected since this is a subjective task. See the task definition for more details.

Will I be included in the final ranking if I do not write a system description paper?

No. You **MUST** write a system description paper to be included in the final ranking.

I have never written a system description paper. How can I write one?

We will provide an online writing tutorial and share resources to help you write your paper.

Do I need to pay conference registration fees and/or attend SemEval for my paper to be published?

No. It is not required to attend SemEval or pay registration fees for your paper to be published. However, if you want to attend, you must pay the attendance fee.

Our system did not perform very well, should I still write a system description paper?

Yes! We want insights from all participants—even if your system did not perform well. Negative results are still valuable.

Resources

SemEval 2026 Shared Tasks
Frequently Asked Questions about SemEval
Paper Submission Requirements
How to write a task description paper?
Guidelines for Writing Papers
Paper style files
Paper submission link (to be added)
References (more to be added)

Junho Myung et al. 2024. BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages. Advances in Neural Information Processing Systems 37, pages 78104–78146.
Siddhesh Pawar et al. 2025. Survey of Cultural Awareness in Language Models: Text and Beyond. Computational Linguistics, pages 1–96.
Md. Arid Hasan, Maram Hasanain, Fatema Ahmad, Sahinur Rahman Laskar, Sunaya Upadhyay, Vrunda N Sukhadia, Mucahid Kutlu, Shammur Absar Chowdhury, and Firoj Alam. 2025. NativQA: Multilingual Culturally-Aligned Natural Query for LLMs. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14886–14909, Vienna, Austria. Association for Computational Linguistics.
Firoj Alam et al. 2025. NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge. arXiv preprint arXiv:2504.05995.

Organizers

Name	Role	Affiliation
Nedjma Ousidhoum	Lead	Cardiff University
Junho Myung	Lead	KAIST
Carla Perez-Almendros	Lead	Cardiff University
Jiho Jin	Lead	KAIST
Yi Zhou	Lead	Cardiff University
Chenyang Lyu	Organiser	Alibaba
Meriem Beloucif	Organiser, Language Lead (Swedish)	Uppsala University
Amr Keleg	Organiser, Language Lead (Arabic, EG)	University of Edinburgh
Rodrigo Agerri	Language Lead (Basque, ES)	University of the Basque Country
Vladimir Araujo	Language Lead (Spanish, EC)	Sailplane AI
Naomi Baes	Language Lead (English, AU)	University of Melbourne
James Barry	Language Lead (Irish)	IBM Research
Joanne Boisson	Language Lead (French, FR)	Cardiff University
Nancy F. Chen	Language Lead (Mandarin/Malay/Tamil, SG)	A*STAR Institute for Infocomm Research, Singapore
Christine de Kock	Language Lead (English, AU)	University of Melbourne
Aleksandra Edwards	Language Lead (Bulgarian)	Cardiff University
Joseba Fernandez de Landa	Language Lead (Basque, ES)	University of the Basque Country
Mohamed Fazli Imam	Language Lead (Tamil, LK)	MBZUAI
Huda Hakami	Language Lead (Arabic, SA)	Taif University, Saudi Arabia
Shu-Kai Hsieh	Language Lead (Mandarin, TW)	National Taiwan University
Joseph Marvin Imperial	Language Lead (Tagalog, PH)	University of Bath
Roy Ka-Wei Lee	Language Lead (Mandarin/Malay/Tamil, SG)	Singapore University of Technology and Design
Rifki Afina Putri	Language Lead (Indonesian/Javanese, ID)	Universitas Gadjah Mada
Younes Samih	Language Lead (Arabic, MA)	IBM
Johan Sjons	Language Lead (Swedish)	Uppsala University
Bryan Tan	Language Lead (Mandarin/Malay/Tamil, SG)	Singapore University of Technology and Design
Asahi Ushio	Language Lead (Japanese)	Google
Weihua Zheng	Language Lead (Mandarin/Malay/Tamil, SG)	Singapore University of Technology and Design
Zhengyuan Liu	Language Lead (Mandarin/Malay/Tamil, SG)	A*STAR Institute for Infocomm Research, Singapore
Alice Oh	Advisory Organiser	KAIST
Jose Camacho-Collados	Advisory Organiser	Cardiff University