
April 16, 2026

You can find the datasets here.

Track A: DimABSA

No.	Language	Code	Subtask 1 DimASR	Subtask 2 DimASTE	Subtask 3 DimASQP	Dataset Release
1	English	eng	Restaurant, Laptop	Restaurant, Laptop	Restaurant, Laptop	✅ Released
2	Japanese	jpn	Hotel, Finance	Hotel	Hotel	✅ Released
3	Russian	rus	Restaurant	Restaurant	Restaurant	✅ Released
4	Tatar	tat	Restaurant	Restaurant	Restaurant	✅ Released
5	Ukrainian	ukr	Restaurant	Restaurant	Restaurant	✅ Released
6	Chinese	zho	Restaurant, Laptop, Finance	Restaurant, Laptop	Restaurant, Laptop	✅ Released

Track B: DimStance

No.	Language	Code	Subtask 1 DimASR	Dataset Release
1	English	eng	Environmental Protection	✅ Released
2	German	deu	Politics	✅ Released
3	Chinese	zho	Environmental Protection	✅ Released
4	Nigerian-Pidgin	pcm	Politics	✅ Released
5	Swahili	swa	Politics	✅ Released

Evaluation

The performance of the submitted systems will be evaluated based on the following metrics. You can find the evaluation script here.

Subtask 1: DimASR (RMSE)

DimASR is a sentiment regression task evaluated using the Root Mean Square Error (RMSE) between the predicted and gold valence-arousal (VA) values:

$$RMSE_{VA} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left[ \left(V_p^{(i)} - V_g^{(i)}\right)^2 + \left(A_p^{(i)} - A_g^{(i)}\right)^2 \right]}$$

where $N$ is the total number of instances; $V_p^{(i)}$ and $A_p^{(i)}$ denote the predicted valence and arousal values for instance $i$; and $V_g^{(i)}$ and $A_g^{(i)}$ denote the corresponding gold values.

Note: VA outputs must lie within [1, 9] and be rounded to two decimal places.
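As a rough illustration, the RMSE formula and the output constraint can be sketched as follows. This is a minimal sketch, not the official evaluation script; the function names `prepare_va` and `rmse_va`, the list-of-tuples input aligned by instance, and the choice to clip out-of-range values are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

VA = Tuple[float, float]  # (valence, arousal)


def _clip(x: float, low: float = 1.0, high: float = 9.0) -> float:
    """Force a value into [low, high]."""
    return min(high, max(low, x))


def prepare_va(v: float, a: float) -> VA:
    """Clip a raw (V, A) prediction into [1, 9] and round to two decimals.

    Clipping is only one way to respect the range constraint; it is a
    design choice of this sketch, not something the task prescribes.
    """
    return round(_clip(v), 2), round(_clip(a), 2)


def rmse_va(pred: List[VA], gold: List[VA]) -> float:
    """RMSE_VA as in the formula above; pred[i] and gold[i] refer to instance i."""
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and aligned by instance")
    squared_error = 0.0
    for (vp, ap), (vg, ag) in zip(pred, gold):
        # Sum the squared valence error and squared arousal error per instance.
        squared_error += (vp - vg) ** 2 + (ap - ag) ** 2
    return math.sqrt(squared_error / len(pred))
```

For two instances where only one valence value is off by 1.0, the squared errors sum to 1, so the result is sqrt(1/2), roughly 0.707.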

Subtask 2 & 3: DimASTE & DimASQP (continuous F1)

DimASTE and DimASQP are sentiment analysis tasks that combine extraction, classification, and regression. Their outputs contain both categorical elements (aspect A, category C, and opinion O) and continuous elements (VA), which have traditionally been evaluated with separate metrics. In conventional ABSA tasks, categorical elements are assessed with precision, recall, and F1-score, where a predicted tuple counts as a true positive (TP) only if all of its categorical elements exactly match the gold annotation. This binary criterion, however, does not account for continuous-valued components, which are typically evaluated with correlation-based or difference-based metrics.

To unify the evaluation of categorical and continuous components, we propose the continuous true positive (cTP), which extends the categorical TP with a penalty based on the VA prediction error. Let $P$ be the set of predicted triplets (A, O, VA) or quadruplets (A, C, O, VA). For a prediction $t \in P$, its cTP is defined as

$$cTP^{(t)} = \begin{cases} 1 - \text{dist}(VA_p^{(t)}, VA_g^{(t)}), & t \in P_{cat} \\ 0, & \text{otherwise} \end{cases}$$

where $P_{cat} \subseteq P$ denotes the set of predictions in which all categorical elements, (A, O) for a triplet or (A, C, O) for a quadruplet, exactly match the gold annotation for the same sentence. Each categorically correct prediction $t \in P_{cat}$ is assigned an initial TP score of 1, which is then reduced based on its VA error distance. The distance function is defined as

$$\text{dist}(VA_p, VA_g) = \frac{\sqrt{\left( V_p - V_g \right)^2 + \left( A_p - A_g \right)^2}}{D_{max}},$$

where $dist(\cdot)$ denotes the normalized Euclidean distance between the predicted $VA_p = (V_p, A_p)$ and the gold $VA_g = (V_g, A_g)$ in the VA space, and $D_{max} = \sqrt{8^2 + 8^2} = \sqrt{128}$ is the maximum possible Euclidean distance in the VA space on the [1, 9] scale, which ensures that $dist \in [0, 1]$.
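The per-prediction cTP can be sketched in a few lines. This is an illustrative sketch rather than the official scorer; `va_dist`, `ctp`, and the boolean `categorical_match` flag (standing in for membership in $P_{cat}$) are hypothetical names.

```python
import math
from typing import Tuple

VA = Tuple[float, float]  # (valence, arousal)

# D_max = sqrt(8^2 + 8^2) = sqrt(128): the largest distance on [1, 9] x [1, 9]
D_MAX = math.sqrt(8 ** 2 + 8 ** 2)


def va_dist(va_pred: VA, va_gold: VA) -> float:
    """Normalized Euclidean distance between predicted and gold VA, in [0, 1]."""
    (vp, ap), (vg, ag) = va_pred, va_gold
    return math.sqrt((vp - vg) ** 2 + (ap - ag) ** 2) / D_MAX


def ctp(va_pred: VA, va_gold: VA, categorical_match: bool) -> float:
    """cTP for one prediction: 1 - dist if it belongs to P_cat, otherwise 0."""
    if not categorical_match:
        return 0.0
    return 1.0 - va_dist(va_pred, va_gold)
```

For example, a categorically correct prediction of (6.0, 6.0) against a gold value of (7.0, 7.0) has a distance of sqrt(2) / sqrt(128) = 1/8, so its cTP is 0.875 (up to floating-point rounding).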

Building on the per-prediction $cTP^{(t)}$, continuous Recall (cRecall) is defined as the total cTP divided by the number of gold triplets/quadruplets:

$$cRecall = \frac{TP_{cat} - \sum_{t \in P_{cat}} \text{dist}(VA_p^{(t)}, VA_g^{(t)})}{TP_{cat} + FN_{cat}},$$

where the numerator represents the total cTP, computed as the number of categorically correct predictions $TP_{cat} = \lvert P_{cat} \rvert$ minus the sum of their VA error distances, and $FN_{cat}$ denotes the number of gold triplets/quadruplets with no categorical match.

Similarly, continuous Precision (cPrecision) is defined as the total cTP divided by the number of predictions:

$$cPrecision = \frac{TP_{cat} - \sum_{t \in P_{cat}} \text{dist}(VA_p^{(t)}, VA_g^{(t)})}{TP_{cat} + FP_{cat}},$$

where $FP_{cat}$ denotes the number of predictions with no categorical match. Figure 2 illustrates an example of calculating cTP, cRecall, and cPrecision.

Finally, the continuous F1 (cF1) is the harmonic mean of cRecall and cPrecision.

$$cF1 = \frac{2 \times cRecall \times cPrecision}{cRecall + cPrecision}$$

Fig. 2. Example of calculating cTP, cRecall, and cPrecision.
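Putting the pieces together, corpus-level cRecall, cPrecision, and cF1 could be computed roughly as follows. This is a sketch under stated assumptions, not the official evaluation script: the nested-dictionary input layout and the name `continuous_prf` are hypothetical, string matching is exact (the official script may normalize text differently), and each categorical tuple is assumed to occur at most once per sentence in both gold and predictions.

```python
import math
from typing import Dict, Hashable, Tuple

VA = Tuple[float, float]
# Hypothetical layout: sentence id -> {categorical tuple -> (V, A)}.
# The categorical tuple is (A, O) for DimASTE or (A, C, O) for DimASQP.
Tuples = Dict[Hashable, Dict[tuple, VA]]

D_MAX = math.sqrt(128)


def va_dist(p: VA, g: VA) -> float:
    """Normalized Euclidean distance between two VA points."""
    return math.hypot(p[0] - g[0], p[1] - g[1]) / D_MAX


def continuous_prf(pred: Tuples, gold: Tuples) -> Tuple[float, float, float]:
    """Return (cRecall, cPrecision, cF1) following the formulas above."""
    tp_cat = 0       # |P_cat|: predictions whose categorical tuple matches gold
    dist_sum = 0.0   # sum of dist(VA_p, VA_g) over P_cat
    for sid, predicted in pred.items():
        gold_in_sentence = gold.get(sid, {})
        for key, va_p in predicted.items():
            if key in gold_in_sentence:
                tp_cat += 1
                dist_sum += va_dist(va_p, gold_in_sentence[key])
    n_pred = sum(len(t) for t in pred.values())  # TP_cat + FP_cat
    n_gold = sum(len(t) for t in gold.values())  # TP_cat + FN_cat
    total_ctp = tp_cat - dist_sum
    c_recall = total_ctp / n_gold if n_gold else 0.0
    c_precision = total_ctp / n_pred if n_pred else 0.0
    if c_recall + c_precision == 0:
        return c_recall, c_precision, 0.0
    c_f1 = 2 * c_recall * c_precision / (c_recall + c_precision)
    return c_recall, c_precision, c_f1
```

With two gold triplets and two predictions in one sentence, where only one prediction matches categorically and its VA is off by (1, 1), the total cTP is 1 - 0.125 = 0.875, so cRecall, cPrecision, and cF1 all come out at about 0.4375.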

Notes:

When every categorically matched prediction has a perfect VA value (dist = 0), cRecall and cPrecision reduce to standard recall and precision.
VA outputs must be within [1, 9], rounded to two decimals. Any prediction with either V or A outside this range is considered invalid.
Participants should remove duplicate predictions before submission. If multiple predictions in the same sentence share the same categorical tuple (A,O) for triplets or (A,C,O) for quadruplets, all of them are considered invalid.
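The validity rules in these notes can be checked before submission. The sketch below is an illustration only: the flat record layout `(sentence_id, categorical_tuple, (V, A))` and the helpers `find_invalid` and `dedupe_keep_first` are hypothetical, and keeping the first of several duplicates is an arbitrary choice of this sketch (the notes only require that duplicates be removed).

```python
from collections import Counter
from typing import Hashable, List, Tuple

# Hypothetical flat record: (sentence_id, categorical_tuple, (V, A))
Record = Tuple[Hashable, tuple, Tuple[float, float]]


def in_range(va: Tuple[float, float]) -> bool:
    """True if both V and A lie within [1, 9]."""
    return all(1.0 <= x <= 9.0 for x in va)


def find_invalid(records: List[Record]) -> List[int]:
    """Indices of predictions that the notes above describe as invalid."""
    counts = Counter((sid, key) for sid, key, _ in records)
    return [
        i
        for i, (sid, key, va) in enumerate(records)
        # every copy of a duplicated tuple is invalid, not only the extra ones
        if not in_range(va) or counts[(sid, key)] > 1
    ]


def dedupe_keep_first(records: List[Record]) -> List[Record]:
    """Pre-submission cleanup: keep only the first prediction per (sentence, tuple)."""
    seen = set()
    kept = []
    for sid, key, va in records:
        if (sid, key) not in seen:
            seen.add((sid, key))
            kept.append((sid, key, va))
    return kept
```

Running `dedupe_keep_first` before submission avoids losing every copy of a repeated tuple; out-of-range values still have to be fixed separately.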

Starter Kit

We provide a starter kit to help participants get started with the competition and reproduce a simple baseline system. The baseline scripts demonstrate the required input–output format and submission procedure on Codabench, ensuring that participants clearly understand the submission pipeline before developing their own models.

You can use the provided examples as a reference and then extend or replace them with your own approaches for the final competition submissions.

Task 1 – DimASR:
Starter Kit for Task 1
Tasks 2 & 3 – DimASTE and DimASQP:
Starter Kit for Tasks 2 & 3

Important Dates and Task Phases

Description	Deadline
Sample Data Ready	~~15 July 2025~~
Training Data Ready	30 September 2025
Evaluation Start	20 January 2026
Evaluation End	30 January 2026
System Description Paper Due	2 March 2026
Notification to Authors	9 April 2026
Camera Ready Due	30 April 2026
SemEval Workshop 2026	July 2026 (co-located with ACL 2026)

All deadlines are 23:59 UTC-12 ("anywhere on Earth").
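Because "anywhere on Earth" is UTC-12, a 23:59 AoE deadline falls at 11:59 UTC on the following day. A small sketch using Python's standard `datetime` module (the helper name `aoe_deadline_in_utc` is hypothetical) makes the conversion explicit:

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12))  # UTC-12, "anywhere on Earth"


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Express a 23:59 AoE deadline on the given date in UTC."""
    return datetime(year, month, day, 23, 59, tzinfo=AOE).astimezone(timezone.utc)


# Evaluation End (30 January 2026, 23:59 AoE) expressed in UTC
print(aoe_deadline_in_utc(2026, 1, 30).isoformat())  # 2026-01-31T11:59:00+00:00
```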

How to Participate

Register: Sign up on the CodaBench competition platform.
Track: Decide which track(s) you want to participate in (Track A and/or Track B).
Download: Access to the datasets for each track will be provided in this repository.
Develop: Build your models using the provided data.
Submit: Submit your predictions on the CodaBench competition platform.

Please follow the guidelines shared here.

Dataset paper

We will soon release a dataset paper that describes the data collection, annotation process, and baseline experiments. This paper will provide additional details and information that will be useful for the task participants.

Competition Rules and Terms

1. Consent to Public Release of Scores

By submitting results, you consent to the public release of your scores:
- on the competition website,
- at the designated workshop,
- in the associated proceedings.
Task organizers have discretion over the release and choice of metrics.
Scores may include:
- automatic and manual quantitative judgments,
- qualitative judgments,
- other metrics as deemed appropriate.

2. Score Release and Validity

Task organizers reserve the right to withhold scores for:
- incomplete submissions,
- erroneous submissions,
- deceptive submissions,
- rule-violating submissions.
Inclusion of a submission's scores does not constitute endorsement.

3. Team Participation Rules

Participants may be involved in only one team.
Exceptions may be granted with prior approval from organizers.

4. Account Management

Each team must create and use exactly one account on the designated platform.

5. Team Constitution

Team membership cannot be changed after the evaluation period begins.

6. Development Period Rules

Teams can make up to 999 submissions.
Results are visible only to the submitting team.
The leaderboard is disabled.
Warnings and errors are visible for each submission.

7. Evaluation Period Rules

Teams are limited to 3 submissions.
Only the last submission is considered official.
Warnings and errors are visible for each submission.

8. Post-Competition

Gold labels will be released after the competition.
Teams are encouraged to report results on all system variants in their description paper.
Official submission results must be clearly indicated.

9. Public Release of Submissions

Final team submissions may be made public after the evaluation period.

10. Disclaimer about the Datasets

Organizers and affiliated institutions provide no warranties on dataset correctness or completeness.
They are not liable for dataset access or usage.

11. Peer Review Process

Each participant will review another team's system description paper.

12. Dataset Usage Restrictions

Datasets should only be used for scientific or research purposes.
Any other use is explicitly prohibited.
Datasets must not be redistributed or shared with third parties.
Interested parties should be directed to the official website.

13. Final ranking

To be included in the official task ranking, you **MUST** submit a system description paper.

Communication

Join our Discord Channel to ask questions and receive updates (coming soon).
If you have any questions or issues, please feel free to create an issue.
Contact organizers at: dimabsa-organizers[at]googlegroups[dot]com


Full List of Aspect Categories

from SemEval-2016 Task 5

Restaurant

Entity Labels
RESTAURANT, FOOD, DRINKS, AMBIENCE, SERVICE, LOCATION
Attribute Labels
GENERAL, PRICES, QUALITY, STYLE_OPTIONS, MISCELLANEOUS

Laptop

Entity Labels
LAPTOP, DISPLAY, KEYBOARD, MOUSE, MOTHERBOARD, CPU, FANS_COOLING, PORTS, MEMORY, POWER_SUPPLY, OPTICAL_DRIVES, BATTERY, GRAPHICS, HARD_DISK, MULTIMEDIA_DEVICES, HARDWARE, SOFTWARE, OS, WARRANTY, SHIPPING, SUPPORT, COMPANY
Attribute Labels
GENERAL, PRICE, QUALITY, DESIGN_FEATURES, OPERATION_PERFORMANCE, USABILITY, PORTABILITY, CONNECTIVITY, MISCELLANEOUS

Hotel

Entity Labels
HOTEL, ROOMS, FACILITIES, ROOM_AMENITIES, SERVICE, LOCATION, FOOD_DRINKS
Attribute Labels
GENERAL, PRICE, COMFORT, CLEANLINESS, QUALITY, DESIGN_FEATURES, STYLE_OPTIONS, MISCELLANEOUS
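In SemEval-2016 Task 5, an aspect category is written as an `ENTITY#ATTRIBUTE` pair (for example `FOOD#QUALITY`). Assuming DimABSA categories follow the same convention, the sketch below checks a label against the inventories listed above. The `LABELS` table and `is_valid_category` are hypothetical helpers; the check only confirms that each half belongs to the domain's lists, not that the particular pairing actually occurs in the data, since the lists above do not say which combinations are used.

```python
from typing import Dict, List, Tuple

# Entity and attribute inventories copied from the lists above.
LABELS: Dict[str, Tuple[List[str], List[str]]] = {
    "restaurant": (
        ["RESTAURANT", "FOOD", "DRINKS", "AMBIENCE", "SERVICE", "LOCATION"],
        ["GENERAL", "PRICES", "QUALITY", "STYLE_OPTIONS", "MISCELLANEOUS"],
    ),
    "laptop": (
        ["LAPTOP", "DISPLAY", "KEYBOARD", "MOUSE", "MOTHERBOARD", "CPU",
         "FANS_COOLING", "PORTS", "MEMORY", "POWER_SUPPLY", "OPTICAL_DRIVES",
         "BATTERY", "GRAPHICS", "HARD_DISK", "MULTIMEDIA_DEVICES", "HARDWARE",
         "SOFTWARE", "OS", "WARRANTY", "SHIPPING", "SUPPORT", "COMPANY"],
        ["GENERAL", "PRICE", "QUALITY", "DESIGN_FEATURES", "OPERATION_PERFORMANCE",
         "USABILITY", "PORTABILITY", "CONNECTIVITY", "MISCELLANEOUS"],
    ),
    "hotel": (
        ["HOTEL", "ROOMS", "FACILITIES", "ROOM_AMENITIES", "SERVICE",
         "LOCATION", "FOOD_DRINKS"],
        ["GENERAL", "PRICE", "COMFORT", "CLEANLINESS", "QUALITY",
         "DESIGN_FEATURES", "STYLE_OPTIONS", "MISCELLANEOUS"],
    ),
}


def is_valid_category(domain: str, category: str) -> bool:
    """Check an assumed ENTITY#ATTRIBUTE label against the domain's inventories."""
    entities, attributes = LABELS[domain]
    entity, sep, attribute = category.partition("#")
    return sep == "#" and entity in entities and attribute in attributes
```

Note that the restaurant domain uses `PRICES` while the laptop and hotel domains use `PRICE`, so a label such as `FOOD#PRICE` is rejected for restaurants.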

Resources

Writing System Paper
Writing tutorial: Blogpost
SemEval 2025 Shared Tasks
Frequently Asked Questions about SemEval
Paper Submission Requirements
Guidelines for Writing Papers
Paper style files
Previous shared tasks on sentiment regression

Dimensional Sentiment Corpora

Resources for Beginners

References

Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. In Proc. of EACL-17, pages 578-585.

Hongjie Cai, Rui Xia and Jie Yu. 2021. Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions. In Findings of EMNLP-21, pages 2909–2920.

Francesca M. M. Citron, Mollie Lee, and Nora Michaelis. 2020. Affective and psycholinguistic norms for German conceptual metaphors (COMETA). Behavior Research Methods, 52(3):1056-1072.

Lung-Hao Lee, Jian-Hong Li, and Liang-Chih Yu. 2022. Chinese EmoBank: Building Valence-Arousal Resources for Dimensional Sentiment Analysis. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(4):65.

Lung-Hao Lee, Liang-Chih Yu, Suge Wang and Jian Liao. 2024. Overview of the SIGHAN 2024 shared task for Chinese dimensional aspect-based sentiment analysis. In Proc. of SIGHAN-24, pages 165-174.

Saif M. Mohammad and Felipe Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. In Proc. of WASSA-17, pages 34-49.

Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proc. of SemEval-18, pages 1-17.

Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and Sentiment in Tweets. ACM Transactions on Internet Technology, 17(3):26.

Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Seid Muhie Yimam, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine De Kock, Tadesse Destaw Belay, Ibrahim Said Ahmad, Nirmal Surange, Daniela Teodorescu, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino Ali, Vladimir Araujo, Abinew Ali Ayele, Oana Ignat, Alexander Panchenko, Yi Zhou, and Saif M. Mohammad. 2025. SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. In Proc. of SemEval-25, pages 2558-2569.

Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis. In Proc. of AAAI-20, pages 8600-8607.

Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud AL-Ayyoub, Yanyan Zhao, Bing Qin, Orphee De Clercq, Veronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeny Kotelnikov, Nuria Bel, Salud Maria Jimenez-Zafra and Gulsen Eryigit. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In Proc. of SemEval-16, pages 19-30.

Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Suresh Manandhar and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In Proc. of SemEval-15, pages 486-495.

Maria Pontiki, Dimitrios Galanis, John Pavlopoulos, Haris Papageorgiou, Ion Androutsopoulos and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proc. of SemEval-14, pages 27-35.

Daniel Preoţiuc-Pietro, H. Andrew Schwartz, Gregory Park, Johannes Eichstaedt, Margaret Kern, Lyle Ungar, and Elisabeth Shulman. 2016. In Proc. of WASSA-16, pages 9-15.

James A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161-1178.

James A. Russell. 2003. Core affect and the psychological construction of emotion. Psychological Review, 110(1):145-172.

Liang-Chih Yu, Lung-Hao Lee, Shuai Hao, Jin Wang, Yunchao He, Jun Hu, K Robert Lai, Xuejie Zhang. 2016. Building Chinese Affective Resources in Valence-Arousal Dimensions. In Proc. of NAACL-16, pages 540-545.

Wenxuan Zhang, Yang Deng, Xin Li, Yifei Yuan, Lidong Bing and Wai Lam. 2021. Aspect Sentiment Quad Prediction as Paraphrase Generation. In Proc. of EMNLP-21, pages 9209–9219.

Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing and Wai Lam. 2023. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Transactions on Knowledge and Data Engineering, 35(11):11019-11038.

Organizers

Liang-Chih Yu, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Jonas Becker, Lung-Hao Lee, Jin Wang, Jan Philip Wahle, Terry Ruas, Alexander Panchenko, Kai-Wei Chang, Saif M. Mohammad