Content
April 16, 2026
You can find the datasets here.
Track A: DimABSA
| No. | Language | Code | Subtask 1 DimASR | Subtask 2 DimASTE | Subtask 3 DimASQP | Dataset Release |
|---|---|---|---|---|---|---|
| 1 | English | eng | Restaurant, Laptop | Restaurant, Laptop | Restaurant, Laptop | ✅ Released |
| 2 | Japanese | jpn | Hotel, Finance | Hotel | Hotel | ✅ Released |
| 3 | Russian | rus | Restaurant | Restaurant | Restaurant | ✅ Released |
| 4 | Tatar | tat | Restaurant | Restaurant | Restaurant | ✅ Released |
| 5 | Ukrainian | ukr | Restaurant | Restaurant | Restaurant | ✅ Released |
| 6 | Chinese | zho | Restaurant, Laptop, Finance | Restaurant, Laptop | Restaurant, Laptop | ✅ Released |
Track B: DimStance
| No. | Language | Code | Subtask 1 DimASR | Dataset Release |
|---|---|---|---|---|
| 1 | English | eng | Environmental Protection | ✅ Released |
| 2 | German | deu | Politics | ✅ Released |
| 3 | Chinese | zho | Environmental Protection | ✅ Released |
| 4 | Nigerian-Pidgin | pcm | Politics | ✅ Released |
| 5 | Swahili | swa | Politics | ✅ Released |
Evaluation
The performance of the submitted systems will be evaluated based on the following metrics. You can find the evaluation script here.
Subtask 1: DimASR (RMSE)
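DimASR is a sentiment regression task evaluated using the Root Mean Square Error (RMSE) between the predicted and gold VA values:

$$\mathrm{RMSE}_{VA} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(V_p^{(i)} - V_g^{(i)}\right)^2 + \left(A_p^{(i)} - A_g^{(i)}\right)^2\right]}$$

where $N$ is the total number of instances; $V_p^{(i)}$ and $A_p^{(i)}$ denote the predicted valence and arousal values for instance $i$; and $V_g^{(i)}$ and $A_g^{(i)}$ denote the corresponding gold values.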
Notes: VA outputs must be within [1, 9], rounded to two decimals.
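The following is a minimal sketch of the metric and the output constraint, not the official evaluation script. It assumes that RMSE pools the squared valence and arousal errors and divides by the number of instances. The helper names `format_va` and `rmse_va` are hypothetical, and clamping out-of-range values is one possible way to satisfy the [1, 9] constraint rather than a required step.

```python
import math

def format_va(valence, arousal, lo=1.0, hi=9.0):
    """Clamp a (V, A) pair to [lo, hi] and round both values to two decimals."""
    def clamp(x):
        return min(max(x, lo), hi)
    return round(clamp(valence), 2), round(clamp(arousal), 2)

def rmse_va(pred, gold):
    """RMSE over paired (V, A) tuples: sqrt(sum((Vp-Vg)^2 + (Ap-Ag)^2) / N)."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and of equal length")
    total = sum((vp - vg) ** 2 + (ap - ag) ** 2
                for (vp, ap), (vg, ag) in zip(pred, gold))
    return math.sqrt(total / len(pred))

# Illustrative values only; the second predicted valence is clamped to 9.0.
pred = [format_va(7.5, 6.0), format_va(9.4, 5.0)]
gold = [(7.5, 6.0), (9.0, 8.0)]
print(round(rmse_va(pred, gold), 4))  # 2.1213
```

In this toy example the only error is an arousal difference of 3 on the second instance, so the score is the square root of 9 / 2.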
Subtask 2 & 3: DimASTE & DimASQP (continuous F1)
DimASTE and DimASQP are sentiment analysis tasks involving extraction, classification, and regression. Their outputs contain both categorical elements (e.g., A, C, O) and continuous elements (VA), which have traditionally been evaluated using separate metrics. In conventional ABSA tasks, categorical elements are assessed using precision, recall, and F1-score, where a predicted tuple is counted as a true positive (TP) only if all its categorical elements exactly match the gold annotation. This binary criterion, however, does not account for continuous-valued components, which are typically evaluated using correlation-based or difference-based metrics.

To unify the evaluation of categorical and continuous components, we propose the continuous true positive (cTP), which extends the categorical TP by incorporating a penalty based on the VA prediction error. Let $P$ be the set of predicted triplets (A, O, VA) or quadruplets (A, C, O, VA). For a prediction $t \in P$, its cTP is defined as

$$cTP^{(t)} = \begin{cases} 1 - dist\left(VA_p^{(t)}, VA_g^{(t)}\right), & t \in P_{cat} \\ 0, & \text{otherwise} \end{cases}$$

where $P_{cat} \subseteq P$ denotes the set of predictions in which all categorical elements, (A, O) for a triplet or (A, C, O) for a quadruplet, exactly match the gold annotation for the same sentence. Each categorically correct prediction is assigned an initial TP score of 1, which is then reduced based on its VA error distance. The distance function is defined as

$$dist(VA_p, VA_g) = \frac{\sqrt{(V_p - V_g)^2 + (A_p - A_g)^2}}{D_{max}}$$

where $dist(VA_p, VA_g)$ denotes the normalized Euclidean distance between the predicted $VA_p$ and gold $VA_g$ in the VA space, and $D_{max} = \sqrt{8^2 + 8^2} = \sqrt{128}$ is the maximum possible Euclidean distance in the VA space on the [1, 9] scale, ensuring that $dist \in [0, 1]$.

Building on the per-prediction $cTP^{(t)}$, continuous Recall (cRecall) is defined as the total cTP divided by the number of gold triplets/quadruplets:

$$cRecall = \frac{\sum_{t \in P_{cat}} cTP^{(t)}}{TP_{cat} + FN_{cat}}$$

where the numerator represents the total cTP, computed as the number of categorically correct predictions minus the sum of their VA error distances, $TP_{cat} = |P_{cat}|$ is the number of categorically correct predictions, and $FN_{cat}$ denotes the number of gold triplets/quadruplets with no categorical match.

Similarly, the continuous Precision (cPrecision) is defined as the total cTP divided by the number of predictions:

$$cPrecision = \frac{\sum_{t \in P_{cat}} cTP^{(t)}}{TP_{cat} + FP_{cat}}$$

where $FP_{cat}$ denotes the number of predictions with no categorical match. Figure 2 illustrates an example of calculating cTP, cRecall, and cPrecision.

Finally, the continuous F1 (cF1) is the harmonic mean of cRecall and cPrecision:

$$cF1 = \frac{2 \times cRecall \times cPrecision}{cRecall + cPrecision}$$

Fig. 2. Example of calculating cTP, cRecall, and cPrecision.
Notes:
- When the VA prediction is perfect (dist = 0), cRecall and cPrecision reduce to the standard recall and precision.
- VA outputs must be within [1, 9], rounded to two decimals. Any prediction with either V or A outside this range is considered invalid.
- Participants should remove duplicate predictions before submission. If multiple predictions in the same sentence share the same categorical tuple (A,O) for triplets or (A,C,O) for quadruplets, all of them are considered invalid.
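The definitions and notes above can be sketched in a few lines of Python. This is an illustrative sketch, not the official evaluation script. The input layout (lists of `(sentence_id, categorical_tuple, (V, A))`) and the names `va_dist` and `continuous_f1` are hypothetical. It also assumes that invalid predictions (out-of-range VA or duplicated categorical tuples) earn no cTP but still count toward the number of predictions, and that gold tuples are unique within a sentence.

```python
import math
from collections import Counter

D_MAX = math.sqrt(8 ** 2 + 8 ** 2)  # maximum VA distance on the [1, 9] scale

def va_dist(va_pred, va_gold):
    """Normalized Euclidean distance between two (V, A) pairs, in [0, 1]."""
    return math.dist(va_pred, va_gold) / D_MAX

def continuous_f1(pred, gold):
    """Return (cPrecision, cRecall, cF1) for lists of (sent_id, cat_tuple, (V, A))."""
    gold_va = {(sid, cat): va for sid, cat, va in gold}
    key_counts = Counter((sid, cat) for sid, cat, _ in pred)
    total_ctp = 0.0
    for sid, cat, va in pred:
        in_range = all(1.0 <= x <= 9.0 for x in va)
        is_unique = key_counts[(sid, cat)] == 1  # duplicates are all invalid
        if in_range and is_unique and (sid, cat) in gold_va:
            # Categorically correct: start from 1 and subtract the VA penalty.
            total_ctp += 1.0 - va_dist(va, gold_va[(sid, cat)])
    c_recall = total_ctp / len(gold) if gold else 0.0
    c_precision = total_ctp / len(pred) if pred else 0.0
    denom = c_recall + c_precision
    c_f1 = 2 * c_recall * c_precision / denom if denom else 0.0
    return c_precision, c_recall, c_f1

# Made-up triplets: one of three predictions matches one of two gold triplets.
gold = [("s1", ("food", "great"), (8.0, 7.0)),
        ("s1", ("service", "slow"), (3.0, 6.0))]
pred = [("s1", ("food", "great"), (6.0, 5.0)),
        ("s1", ("staff", "slow"), (3.0, 6.0)),
        ("s1", ("price", "high"), (4.0, 5.0))]
print([round(x, 4) for x in continuous_f1(pred, gold)])  # [0.25, 0.375, 0.3]
```

Here the matching prediction is off by 2 in both valence and arousal, so its distance is $\sqrt{8}/\sqrt{128} = 0.25$ and its cTP is 0.75. Dividing by two gold triplets and three predictions gives cRecall 0.375 and cPrecision 0.25.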
Starter Kit
We provide a starter kit to help participants get started with the competition and reproduce a simple baseline system. The baseline scripts demonstrate the required input–output format and submission procedure on Codabench, ensuring that participants clearly understand the submission pipeline before developing their own models.
You can use the provided examples as a reference and then extend or replace them with your own approaches for the final competition submissions.
- Task 1 – DimASR: Starter Kit for Task 1
- Tasks 2 & 3 – DimASTE and DimASQP: Starter Kit for Tasks 2 & 3
Important Dates and Task Phases
| Description | Deadline |
|---|---|
| Sample Data Ready | |
| Training Data Ready | 30 September 2025 |
| Evaluation Start | 20 January 2026 |
| Evaluation End | 30 January 2026 |
| System Description Paper Due | 2 March 2026 |
| Notification to Authors | 9 April 2026 |
| Camera Ready Due | 30 April 2026 |
| SemEval Workshop 2026 | July 2026 (co-located with ACL 2026) |
All deadlines are 23:59 UTC-12 ("anywhere on Earth").
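To check a deadline against your own clock, the "anywhere on Earth" convention can be converted to UTC as in the sketch below. The helper name `aoe_deadline_utc` is hypothetical; the only assumption is the fixed UTC-12 offset stated above.

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12), name="AoE")  # UTC-12, "anywhere on Earth"

def aoe_deadline_utc(year, month, day):
    """Return the UTC instant corresponding to 23:59 AoE on the given date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)

# Evaluation End, 30 January 2026
print(aoe_deadline_utc(2026, 1, 30).isoformat())  # 2026-01-31T11:59:00+00:00
```

A deadline listed for a given date therefore falls at 11:59 UTC on the following day.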
How to Participate
- Register: Sign up on the CodaBench competition platform.
- Track: Decide which track(s) you want to participate in (Track A and/or Track B).
- Download: Access to the datasets for each track will be provided in this repository.
- Develop: Build your models using the provided data.
- Submit: Submit your predictions on the CodaBench competition platform.
Please follow the guidelines shared here.
Dataset paper
We will soon release a dataset paper that describes the data collection, annotation process, and baseline experiments. This paper will provide additional details and information that will be useful for the task participants.
Competition Rules and Terms
1. Consent to Public Release of Scores
- By submitting results, you consent to the public release of your scores:
  - on the competition website,
  - at the designated workshop,
  - in associated proceedings.
- Task organizers have discretion over the release and choice of metrics.
- Scores may include:
  - automatic and manual quantitative judgments,
  - qualitative judgments,
  - other metrics as deemed appropriate.
2. Score Release and Validity
- Task organizers reserve the right to withhold scores for:
  - incomplete submissions,
  - erroneous submissions,
  - deceptive submissions,
  - rule-violating submissions.
- Inclusion of a submission's scores does not constitute endorsement.
3. Team Participation Rules
- Participants may be involved in only one team.
- Exceptions may be granted with prior approval from organizers.
4. Account Management
- Each team must create and use exactly one account on the designated platform.
5. Team Constitution
- Team membership cannot be changed after the evaluation period begins.
6. Development Period Rules
- Teams can submit up to 999 submissions.
- Results are visible only to the submitting team.
- Leaderboard is disabled.
- Warnings and errors are visible for each submission.
7. Evaluation Period Rules
- Teams are limited to 3 submissions.
- Only the last submission is considered official.
- Warnings and errors are visible for each submission.
8. Post-Competition
- Gold labels will be released after the competition.
- Teams are encouraged to report results on all system variants in their description paper.
- Official submission results must be clearly indicated.
9. Public Release of Submissions
- Final team submissions may be made public after the evaluation period.
10. Disclaimer about the Datasets
- Organizers and affiliated institutions provide no warranties on dataset correctness or completeness.
- They are not liable for dataset access or usage.
11. Peer Review Process
- Each participant will review another team's system description paper.
12. Dataset Usage Restrictions
- Datasets should only be used for scientific or research purposes.
- Any other use is explicitly prohibited.
- Datasets must not be redistributed or shared with third parties.
- Interested parties should be directed to the official website.
13. Final ranking
- To be included in the official task ranking, you **MUST** submit a system description paper.
Communication
- Join our Discord Channel to ask questions and receive updates (coming soon).
- If you have any questions or issues, please feel free to create an issue.
- Contact organizers at: dimabsa-organizers[at]googlegroups[dot]com
Full List of Aspect Categories
from SemEval-2016 Task 5
Restaurant
| Label Type | Labels |
|---|---|
| Entity Labels | RESTAURANT, FOOD, DRINKS, AMBIENCE, SERVICE, LOCATION |
| Attribute Labels | GENERAL, PRICES, QUALITY, STYLE_OPTIONS, MISCELLANEOUS |
Laptop
| Label Type | Labels |
|---|---|
| Entity Labels | LAPTOP, DISPLAY, KEYBOARD, MOUSE, MOTHERBOARD, CPU, FANS_COOLING, PORTS, MEMORY, POWER_SUPPLY, OPTICAL_DRIVES, BATTERY, GRAPHICS, HARD_DISK, MULTIMEDIA_DEVICES, HARDWARE, SOFTWARE, OS, WARRANTY, SHIPPING, SUPPORT, COMPANY |
| Attribute Labels | GENERAL, PRICE, QUALITY, DESIGN_FEATURES, OPERATION_PERFORMANCE, USABILITY, PORTABILITY, CONNECTIVITY, MISCELLANEOUS |
Hotel
| Label Type | Labels |
|---|---|
| Entity Labels | HOTEL, ROOMS, FACILITIES, ROOM_AMENITIES, SERVICE, LOCATION, FOOD_DRINKS |
| Attribute Labels | GENERAL, PRICE, COMFORT, CLEANLINESS, QUALITY, DESIGN_FEATURES, STYLE_OPTIONS, MISCELLANEOUS |
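As a quick sanity check on predicted categories, the label lists can be used to validate category strings. The sketch below covers only the Restaurant domain and assumes categories are written as `ENTITY#ATTRIBUTE`, the convention used in SemEval-2016 Task 5; the helper name `is_known_category` is hypothetical. Passing this check is necessary but not sufficient, since not every entity and attribute combination is necessarily used in the data.

```python
RESTAURANT_ENTITIES = {
    "RESTAURANT", "FOOD", "DRINKS", "AMBIENCE", "SERVICE", "LOCATION",
}
RESTAURANT_ATTRIBUTES = {
    "GENERAL", "PRICES", "QUALITY", "STYLE_OPTIONS", "MISCELLANEOUS",
}

def is_known_category(label, entities, attributes, sep="#"):
    """True if label splits into one listed entity and one listed attribute."""
    entity, found, attribute = label.partition(sep)
    return bool(found) and entity in entities and attribute in attributes

print(is_known_category("FOOD#QUALITY", RESTAURANT_ENTITIES, RESTAURANT_ATTRIBUTES))  # True
# The Restaurant domain lists PRICES, not PRICE, so this label is rejected.
print(is_known_category("FOOD#PRICE", RESTAURANT_ENTITIES, RESTAURANT_ATTRIBUTES))    # False
```

Note the spelling difference between domains: Restaurant uses `PRICES`, while Laptop and Hotel use `PRICE`.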
Resources
- Writing a system paper
- Writing tutorial: Blogpost
- Previous shared tasks on sentiment regression
  - SemEval-2025 Task 11 (Track B)
  - SIGHAN-2024 Shared Tasks (Chinese)
  - SemEval-2018 Task 1 (EI-reg, V-reg)
  - WASSA-2017 Shared Task
- Dimensional Sentiment Corpora
- Resources for Beginners
References
Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. In Proc. of EACL-17, pages 578-585.
Hongjie Cai, Rui Xia and Jie Yu. 2021. Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions. In Findings of EMNLP-21, pages 2909–2920.
Francesca M. M. Citron, Mollie Lee, and Nora Michaelis. 2020. Affective and psycholinguistic norms for German conceptual metaphors (COMETA). Behavior Research Methods, 52(3):1056-1072.
Lung-Hao Lee, Jian-Hong Li, and Liang-Chih Yu. 2022. Chinese EmoBank: Building Valence-Arousal Resources for Dimensional Sentiment Analysis. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(4):65.
Lung-Hao Lee, Liang-Chih Yu, Suge Wang and Jian Liao. 2024. Overview of the SIGHAN 2024 shared task for Chinese dimensional aspect-based sentiment analysis. In Proc. of SIGHAN-24, pages 165-174.
Saif M. Mohammad and Felipe Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. In Proc. of WASSA-17, pages 34-49.
Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018. SemEval-2018 Task 1: Affect in Tweets. In Proc. of SemEval-18, pages 1-17.
Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and Sentiment in Tweets. ACM Transactions on Internet Technology, 17(3):26.
Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, Seid Muhie Yimam, Jan Philip Wahle, Terry Ruas, Meriem Beloucif, Christine De Kock, Tadesse Destaw Belay, Ibrahim Said Ahmad, Nirmal Surange, Daniela Teodorescu, David Ifeoluwa Adelani, Alham Fikri Aji, Felermino Ali, Vladimir Araujo, Abinew Ali Ayele, Oana Ignat, Alexander Panchenko, Yi Zhou, and Saif M. Mohammad. 2025. SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. In Proc. of SemEval-25, pages 2558-2569.
Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis. In Proc. of AAAI-20, pages 8600-8607.
Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphee De Clercq, Veronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeny Kotelnikov, Nuria Bel, Salud Maria Jimenez-Zafra and Gulsen Eryigit. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In Proc. of SemEval-16, pages 19-30.
Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Suresh Manandhar and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In Proc. of SemEval-15, pages 486-495.
Maria Pontiki, Dimitrios Galanis, John Pavlopoulos, Haris Papageorgiou, Ion Androutsopoulos and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proc. of SemEval-14, pages 27-35.
Daniel Preoţiuc-Pietro, H Andrew Schwartz, Gregory Park, Johannes Eichstaedt, Margaret Kern, Lyle Ungar, and Elisabeth Shulman. 2016. Modelling Valence and Arousal in Facebook posts. In Proc. of WASSA-16, pages 9-15.
James A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161-1178.
James A. Russell. 2003. Core affect and the psychological construction of emotion. Psychological Review, 110(1):145-172.
Liang-Chih Yu, Lung-Hao Lee, Shuai Hao, Jin Wang, Yunchao He, Jun Hu, K Robert Lai, Xuejie Zhang. 2016. Building Chinese Affective Resources in Valence-Arousal Dimensions. In Proc. of NAACL-16, pages 540-545.
Wenxuan Zhang, Yang Deng, Xin Li, Yifei Yuan, Lidong Bing and Wai Lam. 2021. Aspect Sentiment Quad Prediction as Paraphrase Generation. In Proc. of EMNLP-21, pages 9209-9219.
Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing and Wai Lam. 2023. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowledge and Data Engineering, 35(11):11019-11038.
Organizers
Liang-Chih Yu, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Jonas Becker, Lung-Hao Lee, Jin Wang, Jan Philip Wahle, Terry Ruas, Alexander Panchenko, Kai-Wei Chang, Saif M. Mohammad