EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts

December 30, 2022

This repository contains the implementation for our paper "EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts". The paper was accepted at AACL-IJCNLP 2022. You can find the paper here.

Abstract

For the low-resource Bangla language, work on detecting emotions in textual data suffers from limited dataset size and poor cross-domain adaptability. In our paper, we propose a manually annotated dataset of 22,698 Bangla public comments from social media sites covering 12 different domains such as Personal, Politics, and Health, labeled for 6 fine-grained emotion categories of the Junto Emotion Wheel. We invest effort in data preparation to 1) preserve the linguistic richness and 2) challenge any classification model. Our experiments to develop a benchmark classification system show that random baselines perform better than neural networks and pre-trained language models, while hand-crafted features provide superior performance.

Authors

  • Khondoker Ittehadul Islam 1
  • Tanvir Hossain Yuvraz 1
  • Md Saiful Islam 1,2
  • Enamul Hassan 1

1 Shahjalal University of Science and Technology, Bangladesh

2 University of Alberta, Canada

EmoNoBa Dataset is available here

List of files

  • Train.csv
  • Val.csv
  • Test.csv

Files Format

| Column Title | Description |
| --- | --- |
| Data | Social media comment |
| Love | 0 or 1: 1 for Love, 0 for Not Love |
| Joy | 0 or 1: 1 for Joy, 0 for Not Joy |
| Surprise | 0 or 1: 1 for Surprise, 0 for Not Surprise |
| Anger | 0 or 1: 1 for Anger, 0 for Not Anger |
| Sadness | 0 or 1: 1 for Sadness, 0 for Not Sadness |
| Fear | 0 or 1: 1 for Fear, 0 for Not Fear |
| Topic | Topic of the comment |
| Domain | Source of the comment, one of Youtube, Facebook, or Twitter |
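
As a minimal sketch of how a file in this format might be read, the snippet below parses rows with Python's standard csv module and collects the six emotion columns into a binary label list. It assumes the CSV header uses exactly the column titles listed above; the two sample rows are made-up placeholders and are not taken from the dataset.

```python
import csv
import io

# Emotion columns as listed in the Files Format section.
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]


def read_rows(handle):
    """Yield (text, labels, topic, domain) for each CSV row.

    labels is a list of six 0/1 integers ordered as in EMOTIONS.
    """
    for row in csv.DictReader(handle):
        labels = [int(row[name]) for name in EMOTIONS]
        yield row["Data"], labels, row["Topic"], row["Domain"]


# Placeholder rows for illustration only; they are NOT from EmoNoBa.
SAMPLE = io.StringIO(
    "Data,Love,Joy,Surprise,Anger,Sadness,Fear,Topic,Domain\n"
    "placeholder comment one,1,1,0,0,0,0,Personal,Youtube\n"
    "placeholder comment two,0,0,0,1,1,0,Politics,Facebook\n"
)

rows = list(read_rows(SAMPLE))
for text, labels, topic, domain in rows:
    # Names of the emotions whose column is set to 1 for this row.
    active = [name for name, flag in zip(EMOTIONS, labels) if flag]
    print(text, active, topic, domain)
```

To read a real split, the same function could be given a file opened with open("Train.csv", newline="", encoding="utf-8"). Because each emotion has its own 0/1 column, the reader keeps all six values and does not collapse them into a single class.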

Installation

Requires the following packages:

  • Python 3.10.7 or higher
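
The version requirement above can be checked before installing anything. The following is only a sketch; the helper name meets_minimum is hypothetical and is not part of this repository.

```python
import sys

MIN_VERSION = (3, 10, 7)  # minimum version stated in the requirements


def meets_minimum(version=None):
    """Return True if (major, minor, micro) is at least MIN_VERSION.

    With no argument, the running interpreter's version is checked.
    """
    if version is None:
        version = sys.version_info[:3]
    return tuple(version[:3]) >= MIN_VERSION


if not meets_minimum():
    required = ".".join(str(part) for part in MIN_VERSION)
    print(f"Python {required} or higher is required; found {sys.version.split()[0]}")
```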

We recommend using a virtual environment tool such as virtualenv. Follow the steps below to set up the project:

  • Clone this repository: git clone https://github.com/KhondokerIslam/EmoNoBa.git
  • Install the required packages: pip install -r requirements.txt
  • Run the setup.sh file to download additional data and set up pre-processing

Usage

  1. Download the EmoNoBa dataset from here
  2. Unzip the folder
  3. Ensure the folder name is "EmoNoBa Dataset"
  4. Go to the data_processing folder and run python preprocess.py to obtain the preprocessed data.
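
Before running preprocess.py, it may help to confirm that the unzipped folder has the expected name and contents. The sketch below is an illustration only: the helper missing_dataset_files is hypothetical, and because this README does not say where the "EmoNoBa Dataset" folder must be placed, the parent directory is left as a parameter.

```python
from pathlib import Path

# File names from the "List of files" section.
EXPECTED_FILES = ("Train.csv", "Val.csv", "Test.csv")


def missing_dataset_files(parent_dir, folder_name="EmoNoBa Dataset"):
    """Return the expected file names not found in parent_dir/folder_name."""
    folder = Path(parent_dir) / folder_name
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the dataset folder sits in the current working directory.
    missing = missing_dataset_files(".")
    if missing:
        print("Missing from 'EmoNoBa Dataset':", ", ".join(missing))
    else:
        print("All dataset files found.")
```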

Feature-Based Experiments

  • Go to the Models folder
  • Run python feature_based.py
  • When prompted in the console, type the model name
  • Model names (see the paper for details of the experiments):
    • W1
    • W2
    • W3
    • W4
    • W1+W2
    • W1+W2+W3
    • W1+W2+W3+W4
    • C2
    • C3
    • C4
    • C5
    • C1+C2+C3
    • C1+C2+C3+C4
    • C1+C2+C3+C4+C5
    • W1+C1+C2+C3+C4+C5
    • W1+W2+W3+C1+C2+C3
    • W1+W2+W3+W4+C1+C2+C3
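
The model names above are not explained in this README. As a rough illustration, the sketch below assumes that Wn and Cn denote word-level and character-level n-grams of order n and that "+" combines feature sets. That reading is an assumption based on the naming pattern; the paper is the authority on it. The helpers parse_model_name, ngrams, and extract_features are hypothetical and do not describe how feature_based.py is implemented.

```python
import re

# Assumed meaning of the prefixes; not confirmed by this README.
LEVELS = {"W": "word", "C": "char"}


def parse_model_name(name):
    """Split e.g. 'W1+W2+C3' into [('word', 1), ('word', 2), ('char', 3)]."""
    parts = []
    for token in name.split("+"):
        match = re.fullmatch(r"([WC])([1-9])", token.strip())
        if match is None:
            raise ValueError(f"unrecognised feature name: {token!r}")
        parts.append((LEVELS[match.group(1)], int(match.group(2))))
    return parts


def ngrams(text, level, n):
    """Return the word or character n-grams of order n found in text."""
    units = text.split() if level == "word" else list(text)
    joiner = " " if level == "word" else ""
    return [joiner.join(units[i:i + n]) for i in range(len(units) - n + 1)]


def extract_features(text, model_name):
    """Concatenate the n-grams for every component of model_name."""
    features = []
    for level, n in parse_model_name(model_name):
        features.extend(ngrams(text, level, n))
    return features


print(extract_features("a bc d", "W1+W2"))
```

In practice such n-grams would usually be turned into count or TF-IDF vectors before being passed to a classifier; that step is omitted here.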

Neural Network Experiments

Random Initialization

  • Go to the Models folder
  • Use "python neural_network_(random).py" to run an experiment.

FastText

  • Go to the Models folder
  • Use "python neural_network_(embedding).py" to run an experiment.

Bangla-BERT

  • Go to the Models folder
  • Use "python bangla-bert.py" to run an experiment.

BibTeX

@inproceedings{islam2022emonoba,
  title={EmoNoBa: A Dataset for Analyzing Fine-Grained Emotions on Noisy Bangla Texts},
  author={Islam, Khondoker Ittehadul and Yuvraz, Tanvir and Islam, Md Saiful and Hassan, Enamul},
  booktitle={Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing},
  pages={128--134},
  year={2022}
}