final_project

July 3, 2019

Aspect and opinion term extraction for hotel reviews from AiryRooms in Bahasa Indonesia

Corpus description

The corpus is located in the folder data/reviews. It consists of 5,000 reviews (78,604 tokens), split into train.txt (4,000 reviews) and test.txt (1,000 reviews). The label distribution, in tokens per label, is shown below.

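| Label       | train.txt | test.txt |
|-------------|-----------|----------|
| B-ASPECT    | 7005      | 1758     |
| I-ASPECT    | 2292      | 584      |
| B-SENTIMENT | 9646      | 2384     |
| I-SENTIMENT | 4265      | 1067     |
| OTHER       | 39897     | 9706     |
| Total       | 63105     | 15499    |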
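
The label counts above could be recomputed with a short script like the following. This is only a sketch. The file layout is not described here, so it assumes a CoNLL-style format: one token and its label per line, separated by whitespace, with a blank line between reviews. The function name `count_labels` and the sample sentences are hypothetical and not part of the repository.

```python
from collections import Counter
from pathlib import Path


def count_labels(lines):
    """Count labels and reviews in CoNLL-style lines (assumed format)."""
    counts = Counter()
    n_reviews = 0
    in_review = False
    for line in lines:
        parts = line.split()
        if not parts:
            # A blank line closes the current review.
            in_review = False
            continue
        if not in_review:
            n_reviews += 1
            in_review = True
        # Assumption: the label is the last whitespace-separated field.
        counts[parts[-1]] += 1
    return counts, n_reviews


# Made-up sample in the assumed format (two short reviews).
sample = [
    "kamar B-ASPECT",
    "bersih B-SENTIMENT",
    "",
    "pelayanan B-ASPECT",
    "kurang B-SENTIMENT",
    "ramah I-SENTIMENT",
]
sample_counts, sample_reviews = count_labels(sample)
print(dict(sample_counts), sample_reviews)

# Run on the real training file only if it is present.
train_path = Path("data/reviews/train.txt")
if train_path.exists():
    with train_path.open(encoding="utf-8") as f:
        train_counts, train_reviews = count_labels(f)
    print(dict(train_counts), train_reviews)
```

If the files use a different separator or column order, only the line that picks `parts[-1]` should need to change.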

reviews.txt contains the raw reviews, and reviews_preprocessed.txt contains the preprocessed reviews used to train the word embeddings.