Analisis Sentimen Twitter dengan TFIDF-ANN
January 7, 2019 ยท View on GitHub
Sentimen analisis twit dari Twitter untuk menentukan apakah sebuah twitter dianggap berbau POSITIF atau NEGATIF. Sentimen analisis ini menggunakan Multilayer Perceptron dengan ekstraksi fitur TF-IDF (Term Frequency and Inverse Document Frequency)
Installation
scikit-learn, nltk, numpy, keras (dan backendnya), csv
Usage
# analiser.py
# main class: Analiser(training_data)
# training_data default value = 'data/training_all_random.csv'
#
# for main class example see main_*.py file, try run the file
#
# --
# main_pre_trained.py | load existing model
# analiser load training_data as base train data, load existing model
an = Analiser(training_data='data/training_all_random.csv')
an.load_model(filename='model')
# --
# main_training.py | train new model
# analiser load training_data as base train data, train the data, save the model
an = Analiser(training_data='data/training_all_random.csv')
an.train(filename='model')
# --
# main_re_training.py | retrain existing model
# analiser load training_data as base train data, load existing model, train the data, save the model
an = Analiser(training_data='data/training_all_random.csv')
an.retrain(filename='model')
Testing
Pada data diatas, dataset yang digunakan adalah twit mengenai pilkada DKI Jakarta kemarin.
# let analiser_instance is an instance of Analiser
test = "ahok itu pemimpin yang beres memimpin"
print test
print analiser_instance.testFromTrained([analiser_instance.tfidf_data.transform(test)])
# output: POSITIF
test = "ahok itu pemimpin yang ga beres memimpin"
print test
print analiser_instance.testFromTrained([analiser_instance.tfidf_data.transform(test)])
# output: NEGATIF
Train / Re-train Network Forward/Backprop Parameters
# analiser.py
# Change parameter for training
def train(self, output_filename = 'model'):
...
learning_rate = .01
loss_error = 'binary_crossentropy'
batch_size = 1
epoch = 10
...
# analiser.py
# Change parameter for retraining
def retrain(self, output_filename):
...
learning_rate = .005
loss_error = 'binary_crossentropy'
batch_size = 1
epoch = 3
...
Contributing
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature - Commit your changes:
git commit -am 'Add some feature' - Push to the branch:
git push origin my-new-feature - Submit a pull request :D
Credits
Dataset, koleksi barang-barang bahasa Indonesia, dan beberapa bagian preproses
https://github.com/ramaprakoso/analisis-sentimen
TF-IDF inspiration