Summary

September 9, 2025

Turkish Tourism is a domain-specific treebank consisting of 19,750 manually annotated sentences and 92,200 tokens. The sentences were taken from original customer reviews collected by a tourism company.

Introduction

Turkish Tourism is the first domain-specific treebank of Turkish. It consists of 19,750 manually annotated sentences and 92,200 tokens. The corpus is made up of hotel and restaurant reviews from a booking company. The data is split in half between the training and test files.
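
The sentence and token counts above can be checked against the released files with a short script. The following is a minimal sketch, assuming the data is distributed in the standard CoNLL-U format (comment lines starting with `#`, one tab-separated token per line, a blank line after each sentence). The function name `count_conllu` is hypothetical, and the file path is supplied by the user rather than taken from this README. The sketch counts syntactic words (lines whose ID is a plain integer), so its totals may differ from the quoted figures if multiword tokens were counted differently.

```python
import sys
from typing import Iterable, Tuple


def count_conllu(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (sentence_count, word_count) for CoNLL-U formatted lines.

    Assumption: standard CoNLL-U layout. Multiword-token ranges such as
    "1-2" and empty nodes such as "1.1" are skipped, so only syntactic
    words are counted.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (sent_id, text, ...) carry no tokens.
            continue
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
            in_sentence = True
    # Handle a final sentence that is not followed by a blank line.
    if in_sentence:
        sentences += 1
    return sentences, words


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_conllu.py <path-to-conllu-file>
    with open(sys.argv[1], encoding="utf-8") as handle:
        n_sentences, n_words = count_conllu(handle)
    print(f"sentences: {n_sentences}, words: {n_words}")
```

Running the script once on the training file and once on the test file gives the per-split totals, which can then be summed and compared with the figures quoted above.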

Acknowledgments

We wish to thank Starlang Software for funding and supporting this work.

Changelog

  • 2021-05-15 v2.8
    • Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.8
License: CC BY-SA 4.0
Includes text: yes
Parallel: no
Genre: reviews
Lemmas: converted from manual
UPOS: converted from manual
XPOS: converted from manual
Features: converted from manual
Relations: converted from manual
Contributors:  Kuzgun, Aslı; Cesur, Neslihan; Yıldız, Olcay Taner; Kuyrukçu, Oğuzhan; Marşan, Büşra; Arıcan, Bilge Nas; Kara, Neslihan; Aslan, Deniz Baran; Sanıyar, Ezgi; Asmazoğlu, Cengiz
Contributing: elsewhere
Contact: kuzgunasli@gmail.com / olcay.yildiz@ozyegin.edu.tr 
===============================================================================