Using human annotators

October 29, 2024

I suggest reading Section 3 of this review of good practices in data annotation quality. If you want production-level quality and have the means to implement all of these methods, go ahead!

[Figure: best annotation practices]

However, whatever your project size, the following guidelines matter once you have defined your task and scoring guidelines.

  • Workforce selection and, if you can, monetary incentive. You likely want the people working on your task to:
  1. fit certain demographics. Some examples: be native speakers of the target language, have a higher education level, be experts in a specific domain, be diverse in their geographical origins, etc. Your needs will vary depending on your task.
  2. produce high-quality work. It is now especially important to add a way to check whether answers are LLM-generated, and you will need to filter some annotators out of your pool. In my opinion, unless you are counting on highly motivated crowdsourced annotators, it is always better to pay your annotators fairly.
  • Guideline design. Make sure to spend a lot of time really brainstorming your guidelines! That's one of the points on which we spent the most time for the GAIA dataset.

  • Iterative annotation. Be ready to run several rounds of annotation, as your annotators will misunderstand your guidelines (they are more ambiguous than you think)! Generating samples several times will allow your annotators to really converge on what you need.

  • Quality estimation and manual curation. You want to check answers (notably via inter-annotator agreement, if you can get it) and do a final selection to keep only the highest-quality, most relevant answers.
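To make the inter-annotator agreement check a bit more concrete, here is a minimal sketch using Cohen's kappa, one common agreement measure among several. It assumes exactly two annotators who each gave one categorical label per item; the function names `cohen_kappa` and `agreed_items` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumes both lists are aligned: position i is the same item for both.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreed_items(labels_a, labels_b):
    """Indices of items on which both annotators agree (a simple curation filter)."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]


# Toy data (hypothetical): two annotators rating six answers.
annotator_1 = ["good", "good", "bad", "bad", "good", "bad"]
annotator_2 = ["good", "bad", "bad", "bad", "good", "good"]

kappa = cohen_kappa(annotator_1, annotator_2)
kept = agreed_items(annotator_1, annotator_2)
```

A kappa close to 1 suggests the guidelines are understood consistently, while a value near 0 means agreement is about what chance would give, which is usually a sign that another round of guideline refinement is needed. With more than two annotators, or with missing labels, a measure such as Fleiss' kappa or Krippendorff's alpha would be a better fit.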

Specialized tools for building high-quality annotated datasets, like Argilla, can also help you.

Going further