Med-HallMark Benchmark Description
June 24, 2024
Illustration of statistical information and construction content of Med-HallMark. We show separately (a) multi-task hallucination support, (b) multifaceted hallucination data, and (c) hierarchical hallucination categorization.
In the proposed Med-HallMark benchmark, the files Conventional.json, Confidence_weakening.json, Counterfactual.json, and Irg.json contain the data instances for conventional, confidence weakening, counterfactual, and image depiction questions, respectively. These instances are used to evaluate the baseline models.
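As a rough illustration of how these four files might be read, the sketch below loads whichever of them are present in a local directory and counts their top-level entries. The field schema of the files is not described here, so the code makes no assumptions about it beyond each file being valid JSON with a list or object at the top level. The function names, the question-type keys, and the `data_dir` argument are hypothetical and not part of the benchmark.

```python
import json
from pathlib import Path

# File names as listed above; the dictionary keys are illustrative labels only.
QUESTION_FILES = {
    "conventional": "Conventional.json",
    "confidence_weakening": "Confidence_weakening.json",
    "counterfactual": "Counterfactual.json",
    "image_depiction": "Irg.json",
}


def load_question_files(data_dir):
    """Load whichever of the four question files exist under data_dir.

    Returns a dict mapping question type to the parsed JSON content.
    Missing files are skipped, since the data is released in phases.
    """
    loaded = {}
    for question_type, file_name in QUESTION_FILES.items():
        path = Path(data_dir) / file_name
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            loaded[question_type] = json.load(handle)
    return loaded


def count_instances(loaded):
    """Count top-level entries per question type.

    Assumes each file's top level is a JSON list or object.
    """
    return {name: len(content) for name, content in loaded.items()}
```

For example, `count_instances(load_question_files("path/to/data"))` would return one count per file found, where `path/to/data` is a placeholder for wherever the JSON files were downloaded.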
We will gradually release all the data in subsequent phases. Note that the images used in Med-HallMark come from the RAD, Snake, MIMIC, and Openi datasets, so you must comply with the licenses of the original datasets and download the images yourself if you want to access them.