VLEP Evaluation
August 20, 2022
Task Definition
Given a video with associated dialogue as premise, and two possible future events, the VLEP task requires systems to predict which one is more likely to happen. The task performance is evaluated using accuracy.
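As a rough illustration of the metric, accuracy is the number of correctly predicted examples divided by the total number of examples. The sketch below is not the code in eval.py. The function name, the dict-based inputs, and the choice to count a missing prediction as incorrect are all assumptions made for this example.

```python
def accuracy(predictions, gold):
    """Fraction of gold examples whose predicted answer matches the label.

    predictions, gold: dicts mapping example_id -> answer index (0 or 1).
    Assumption: a gold example with no prediction counts as incorrect.
    """
    if not gold:
        raise ValueError("gold must contain at least one example")
    correct = sum(1 for ex_id, ans in gold.items() if predictions.get(ex_id) == ans)
    return correct / len(gold)


# Hypothetical toy data, not taken from the VLEP dataset.
gold = {0: 1, 1: 0, 2: 1, 3: 0}
predictions = {0: 1, 1: 1, 2: 1, 3: 0}
print(accuracy(predictions, gold))  # 3 of 4 correct -> 0.75
```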
How to construct a prediction file?
A prediction file is a .jsonl file. Each line in this file contains a single JSON string that
can be loaded as a dict with two entries:
{"example_id": int, "pred_ans": int}
example_id is the id of the example; pred_ans is the index of the predicted answer, either 0 or 1.
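A minimal sketch of writing predictions in the format described above, using only the Python standard library. The helper names write_predictions and read_predictions are hypothetical and are not part of this repository. The check that pred_ans is 0 or 1 follows the format description.

```python
import json


def write_predictions(predictions, path):
    """Write {example_id: pred_ans} pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for example_id, pred_ans in predictions.items():
            if pred_ans not in (0, 1):
                raise ValueError(f"pred_ans must be 0 or 1, got {pred_ans!r}")
            record = {"example_id": int(example_id), "pred_ans": int(pred_ans)}
            f.write(json.dumps(record) + "\n")


def read_predictions(path):
    """Load a .jsonl prediction file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, write_predictions({0: 1, 1: 0}, "vlep_dev_submission.jsonl") would produce a two-line file, one JSON object per line.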
Run Evaluation
At project root, run
bash standalone_eval/eval_sample.sh
This command uses eval.py to evaluate the provided sample_dev_submission.jsonl file
and writes the output to sample_dev_submission_metrics_new.json.
Its content should be similar, if not identical, to the sample_dev_submission_metrics.jsonl file.
CodaLab Submission
To get your model's performance on the test split,
please submit both dev and test predictions to our
CodaLab evaluation server.
The submission should be a single .zip file (no enclosing folder)
that contains the two prediction files,
vlep_test_submission.jsonl and vlep_dev_submission.jsonl. Each *submission.jsonl file
should be formatted as described above.
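One possible way to build such an archive is with Python's standard zipfile module, as sketched below. The function name make_submission_zip and the output archive name are hypothetical. The sketch stores each file under its required name at the top level of the archive, so there is no enclosing folder regardless of where the source files live.

```python
import zipfile


def make_submission_zip(dev_path, test_path, zip_path):
    """Pack the dev and test prediction files into a flat .zip archive."""
    members = {
        "vlep_dev_submission.jsonl": dev_path,
        "vlep_test_submission.jsonl": test_path,
    }
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, src in members.items():
            # arcname fixes the name inside the archive and drops any directories.
            zf.write(src, arcname=arcname)
```

For example, make_submission_zip("out/dev.jsonl", "out/test.jsonl", "submission.zip") would produce an archive whose only entries are the two required file names.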