Changes to Evaluation Protocols and Fixes

January 24, 2026

  • Minor fix: examples whose ground-truth mask is entirely background are now discarded in the "When" analysis, Sep 2025.
  • LLaVA-G evaluation on PixMMVP was fixed, Jan 2026.
  • Added spaCy similarity filtration on PixMMVP, following their implementation: noun phrases with similarity below 0.7 are filtered out, Jan 2026.
  • PixFoundation results were updated to use GPT-5.1 and to perform automatic filtration with GPT-5.1 on PixMMVP of images that do not include the queried expression.
  • PixFoundation (oracle) results were updated to filter out images that do not include the queried expression, using the ground-truth mask.
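
The two filtration rules above can be illustrated with a minimal Python sketch. It is not the repository's code: the function names, the `gt_mask` key, and the pluggable `similarity` callable are all hypothetical, and masks are assumed to be 2D binary grids where 0 means background. In practice the similarity score could come from spaCy (for example `Doc.similarity`), but what each noun phrase is compared against is an assumption here.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Mask = Sequence[Sequence[int]]  # assumed layout: 2D binary grid, 0 = background


def is_all_background(mask: Mask) -> bool:
    """Return True when the mask contains no foreground pixel."""
    return not any(pixel for row in mask for pixel in row)


def drop_empty_masks(examples: Iterable[Dict]) -> List[Dict]:
    """Keep only examples whose ground-truth mask has some foreground.

    The "gt_mask" key is a hypothetical field name used for illustration.
    """
    return [ex for ex in examples if not is_all_background(ex["gt_mask"])]


def keep_similar_phrases(
    phrases: Iterable[str],
    query: str,
    similarity: Callable[[str, str], float],
    threshold: float = 0.7,
) -> List[str]:
    """Drop noun phrases whose similarity to the query is below the threshold.

    `similarity` is a stand-in for whatever scorer is used (assumed, not
    taken from the source); 0.7 matches the threshold quoted above.
    """
    return [p for p in phrases if similarity(p, query) >= threshold]


if __name__ == "__main__":
    examples = [
        {"id": "a", "gt_mask": [[0, 0], [0, 0]]},  # all background, dropped
        {"id": "b", "gt_mask": [[0, 1], [0, 0]]},  # has foreground, kept
    ]
    print([ex["id"] for ex in drop_empty_masks(examples)])
```

Passing the scorer in as a callable keeps the threshold logic testable without loading a language model.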