Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
Published at the Association for Computational Linguistics (ACL), 2026. Top 1% overall assessment.
We propose ActionVLM, a vision-language framework for temporal action localization. It uses a Language Advantage signal to adaptively weight the language modality, mitigating language shortcuts and grounding localization in visual evidence.
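One plausible reading of such adaptive weighting can be sketched as follows. This is purely illustrative and not the paper's exact formulation: the function names, the sigmoid gate, and the scalar fusion are all assumptions. The idea is that when a language-only branch alone scores a segment much higher than the visual branch (a likely shortcut), the language modality is down-weighted so localization leans on visual evidence.

```python
import math

def language_advantage_weight(lang_score: float, visual_score: float,
                              alpha: float = 2.0) -> float:
    """Gate the language modality by its 'advantage' over vision.

    Hypothetical gating rule (not from the paper): a large positive
    advantage (language alone already confident) yields a small weight,
    pushing the fused prediction toward the visual branch.
    """
    advantage = lang_score - visual_score
    # sigmoid(-alpha * advantage): weight shrinks as advantage grows
    return 1.0 / (1.0 + math.exp(alpha * advantage))

def fuse(lang_logit: float, visual_logit: float, weight: float) -> float:
    """Convex combination of the two modality logits."""
    return weight * lang_logit + (1.0 - weight) * visual_logit
```

For example, a segment where the language branch is confident but the visual branch is not receives a small language weight, while the reverse case keeps language contribution high.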
