PhD Candidate · University of Warwick

Jiaqi Li

Computer Vision & Multimodal Learning

I build models that perceive, localize, and act over long-horizon multimodal signals — spanning vision-language-action generation for Embodied AI and video temporal grounding for human actions.

View CV Google Scholar GitHub

Research Focus

Embodied AI & Vision-Language-Action

Generating physically grounded actions from language and vision for embodied agents.

Video Temporal Grounding

Localizing query-relevant moments and event boundaries in long, untrimmed videos.

Multimodal & Action Understanding

Vision-language models for temporal action localization, reasoning, and reliable evaluation.

Action Understanding & Generation
Embodied AI
Vision-Language-Action
Video Temporal Grounding
Temporal Action Localization
Vision-Language Models

About

I am a PhD candidate in Computer Science at the University of Warwick, supervised by Prof. Yu Guan. Previously, I completed an MSc in Computer Science at The University of Hong Kong and a BSc in Information and Computing Science, with a minor in Computer Science and Technology, at Zhejiang University of Technology.

My recent work has been published at ECCV, ICML, ACL, and IMWUT, focusing on making multimodal models more efficient, faithful, and capable of long-horizon reasoning.

At a Glance

Published At ECCV ICML ACL IMWUT

Research Experience

Research Assistant · University of Warwick · 2024–25, 2026
Research Assistant · HKUST · 2023–24

News

June 2026

ECCV 2026 paper accepted

Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding was accepted to ECCV 2026.

May 2026

ICML 2026 paper accepted

Our collaborative paper Doppler Prompting for Stable mmWave-based Human Pose Estimation was accepted to ICML 2026.

April 2026

ACL 2026 papers accepted

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization and a collaborative paper were accepted to ACL 2026.

Selected Publications

All publications →

ACL 2026

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization

Jiaqi Li, Guangming Wang, Shuntian Zheng, Minzhe Ni, Xiaoman Lu, Guanghui Ye, Yu Guan

We propose ActionVLM, a vision-language framework for temporal action localization that uses Language Advantage to adaptively weight language, mitigating language shortcuts and grounding localization in visual evidence.

Details Paper Code · Oral, Top 1% Overall Assessment

ECCV 2026

Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding

Jiaqi Li, Shuntian Zheng, Yixian Shen, Jia-Hong Huang, Xiaoman Lu, Minzhe Ni, Yu Guan

SemVID is a training-free VTG token pruning framework that preserves both boundary-critical evidence and cross-frame reasoning.

Details Paper Code

ACL Findings 2025

Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning

Jiaqi Li, Yixuan Tang, Yi Yang

A novel fine-tuning framework that automatically synthesizes training data tailored to rejecting questions that exceed the model's knowledge, without compromising performance on other tasks.

Details Paper Code

ICML 2026

Doppler Prompting for Stable mmWave-based Human Pose Estimation

Shuntian Zheng, Jiaqi Li, Xiaoman Lu, et al.

We improve mmWave human pose stability by treating Doppler as a confidence-gated motion prompt that selectively conditions spatial magnitude, reducing spurious motion artifacts and velocity error across single- and multi-person benchmarks.

Details

IMWUT 2026

Person Parametric Physics-informed Representation for mmWave-based Human Pose Estimation

Shuntian Zheng, Jiaqi Li, Guangming Wang, Minzhe Ni, Arnad Palit, Giovanni Montana, Yu Guan

This paper proposes a new input paradigm for mmWave-based human pose estimation, which models the human as a Gaussian ensemble enriched with electromagnetic and kinematic parameters.

Details Paper

Education

University of Warwick PhD in Computer Science, 2024 – present

The University of Hong Kong MSc in Computer Science, 2022 – 2023

Zhejiang University of Technology BSc in Information and Computing Science, minor in Computer Science and Technology, 2018 – 2022

Academic Service

Workshop Program Chair CVPR International Workshop on Vision Intelligence for Real-world Challenges (AI4RWC), 2026

Conference Reviewer · 2026 IJCAI, ICML, ACL ARR, NeurIPS, ECCV

Conference Reviewer · 2025 IJCAI, ICCV, AAAI

Patents

A method for calibrating the longitude and latitude of aerial image pixels Patent No. CN115457124A

A Remote Ischemic Preconditioning Training System and Method Patent No. CN107512239A, co-inventor

A Passenger Seat Belt Status Monitoring System Patent No. CN108461158A