PhD Candidate · University of Warwick

Jiaqi Li

Computer Vision & Multimodal Learning

I build models that perceive, localize, and act over long-horizon multimodal signals, spanning vision-language-action generation for Embodied AI and video temporal grounding for human actions.

Research Focus

Embodied AI & Vision-Language-Action

Generating physically grounded actions from language and vision for embodied agents.

Video Temporal Grounding

Localizing query-relevant moments and event boundaries in long, untrimmed videos.

Multimodal & Action Understanding

Vision-language models for temporal action localization, reasoning, and reliable evaluation.

  • Action Understanding & Generation
  • Embodied AI
  • Vision-Language-Action
  • Video Temporal Grounding
  • Temporal Action Localization
  • Vision-Language Models

About

I am a PhD candidate in Computer Science at the University of Warwick, supervised by Prof. Yu Guan. Previously, I completed an MSc in Computer Science at The University of Hong Kong and a BSc in Information and Computing Science, with a minor in Computer Science and Technology, at Zhejiang University of Technology.

My recent work has been published at ECCV, ICML, ACL, and IMWUT, focusing on making multimodal models more efficient, faithful, and capable of long-horizon reasoning.

At a Glance

Published At: ECCV, ICML, ACL, IMWUT
Research Experience
  • Research Assistant, University of Warwick · 2024–25, 2026
  • Research Assistant, HKUST · 2023–24

News

June 2026
ECCV 2026 paper accepted

"Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding" has been accepted to ECCV 2026.

May 2026
ICML 2026 paper accepted

Our collaborative paper "Doppler Prompting for Stable mmWave-based Human Pose Estimation" has been accepted to ICML 2026.

April 2026
ACL 2026 papers accepted

"Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization" and a collaborative paper have been accepted to ACL 2026.

Selected Publications

ICML 2026

Doppler Prompting for Stable mmWave-based Human Pose Estimation

Shuntian Zheng, Jiaqi Li, Xiaoman Lu, et al.

We improve mmWave human pose stability by treating Doppler as a confidence-gated motion prompt that selectively conditions spatial magnitude, reducing spurious motion artifacts and velocity error across single- and multi-person benchmarks.

Education

University of Warwick · PhD in Computer Science, 2024 – present
The University of Hong Kong · MSc in Computer Science, 2022 – 2023
Zhejiang University of Technology · BSc in Information and Computing Science, minor in Computer Science and Technology, 2018 – 2022

Academic Service

Workshop Program Chair · CVPR International Workshop on Vision Intelligence for Real-world Challenges (AI4RWC), 2026
Conference Reviewer · 2026: IJCAI, ICML, ACL ARR, NeurIPS, ECCV
Conference Reviewer · 2025: IJCAI, ICCV, AAAI

Patents