PhD Candidate - University of Warwick
Jiaqi Li
I work on long video understanding, multimodal models, video temporal grounding, embodied AI, robot task and motion planning, and human pose estimation.
About
I am a PhD candidate in Computer Science at the University of Warwick, supervised by Prof. Guan Yu. Previously, I completed an MSc in Computer Science at The University of Hong Kong and a BSc in Information and Computing Science, with a minor in Computer Science and Technology.
My recent work spans multimodal video understanding, temporal grounding, and trustworthy instruction tuning.
Research Interests
- Long video understanding
- Multimodal models
- Video temporal grounding
- Temporal action localization
- Embodied AI
- Robot task and motion planning
- Human pose estimation
Highlights
News
Our collaborative paper Why Learn What Physics Already Knows? Realizing Agile mmWave-based Human Pose Estimation via Physics-Guided Preprocessing was accepted to ICME 2026.
Invited talk on Video Forensics and Video Compression for CS355 Digital Forensics at the University of Warwick.
Serving as Workshop Program Chair for CVPR AI4RWC 2026 and as a reviewer for ICML, ACL ARR, and IJCAI.
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning was accepted to ACL Findings 2025.
Selected Publications
Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization
We propose ActionVLM, a vision-language framework for temporal action localization that uses Language Advantage to adaptively weight language, mitigating language shortcuts and grounding localization in visual evidence.
Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding
SemVID is a training-free VTG token pruning framework that preserves both boundary-critical evidence and cross-frame reasoning.
Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning
A novel fine-tuning framework that automatically synthesizes training data tailored to rejecting questions that exceed the model's knowledge, without compromising performance on other tasks.
Person Parametric Physics-informed Representation for mmWave-based Human Pose Estimation
This paper proposes a new input paradigm for mmWave-based human pose estimation, which models the human body as a Gaussian ensemble enriched with electromagnetic and kinematic parameters.
