About Me
Hi! I am a third-year Ph.D. student in MMLab at The Chinese University of Hong Kong, advised by Prof. Dahua Lin. Prior to this, I earned my Bachelor’s degree in Computer Science and Technology (2022) from the Chu Kochen Honors College (Pursuit Science Class), Zhejiang University, where I was supervised by Prof. Guofeng Zhang in the State Key Laboratory of CAD&CG. I have interned at several leading research institutions, including Meta FAIR, Shanghai AI Lab, and SenseTime Research. Awards I have received include ECCV 2024 Best Paper Candidate, the Hong Kong PhD Fellowship, the CUHK Vice-Chancellor’s PhD Scholarship, and the Undergraduate National Scholarship.
My long-term goal is to develop an intelligent model that can universally perceive and reason about our 3D physical world, primarily through visual information and other multi-modal data, and that can be deployed on robots or wearable devices (such as AI/AR glasses). This pursuit aims to augment human intelligence and benefit society at large. Achieving this goal involves overcoming several core challenges:
- Output: defining what perceptions are essential for the intelligent agent.
- Representation: establishing a viable world (scene) representation.
- Data: addressing the difficulty in acquiring necessary training data.
I am broadly interested in research that seeks solutions to these problems. Currently, my work focuses on equipping LLMs and VLMs with spatial intelligence; specifically, I am exploring how to scale spatial intelligence. If you share my interests, have helpful articles to recommend, or have any questions, please do not hesitate to contact me. :)
I am on the job market.
News
- [Jul. 2025] We release OST-Bench, a benchmark for evaluating the capabilities of MLLMs in online spatio-temporal scene understanding! 📊
- [Jun. 2025] We release RoboMaster, a video generation method for robotic manipulation with trajectory control! 🤖
- [Jun. 2025] We release MMSI-Bench, a novel, comprehensive, fully human-annotated, reasoning-based benchmark for multi-image spatial intelligence! 📊
- [May 2025] We release Multi-SpatialMLLM, a VLM capable of multi-frame spatial understanding, such as predicting object and camera movement vectors! 🤩
- [Dec. 2024] I completed my internship at Meta — it was an enjoyable and wonderful experience!
- [Oct. 2024] PointLLM was accepted to ECCV 2024 with all “strong accept” reviews and was selected as a Best Paper Candidate! 🎉
- [Sep. 2024] Our paper VLM-Grounder, a VLM agent for zero-shot 3D visual grounding, has been accepted by CoRL 2024! 🥳
- [Sep. 2024] Our MMScan, which provides hierarchical grounded language annotations for multi-modal 3D scene understanding, and Chat-Scene, a 3D-LLM with superior scene understanding performance, were accepted by NeurIPS 2024! 🎉
- [Jun. 2024] I started my internship as a Research Scientist Intern at FAIR Perception, Meta in Menlo Park, CA. 🚀
- [May 2024] We release Grounded 3D-LLM, a 3D-LLM capable of object grounding in 3D scenes! 🤩
Past News
- [Feb. 2024] Our EmbodiedScan was accepted to CVPR 2024! It is a multi-modal 3D dataset with high-quality human annotations for embodied 3D scene perception! 🤩
- [Aug. 2023] We release PointLLM, a multi-modal large language model capable of understanding point clouds! Try our demo here. 🤗
- [Mar. 2023] Our paper MV-JAR, a LiDAR-based self-supervised pre-training method along with a new data-efficient benchmark, has been accepted by CVPR 2023. 🎉
- [Jun. 2022] Graduated from Zhejiang University. Forever cherishing the memories from ZJU. 🎓
- [Apr. 2022] Honored to receive the Hong Kong PhD Fellowship (HKPFS) and the CUHK Vice-Chancellor’s PhD Scholarship. Deeply grateful for the recognition! 🏆
Education
- The Chinese University of Hong Kong (CUHK)
- August 2022 - June 2026 (Expected)
- Ph.D. in Information Engineering
- Zhejiang University (ZJU)
- September 2018 - June 2022
- B.Eng. in Computer Science and Technology
Selected Publications
* denotes equal contribution, # denotes corresponding author, † denotes project lead.
- Spatial Intelligence for LLMs/VLMs
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-Temporal Scene Understanding
- Jingli Lin*, Chenming Zhu*, Runsen Xu, Xiaohan Mao, Xihui Liu, Tai Wang#, Jiangmiao Pang#
- arXiv Preprint, 2025
- [Paper] [Code] [Project] [Dataset]
- MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
- Sihan Yang*, Runsen Xu*†, Yiman Xie, Sizhe Yang, Mo Li, Jingli Lin, Chenming Zhu, Xiaochen Chen, Haodong Duan, Xiangyu Yue, Dahua Lin, Tai Wang#, Jiangmiao Pang#
- arXiv Preprint, 2025
- [Paper] [Code] [Project] [Dataset] [中文解读]
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
- Runsen Xu, Weiyao Wang, Hao Tang, Xingyu Chen, Xiaodong Wang, Fu-Jen Chu, Dahua Lin, Matt Feiszli, Kevin J. Liang
- arXiv Preprint, 2025
- [Paper] [Code] [Project]
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
- Xiao Fu, Xintao Wang, Xian Liu, Jianhong Bai, Runsen Xu, Pengfei Wan, Di Zhang, Dahua Lin
- arXiv Preprint, 2025
- [Paper] [Code] [Project]
- VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
- Runsen Xu, Zhiwei Huang, Tai Wang, Yilun Chen, Jiangmiao Pang#, Dahua Lin
- Conference on Robot Learning, CoRL 2024
- [Paper] [Code] [Project]
- Grounded 3D-LLM with Referent Tokens
- Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Runsen Xu, Ruiyuan Lyu, Dahua Lin, Jiangmiao Pang
- arXiv Preprint, 2024
- [Paper] [Code] [Project]
- MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
- Ruiyuan Lyu*, Tai Wang*, Jingli Lin*, Shuai Yang*, Xiaohan Mao, Yilun Chen, Runsen Xu, et al.
- Neural Information Processing Systems, Datasets and Benchmarks Track, NeurIPS 2024
- [Paper] [Code] [Project]
- Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
- Haifeng Huang, Yilun Chen, Zehan Wang, Rongjie Huang, Runsen Xu, Tai Wang, et al.
- Neural Information Processing Systems, NeurIPS 2024
- [Paper] [Code]
- PointLLM: Empowering Large Language Models to Understand Point Clouds
- Runsen Xu, Xiaolong Wang, Tai Wang#, Yilun Chen, Jiangmiao Pang#, Dahua Lin
- European Conference on Computer Vision, ECCV 2024, Best Paper Candidate
- [Paper] [Code] [Project] [Demo] [Bilibili]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
- Tai Wang*, Xiaohan Mao*, Chenming Zhu*, Runsen Xu, et al.
- Computer Vision and Pattern Recognition, CVPR 2024
- [Paper] [Code] [Project] [中文解读]
- Self-Supervised 3D Representation Learning
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
- Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang#, Dahua Lin
- Computer Vision and Pattern Recognition, CVPR 2023
- [Paper] [Code] [Video] [Slides]
- CO³: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
- Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo
- International Conference on Learning Representations, ICLR 2023
- [Paper] [Code]
- Robot Localization and Navigation
- Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
- Xiaolong Wang, Runsen Xu, Zuofan Cui, Zeyu Wan, Yu Zhang
- Neural Information Processing Systems, NeurIPS 2023
- [Paper] [Code] [Demo]
- RNIN-VIO: Robust Neural Inertial Navigation Aided Visual-Inertial Odometry in Challenging Scenes
- Danpeng Chen, Nan Wang, Runsen Xu, Weijian Xie, Hujun Bao, Guofeng Zhang
- International Symposium on Mixed and Augmented Reality, ISMAR 2021, Oral Presentation
- [Paper] [Code] [Project]
Selected Awards
- ECCV Best Paper Candidate, 2024
- Hong Kong PhD Fellowship (the most prestigious scholarship for Ph.D. studies in Hong Kong), 2022
- CUHK Vice-Chancellor’s PhD Scholarship, 2022
- Outstanding Graduate of Zhejiang University, 2022
- Outstanding Undergraduate Thesis of the College of Computer Science and Technology, Zhejiang University, 2022
- National Scholarship (highest honor nationwide for Chinese undergraduates), 2019
Academic Services
- Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICML, ACM MM, ICRA
Teaching
- IERG4998: Final Year Project, Spring 2023
- IERG4998: Final Year Project, Fall 2022