I am a third-year Ph.D. student at Harbin Institute of Technology (Shenzhen) and Great Bay
University, advised by Liqiang Nie (IAPR Fellow) and Michael Yu Wang (IEEE/ASME/HKIE Fellow), and co-supervised by Xiang Deng.
I am currently a visiting student at the National University of Singapore, working with Mike Shou.
Before that, I received my M.S. and B.E. degrees from Soochow University.
My research focuses on embodied AI and natural language processing, particularly on integrating
multimodal large language models with robotic systems. I am interested in enabling robots to better
perceive, reason, and act in the physical world through language-vision-action alignment.
InternVLA-A1 unifies scene understanding, visual foresight, and action execution into a single framework.
Moreover, it is empowered by high-fidelity synthetic data (InternData-A1).
A novel paradigm integrating visual foresight generation into the decision-making pipeline, enabling
robots to plan and execute complex tasks in dynamic environments.
We propose Cortical Policy, a cortex-inspired dual-stream view transformer for robotic manipulation that jointly reasons over static views (for 3D spatial understanding via pretrained 3D keypoint alignment) and dynamic views (for adaptive behavior via egocentric gaze–aware pretraining).
A framework for robust skill learning and composition that mitigates codebook collapse and models causal dependencies between skills, by introducing rotation-augmented skill quantization and a causal skill transformer.
We formulate 3D affordance detection as a language-driven reasoning task for open-world scenes.
By integrating LLM reasoning and multi-stage training, our method achieves improved open-vocabulary affordance segmentation.
We introduce a novel multi-grained Mamba architecture that jointly models historical hidden states
and intra-step RTG–state–action relations, enabling more robust offline RL under out-of-distribution
settings.
A multimodal perception–planning framework that grounds MLLM reasoning in embodied robotic manipulation.
It improves generalization by combining goal-conditioned perception with retrieval-augmented planning.
Education & Experience
• Harbin Institute of Technology (Shenzhen), 2023.09 - Present
Ph.D. in Electronic Information, School of Computer Science and Technology
Advised by Liqiang Nie and Xiang Deng
• Great Bay University, 2023.09 - Present
Ph.D. in Electronic Information, School of Engineering
Advised by Michael Yu Wang
• National University of Singapore, 2025.10 - Present
Visiting student, College of Design and Engineering
Advised by Mike Shou
• Soochow University, 2020.09 - 2023.06
M.S. in Computer Science and Technology, School of Computer Science and Technology
Advised by Guohong Fu
• Soochow University, 2014.09 - 2018.06
B.S. in Software Engineering, School of Computer Science and Technology
Awards
• National Scholarship, Ministry of Education of China, 2025.10
• Excellent Student Award, Harbin Institute of Technology, 2024.12
• Runner-up Award, EgoPlan, ICML Challenge, 2024.07
• Bronze Medal, Google AI4Code, Kaggle Competition, 2022.11