Biography

I am a research scientist at Tencent Game AI. My research interests are in reinforcement learning, machine learning, and LLM/VLM-based AI agents. I have published multiple papers at top-tier AI/ML venues, including ICML, ICLR, CVPR, and IJCAI.

My recent research focuses on training generalist AI agents in complex environments through reinforcement and imitation learning, leveraging the rich prior knowledge of pretrained foundation models such as LLMs, VLMs, and video generation models. Some of this recent work has been featured by MIT Technology Review China.

I earned my Ph.D. in Computer Science from the University of Technology Sydney (UTS), where I was a member of the research group led by Prof. Chengqi Zhang. Throughout my academic career, I have collaborated closely with Prof. Tianyi Zhou from the University of Maryland, College Park. I gained industry experience as a research intern at JD Explore Academy, under the mentorship of Prof. Li Shen from Sun Yat-sen University and within the team led by Prof. Xiaodong He. I also contributed to the development of text/audio-to-video generation models as a research intern at Tencent Hunyuan, under the guidance of Dr. Wei Liu. More details are available in my CV: English and Chinese (中文).

I am open to collaboration and discussion. Please reach out if you are interested.

News

[2025-06] One paper (VLM Agents Trained with RL) has been accepted to ICCV 2025. Congrats to Tong Wei on his first top-tier conference paper!
[2025-05] One paper has been accepted to Findings of ACL 2025. [PDF], [CODE]
[2025-03] We identified a critical challenge in RL-based VLM agent training and proposed Guided Thought Reinforcement (GTR), a solution that combines the strengths of RL and IL.
[2025-02] I have been named to the 2024 Dean's List by the UTS Faculty of Engineering & Information Technology for my Ph.D. thesis and research achievements.
[2025-01] We released a new benchmark, MM-IQ, for assessing the core reasoning capabilities of large multimodal models.
[2024-11] One paper (A Python Library for Black-box Optimization) has been accepted by JMLR MLOSS.

Selected papers

Other papers can be found on my Google Scholar page.

Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, and Yuhui Shi. “Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld”, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [PDF], [CODE]
Yijun Yang, Tianyi Zhou, Jing Jiang, Guodong Long, and Yuhui Shi. “Continual Task Allocation in Meta-Policy Network via Sparse Prompting”, International Conference on Machine Learning (ICML), 2023. [PDF], [CODE]
Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, and Yuhui Shi. “Pareto Policy Pool for Model-based Offline Reinforcement Learning”, International Conference on Learning Representations (ICLR), 2022. [PDF], [CODE]

Yijun Yang
