Hi 👋, I’m Zirui Song (/ˈziːˌruː.i/ /sɔːŋ/), a fourth-year undergraduate at the University of Technology Sydney (UTS) majoring in Software Engineering. I am also a visiting student in the NLP department at MBZUAI, supervised by Prof. Xiuying Chen. I have been a member of UTS-NLP since Oct 2023, where I am fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang. I am deeply appreciative of my mentor, Prof. Dayan Guan, who guided me into scientific research.
I am very excited to discuss potential collaborations. Please feel free to contact me.
💻 News
2025-03-01: I was admitted to the PhD program in the NLP department at MBZUAI, where I will commence my PhD studies in August 2025.
2025-01-23: One paper was accepted by NAACL 2025.
2025-01-02: One paper was accepted by Communications Chemistry.
2024-09-25: First day as a visiting student at MBZUAI under the supervision of Prof. Xiuying Chen.
2024-09-20: One paper was accepted by EMNLP 2024.
2024-07-01: One paper was accepted by ECCV 2024.
2023-11-29: Prof. Ling Chen accepted me as an undergraduate research assistant at the Australian Artificial Intelligence Institute (AAII).
2023-07-01: I am honored to be selected as an international exchange student majoring in Software Engineering at UTS.
2023-05-18: Prof. Dayan Guan accepted me as a remote undergraduate research assistant at ROSE Lab.
💡 Research Interests
- Multimodal AI: My current research goal is to integrate multimodal information to improve the performance of large language models. I am also seeking applications of multimodal models in the geolocation and embodied AI domains.
- Trustworthy AI: I am also highly interested and experienced in exploring jailbreak and attack issues in multimodal language models, particularly in the vision and audio modalities.
📖 Education
- 2021.06->2025.05: B.E.,
University of Technology Sydney (UTS). QS Ranking: 88, U.S. News Ranking: 85. GPA: 3.93/4.00
- 2025.08->2029.05 (Expected): PhD,
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
📝 Publications

Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies
Zirui Song*, Guangxian Ouyang*, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen (*: first co-authors)
“AnomalyGen: A framework for generating anomalous virtual scene data without human annotation, to enhance the robustness of robots.”

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai*, Zirui Song*, Dayan Guan, Zhenhao Chen, Yaohang Li, Xing Luo, Chenyu Yi & Alex Kot (*: first co-authors)
“BenchLMM: A novel, comprehensive benchmark, specifically designed to investigate the cross-style capability of Large Multimodal Models (LMMs).”

MedINST: Meta Dataset of Biomedical Instructions
Wenhan Han, Meng Fang, Zihan Zhang, Yu Yin, Zirui Song, Ling Chen, Mykola Pechenizkiy, Qingyu Chen
“MedINST, the Meta Dataset of Biomedical Instructions, a novel multi-domain, multi-task instructional meta-dataset. MedINST comprises 133 biomedical NLP tasks and over 7 million training samples.”

Unveiling the power of language models in chemical research question answering
Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Zirui Song, Xin Gao & Xiangliang Zhang
“ScholarChemQA, a large-scale QA dataset constructed from chemical papers.” Code

Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
Zirui Song*, Jingpu Yang*, Yuan Huang*, Jonathan Tonglet, Zeyu Zhang, Tao Cheng, Meng Fang, Iryna Gurevych, Xiuying Chen (*: first co-authors)
“We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric, collectively designed to address critical challenges and drive advancements in geolocation research. At the core of this framework is GeoComp (Geolocation Competition Dataset), a large-scale dataset collected from a geolocation game platform involving 740K users over two years. It comprises 25 million entries of metadata and 3 million geo-tagged locations spanning much of the globe”

Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
Zirui Song*, Bin Yan*, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen (*: first co-authors)
“LLMs' general-purpose nature often limits their effectiveness in domain-specific applications that require specialized knowledge, such as healthcare, chemistry, or legal analysis. To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge. In this survey, we provide a comprehensive overview of these methods, which we categorize into four key approaches: dynamic knowledge injection, static knowledge embedding, modular adapters, and prompt optimization.”

From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution
Yuhan Liu, Zirui Song, Xiaoqing Zhang, Xiuying Chen, Rui Yan
“We take the first step toward simulating and revealing this evolution, proposing a Fake News evolUtion Simulation framEwork (FUSE) based on LLMs”

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
Rui Zhao*, Zeyu Zhang*, Yi Xu, Yi Yao, Yan Huang, Wenxin Zhang, Zirui Song, Xiuying Chen, Yang Zhao (*: first co-authors)
“PedDet, an adaptive spectral optimization complementarity framework specifically enhanced and optimized for multispectral pedestrian detection”

Foundations and recent trends in multimodal mobile agents: A survey
Biao Wu*, Yanda Li*, Meng Fang, Zirui Song, Zhiwei Zhang, Yunchao Wei, Ling Chen (*: first co-authors)
“Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, the demands for agents that can adapt in real-time and process multimodal data have grown. This survey provides a comprehensive review of mobile agent technologies, focusing on recent advancements that enhance real-time adaptability and multimodal interaction.”
Note: The first author, Biao Wu, removed my co-authorship during the second submission to ARR 2025, and I have not received any explanation from him. I believe I still deserve credit for this work, and I disagree with his decision.
💼 Experiences
- [2024.09 - now]
MBZUAI (Supervisor: Prof. Xiuying Chen, topic: Trustworthy MLLMs)
- [2023.10 - 2025.02]
University of Technology Sydney, Research Intern (Supervisor: Prof. Ling Chen and Prof. Meng Fang, topic: Multimodal Agents)
- [2023.03 - 2024.01]
Nanyang Technological University, Research Intern (Supervisor: Prof. Dayan Guan, topic: Multimodal LLMs)
🏆 Honors and Awards
- 🥈 Silver Medal, Kaggle - LLM Science Exam [51/2664], 2024
- 🥇 School Second Class Scholarship, 2022
📚 Resources
Blogs
- [05/24] [Chinese] National Undergraduate Innovation Project Documentation. [Link]
- [03/24] [Chinese] Negative Transfer. [Link]
- [03/24] [Chinese] Mixture of Experts Explained. [Link]
- [01/24] [Chinese] EMNLP2020 Tutorial Notes (Topic: Explainable AI). [Link]
📜 References
You can find my full CV here (Latest update: Oct 14th, 2024).