Hi 👋, I’m Zirui Song (/ˈziːˌruː.i/ /sɔːŋ/), a fourth-year undergraduate at the University of Technology Sydney (UTS) majoring in Software Engineering. I am also a visiting student in the NLP department at MBZUAI, supervised by Prof. Xiuying Chen. I have been a member of UTS-NLP since October 2023, where I am fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang. I am deeply grateful to my mentor, Prof. Dayan Guan, who guided me into scientific research.

I am always excited to discuss potential collaborations; please feel free to contact me.

💻 News

2025-03-01: I was admitted to the PhD program in the NLP department at MBZUAI, where I will begin my PhD studies in August 2025.

2025-01-23: One paper was accepted by NAACL 2025.

2025-01-02: One paper was accepted by Communications Chemistry.

2024-09-25: First day as a visiting student at MBZUAI under the supervision of Prof. Xiuying Chen.

2024-09-20: One paper was accepted by EMNLP 2024.

2024-07-01: One paper was accepted by ECCV 2024.

2023-11-29: Prof. Ling Chen accepted me as an undergraduate research assistant at the Australian Artificial Intelligence Institute (AAII).

2023-07-01: I am honored to have been selected as an international exchange student majoring in Software Engineering at UTS.

2023-05-18: Prof. Dayan Guan accepted me as a remote undergraduate research assistant at ROSE Lab.

💡 Research Interest

  • Multimodal AI: My current research goal is to integrate multimodal information to improve the performance of large language models. I am also exploring applications of multimodal models in the geolocation and embodied AI domains.
  • Trustworthy AI: I am also highly interested in, and have experience with, jailbreaks and attacks on multimodal language models, particularly in the vision and audio modalities.

📖 Education

📝 Publications

Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies

Zirui Song*, Guangxian Ouyang*, Meng Fang, Hongbin Na, Zijing Shi, Zhenhao Chen, Yujie Fu, Zeyu Zhang, Shiyu Jiang, Miao Fang, Ling Chen, Xiuying Chen (*: co-first authors)

“AnomalyGen: A framework for generating virtual anomaly scene data without human annotation, to enhance the robustness of robots.”

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Rizhao Cai*, Zirui Song*, Dayan Guan, Zhenhao Chen, Yaohang Li, Xing Luo, Chenyu Yi & Alex Kot (*: co-first authors)

“BenchLMM: A novel, comprehensive benchmark, specifically designed to investigate the cross-style capability of Large Multimodal Models (LMMs).”

Toolkit & Code

MedINST: Meta Dataset of Biomedical Instructions

Wenhan Han, Meng Fang, Zihan Zhang, Yu Yin, Zirui Song, Ling Chen, Mykola Pechenizkiy, Qingyu Chen

“MedINST, the Meta Dataset of Biomedical Instructions, a novel multi-domain, multi-task instructional meta-dataset. MedINST comprises 133 biomedical NLP tasks and over 7 million training samples.”

Unveiling the power of language models in chemical research question answering

Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Zirui Song, Xin Gao & Xiangliang Zhang

“ScholarChemQA, a large-scale QA dataset constructed from chemical papers.” Code

Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework

Zirui Song*, Jingpu Yang*, Yuan Huang*, Jonathan Tonglet, Zeyu Zhang, Tao Cheng, Meng Fang, Iryna Gurevych, Xiuying Chen (*: co-first authors)

“We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric, collectively designed to address critical challenges and drive advancements in geolocation research. At the core of this framework is GeoComp (Geolocation Competition Dataset), a large-scale dataset collected from a geolocation game platform involving 740K users over two years. It comprises 25 million entries of metadata and 3 million geo-tagged locations spanning much of the globe.”

Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey

Zirui Song*, Bin Yan*, Yuhan Liu, Miao Fang, Mingzhe Li, Rui Yan, Xiuying Chen (*: co-first authors)

“The general-purpose nature of LLMs often limits their effectiveness in domain-specific applications that require specialized knowledge, such as healthcare, chemistry, or legal analysis. To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge. In this survey, we provide a comprehensive overview of these methods, which we categorize into four key approaches: dynamic knowledge injection, static knowledge embedding, modular adapters, and prompt optimization.”

From a Tiny Slip to a Giant Leap: An LLM-Based Simulation for Fake News Evolution

Yuhan Liu, Zirui Song, Xiaoqing Zhang, Xiuying Chen, Rui Yan

“We take the first step toward simulating and revealing this evolution, proposing a Fake News evolUtion Simulation framEwork (FUSE) based on LLMs”

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

Rui Zhao*, Zeyu Zhang*, Yi Xu, Yi Yao, Yan Huang, Wenxin Zhang, Zirui Song, Xiuying Chen, Yang Zhao (*: co-first authors)

“PedDet, an adaptive spectral optimization complementarity framework specifically enhanced and optimized for multispectral pedestrian detection”

Foundations and recent trends in multimodal mobile agents: A survey

Biao Wu*, Yanda Li*, Meng Fang, Zirui Song, Zhiwei Zhang, Yunchao Wei, Ling Chen (*: co-first authors)

“Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, the demands for agents that can adapt in real-time and process multimodal data have grown. This survey provides a comprehensive review of mobile agent technologies, focusing on recent advancements that enhance real-time adaptability and multimodal interaction.”

Note: The first author, Biao Wu, removed me as a co-author during the second submission to ARR 2025, and I have not received any explanation from him. I believe I still deserve credit for this work and disagree with his decision.

💼 Experiences

  • [2024.09 - now] MBZUAI (Supervisor: Prof. Xiuying Chen, topic: Trustworthy MLLMs)
  • [2023.10 - 2025.02] University of Technology Sydney, Research Intern (Supervisors: Prof. Ling Chen and Prof. Meng Fang, topic: Multimodal Agents)
  • [2023.03 - 2024.01] Nanyang Technological University, Research Intern (Supervisor: Prof. Dayan Guan, topic: Multimodal LLMs)

🏆 Honors and Awards

  • 🥈 Silver Medal, Kaggle - LLM Science Exam [51/2664], 2024
  • 🥇 School Second-Class Scholarship, 2022

📚 Resources

Blogs

  • [05/24] [Chinese] National Undergraduate Innovation Project Documentation. [Link]
  • [03/24] [Chinese] Negative Transfer. [Link]
  • [03/24] [Chinese] Mixture of Experts Explained. [Link]
  • [01/24] [Chinese] EMNLP 2020 Tutorial Notes (Topic: Explainable AI). [Link]

📜 References

You can find my full CV here (Latest update: Oct 14th, 2024).
