Zirui Song

I am a fourth-year undergraduate at the University of Technology Sydney (UTS), majoring in Software Engineering. I am also a visiting student in the NLP department at MBZUAI, supervised by Prof. Xiuying Chen. I have been a member of UTS-NLP since February 2024, where I am fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang. I am deeply grateful to my mentor, Prof. Dayan Guan, who guided me into scientific research. Previously, I had a wonderful experience working with Miao Fang at NEU.

I am currently seeking an MPhil/Ph.D. position for fall 2025.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub  /  LinkedIn  /  WeChat

What's New

2024-09-20: One paper was accepted by EMNLP 2024.

2024-07-01: One paper was accepted by ECCV 2024.

2024-02-29: Prof. Ling Chen accepted me as an undergraduate research assistant at the Australian Artificial Intelligence Institute (AAII).
2023-07-01: I was honored to be selected as an international exchange student majoring in Software Engineering at UTS.
2023-05-18: Prof. Dayan Guan accepted me as a remote undergraduate research assistant at ROSE Lab.
2022-04-11: Prof. Miao Fang accepted me as an undergraduate research assistant at NEU-NLP Lab.

Research

My primary research interests are Large Multimodal Models, Vision CoT, and Prompt Engineering.

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai*, Zirui Song*, Dayan Guan†, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot
*Equal contribution, †Corresponding author

ECCV 2024


Project Page / GitHub / arXiv

We propose BenchLMM to investigate the cross-style capability of Large Multimodal Models (LMMs).

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Zirui Song*, Yaohang Li*, Meng Fang†, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen
*Equal contribution, †Corresponding author

arXiv, 2024
arXiv

Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), which utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on its specific domain knowledge, effectively reducing the hallucination associated with knowledge domain gaps.

Education

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Visiting Student in the NLP Department, 2024.09 - Present

I am fortunate to be supervised by Prof. Xiuying Chen in the MBZUAI NLP department.

University of Technology Sydney, B.S. in Software Engineering, 2021 - 2025

WAM: 88.5/100 - First Class Honours.
Experiences

NJU-NLP Lab, Summer Camper

Rapid-Rich Object Search Lab (ROSE), Undergraduate research assistant


Website template mainly borrowed from Here