Zirui Song

I am a third-year undergraduate at the University of Technology Sydney (UTS), majoring in Software Engineering. I have been a member of UTS-NLP since February 2024, where I am fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang. I am deeply appreciative of my mentor, Prof. Dayan Guan, who guided me into scientific research. Previously, I had a wonderful experience working with Miao Fang at NEU.

I am currently seeking an MPhil/Ph.D. position for Fall 2025.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub  /  LinkedIn  /  WeChat

What's New

2024-07-01: One paper was accepted by ECCV 2024.

2024-02-29: Prof. Ling Chen accepted me as an undergraduate research assistant at the Australian Artificial Intelligence Institute (AAII).
2023-07-01: I am honored to be selected as an international exchange student majoring in Software Engineering at UTS.
2023-05-18: Prof. Dayan Guan accepted me as a remote undergraduate research assistant at ROSE Lab.
2022-04-11: Prof. Miao Fang accepted me as an undergraduate research assistant at NEU-NLP Lab.
2022-02-15: I won a third-class scholarship from Northeastern University.
2021-09-07: I was admitted to the Computer Science and Technology major at Northeastern University (NEU).

Research

My primary research interests lie in Large Multimodal Models, Vision CoT, and Prompt Engineering.

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai*, Zirui Song*, Dayan Guan†, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot
*Equal contribution, †Corresponding author

ECCV 2024


Project Page / GitHub / arXiv

We propose BenchLMM to investigate the cross-style capability of Large Multimodal Models (LMMs).

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Zirui Song*, Yaohang Li*, Meng Fang†, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen
*Equal contribution, †Corresponding author

arXiv, 2024
arXiv

Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, which restricts their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), which draws on the collective expertise of diverse agents to improve interaction with operating systems. The framework introduces a team collaboration chain that lets each participating agent contribute insights from its own domain knowledge, reducing the hallucination associated with knowledge domain gaps.

Education

University of Technology Sydney, B.S. in Software Engineering, 2023 - Present

WAM: 88.5/100 - First Class Honours.

Northeastern University, B.S. in Computer Science and Technology, 2021 - 2023

GPA: 3.47/4.00

After two years of study, I was honored to be selected as an exchange student at the University of Technology Sydney, where I will complete the final two years of my undergraduate studies.

Hebei Hengshui High School, a top-ranked senior high school in China, 2018 - 2021

I am very grateful to have met my science and innovation mentor here.

Experiences

NJU-NLP Lab, Summer Camper

Rapid-Rich Object Search Lab (ROSE), Undergraduate research assistant


Website template mainly borrowed from Here