Zirui Song
I am a fourth-year undergraduate at the University of Technology Sydney (UTS), majoring in Software Engineering. I am also a visiting student supervised by Prof. Xiuying Chen in the MBZUAI NLP Department. I have been a member of UTS-NLP since Feb 2024, where I am fortunate to be advised by Prof. Ling Chen and mentored by Prof. Meng Fang.
I am deeply appreciative of my mentor, Prof. Dayan Guan, who guided me into scientific research.
Previously, I had a wonderful experience working with Prof. Miao Fang at NEU.
I am currently seeking an MPhil/Ph.D. position for Fall 2025.
Email  / 
CV  / 
Google Scholar  / 
Twitter  / 
Github  / 
LinkedIn  / 
Wechat
|
|
What's New
2024-09-20: One paper was accepted by EMNLP 2024.
2024-07-01: One paper was accepted by ECCV 2024.
2024-02-29: Prof. Ling Chen accepted me as an undergraduate research assistant at the Australian Artificial Intelligence Institute (AAII).
2023-07-01: I was honored to be selected as an international exchange student majoring in Software Engineering at UTS.
2023-05-18: Prof. Dayan Guan accepted me as a remote undergraduate research assistant at ROSE Lab.
2022-04-11: Prof. Miao Fang accepted me as an undergraduate research assistant at the NEU-NLP Lab.
|
Research
My primary research interests lie in the areas of Large Multimodal Models, Vision Chain-of-Thought (CoT), and Prompt Engineering.
|
|
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai*,
Zirui Song*,
Dayan Guan†,
Zhenhao Chen,
Xing Luo,
Chenyu Yi,
Alex Kot
*Equal contribution, †Corresponding Author
ECCV 2024
Project Page
/
GitHub
/
arXiv
We propose BenchLMM to investigate the cross-style capability of Large Multimodal Models (LMMs).
|
|
MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot
Zirui Song*,
Yaohang Li*,
Meng Fang†,
Zhenhao Chen,
Zecheng Shi,
Yuan Huang,
Ling Chen
*Equal contribution, †Corresponding Author
arXiv, 2024
arXiv
Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility.
To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), which utilizes the collective expertise of diverse agents to enhance their ability to interact with operating systems.
The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on its specific domain knowledge, effectively reducing the hallucinations associated with gaps in domain knowledge.
|
|
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Visiting Student in the NLP Department, 2024.09 - Present
I am fortunate to be supervised by Prof. Xiuying Chen in the MBZUAI NLP Department.
|
|
University of Technology Sydney, B.S. in Software Engineering, 2021 - 2025
WAM: 88.5/100 - First Class Honours.
|
Website template mainly borrowed from Here
|
|